Hugo Larcher 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							73b7cf83f6 
							
						 
					 
					
						
						
							
							Add backend name to telemetry ( #2962 )  
						
						
						
						
						* feat: Add backend name to telemetry 
						
					 
					
						2025-01-28 16:53:16 +01:00 
						 
				 
			
				
					
						
							
							
								drbh 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							8f6146f11a 
							
						 
					 
					
						
						
							
							Revert "feat: improve qwen2-vl startup" ( #2924 )  
						
						
						
						
						Revert "feat: improve qwen2-vl startup (#2802)"
This reverts commit eecca27113 
						
					 
					
						2025-01-17 12:09:05 -05:00 
						 
				 
			
				
					
						
							
							
								drbh 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							eecca27113 
							
						 
					 
					
						
						
							
							feat: improve qwen2-vl startup  ( #2802 )  
						
						
						
						
						* feat: tokenize each request individually and increase warmup image size
* feat: adjust rotary embed and avoid cuda graphs of size 2 and smaller
* fix: address image resize and rebase changes
* feat: update to run qwen2-vl tests
* fix: tweak param types 
						
					 
					
						2025-01-17 11:50:41 -05:00 
						 
				 
			
				
					
						
							
							
								Nicolas Patry 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							203cade244 
							
						 
					 
					
						
						
							
							Upgrading our rustc version. ( #2908 )  
						
						
						
						
						* Upgrading our rustc version.
* Fixing the rust tests to proper version.
* Clippy everything. 
						
					 
					
						2025-01-15 17:04:03 +01:00 
						 
				 
			
				
					
						
							
							
								Dmitry Dygalo 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							01067f8ba8 
							
						 
					 
					
						
						
							
							chore: Update jsonschema to 0.28.0 ( #2870 )  
						
						
						
						
						* chore: Update jsonschema to 0.28.0
Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
* chore: Enable blocking feature for reqwest
Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
---------
Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev> 
						
					 
					
						2025-01-10 15:01:54 +01:00 
						 
				 
			
				
					
						
							
							
								Nicolas Patry 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							82c24f7420 
							
						 
					 
					
						
						
							
							Using both values from config as they might not be correct. ( #2817 )  
						
						
						
						
						* Using both values from config as they might not be correct.
* Fixing max_position_embeddings for falcon.
* Simple attempt to fix the healthcheck block allocation.
* Much simpler solution.
* Default value for Backend start_health 
						
					 
					
						2024-12-10 19:37:09 +01:00 
						 
				 
			
				
					
						
							
							
								OlivierDehaene 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							8c3669b287 
							
						 
					 
					
						
						
							
							feat: auto max_new_tokens ( #2803 )  
						
						
						
						
						* feat: auto max_new_tokens
* update default
* Fixing the tests.
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> 
						
					 
					
						2024-12-06 05:50:35 +01:00 
						 
				 
			
				
					
						
							
							
								OlivierDehaene 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							ab7ccf5bc3 
							
						 
					 
					
						
						
							
							feat: add payload limit ( #2726 )  
						
						
						
						
						* feat: add payload limit
* update launcher 
						
					 
					
						2024-11-21 18:20:15 +00:00 
						 
				 
			
				
					
						
							
							
								Nicolas Patry 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							ed87b464b4 
							
						 
					 
					
						
						
							
							Fixing "deadlock" when python prompts for trust_remote_code by always ( #2664 )  
						
						
						
						
						specifying a value. 
						
					 
					
						2024-10-25 06:39:21 +02:00 
						 
				 
			
				
					
						
							
							
								OlivierDehaene 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							41c2623735 
							
						 
					 
					
						
						
							
							feat: allow any supported payload on /invocations ( #2683 )  
						
						
						
						
						* feat: allow any supported payload on /invocations
* update openAPI
* update doc 
						
					 
					
						2024-10-23 11:26:01 +00:00 
						 
				 
			
				
					
						
							
							
								OlivierDehaene 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							a6a0c97ed9 
							
						 
					 
					
						
						
							
							feat: prefill chunking ( #2600 )  
						
						
						
						
						* wip
* rollback
* refactor to use prefix/postfix naming + fix all_input_ids_tensor
* maybe patching vlms?
* fix filter and concat
* wip, no filter, no concat
* current
* add prepare_for_prefill
* working
* load tested
* re-create slots
* re-create slots
* fix slot_filtering_indices
* feedback loop
* remove log
* fix benchmarker
* fix vlm and seq2seq
* rename to cache and input lengths
* fix prefill logprobs
* fix launcher
* fix logprobs?
* idk at this point
* max input length
* omfg
* remove debugging lines
* fix tests
* fix mllama
* fix cargo tests
* remove support chunking for paged
* Fixing non blocked attentions
* Fixing dtype + AMD, Ipex targets.
* lint fix.
* rename
* Fix prefix_caching variable, remove defaults in server (confusing a lot of the time).
* Add simple resolution when user specifies ATTENTION=paged.
* Put back non default simple tests.
* Fix env name
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com> 
						
					 
					
						2024-10-16 12:49:33 +02:00 
						 
				 
			
				
					
						
							
							
								Nicolas Patry 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							0ff6ff60ad 
							
						 
					 
					
						
						
							
							Hotfixing main ( #2556 )  
						
						
						
					 
					
						2024-09-24 11:51:14 +02:00 
						 
				 
			
				
					
						
							
							
								OlivierDehaene 
							
						 
					 
					
						
						
							
							
						
						
						
							
						
						
							10e6f29295 
							
						 
					 
					
						
						
							
							chore: Add old V2 backend ( #2551 )  
						
						
						
						
						* wip
* added v2 
						
					 
					
						2024-09-24 08:38:17 +02:00