text-generation-inference

mirror of https://github.com/huggingface/text-generation-inference.git synced 2025-10-17 02:45:24 +00:00

Daniël de Kok 84ab88d843 Support flashinfer for Gemma3 prefill (#3167)
* launcher: ensure correct detection of Gemma 3 head size
* Support flashinfer for Gemma3 prefill
  Gemma3 uses bidirectional attention for images. Flashinfer supports custom masks. Hook up the mask with flashinfer, so that we do not have to use the slower SDPA implementation for prefills with images.
* Update Gemma3 test outputs
* Fixed unused import
2025-04-17 18:07:41 +02:00
test_bloom_560m	All integration tests back everywhere (too many failed CI). (#2428)	2024-08-16 21:19:46 +02:00
test_bloom_560m_sharded	fix: adjust test snapshots and small refactors (#2323)	2024-07-29 11:38:38 -04:00
test_chat_llama	Lots of improvements (Still 2 allocators) (#2449)	2024-08-29 16:29:01 +02:00
test_completion_prompts	Pr 3003 ci branch (#3007)	2025-03-10 17:56:19 +01:00
test_compressed_tensors_w8a8_int	Basic flashinfer 0.2 support (#2862)	2025-01-09 16:25:00 +01:00
test_compressed_tensors_w8a8_int_dynamic_weight	Improve qwen vl impl (#2943)	2025-02-04 12:44:18 -05:00
test_compressed_tensors_w8an_fp	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_compressed_tensors_wna16_int	Basic flashinfer 0.2 support (#2862)	2025-01-09 16:25:00 +01:00
test_compressed_tensors_wna16_int_24	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_continue_final_message	Support continue final message (#2733)	2024-11-27 19:13:30 -05:00
test_flash_awq	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_awq_sharded	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_deepseek_v2	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_falcon	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_gemma	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_gemma2	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_gemma3	Support flashinfer for Gemma3 prefill (#3167)	2025-04-17 18:07:41 +02:00
test_flash_gemma_gptq	Basic flashinfer 0.2 support (#2862)	2025-01-09 16:25:00 +01:00
test_flash_gpt2	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_grammar_llama	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_llama	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_llama_exl2	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_llama_fp8	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_llama_fp8_kv_cache	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_llama_gptq	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_llama_marlin	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_llama_marlin_24	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_llama_prefix	Fix truffle (#2514)	2024-09-11 22:45:19 +02:00
test_flash_llama_prefix_flashdecoding	Adding a test for FD. (#2516)	2024-09-16 17:00:54 +02:00
test_flash_medusa	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_mistral	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_mixtral	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_mixtral_awq	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_mixtral_gptq	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_neox	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_neox_sharded	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_pali_gemma	All integration tests back everywhere (too many failed CI). (#2428)	2024-08-16 21:19:46 +02:00
test_flash_pali_gemma2	Enable paligemma2 (#2807)	2024-12-06 14:41:49 -05:00
test_flash_phi	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_phi35_moe	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_qwen2	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_qwen2_5_vl	feat: add initial qwen2.5-vl model and test (#2971)	2025-02-19 12:38:20 +01:00
test_flash_qwen2_vl	Improve qwen vl impl (#2943)	2025-02-04 12:44:18 -05:00
test_flash_santacoder	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_starcoder	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_flash_starcoder2	Basic flashinfer 0.2 support (#2862)	2025-01-09 16:25:00 +01:00
test_flash_starcoder2_lora	feat: improve star coder to support multi lora layers (#2883)	2025-01-16 16:23:55 -05:00
test_flash_starcoder_gptq	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_grammar_llama	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_grammar_response_format_llama	Move JSON grammar -> regex grammar conversion to the router (#2772)	2024-11-25 18:47:34 +01:00
test_idefics	Support different image sizes in prefill in VLMs (#2065)	2024-06-17 10:49:41 +02:00
test_idefics2	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_idefics3	Improve vlm support (add idefics3 support) (#2437)	2025-01-09 10:35:32 -05:00
test_llava_next	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_lora_mistral	feat: simple mistral lora integration tests (#2180)	2024-07-15 09:16:15 -04:00
test_mamba	All integration tests back everywhere (too many failed CI). (#2428)	2024-08-16 21:19:46 +02:00
test_mllama	Update the flaky mllama test. (#3015)	2025-02-12 12:26:52 +01:00
test_mpt	feat(server): Add Non flash MPT. (#514)	2023-07-03 13:01:46 +02:00
test_mt0_base	Fixing mt0 test. (#2692)	2024-10-25 09:46:39 +02:00
test_neox	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_neox_sharded	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_server_gptq_quantized	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00
test_smolvlm	Improve vlm support (add idefics3 support) (#2437)	2025-01-09 10:35:32 -05:00
test_t5_sharded	feat(server): support fp16 for t5 (#360)	2023-05-23 18:16:48 +02:00
test_tools_llama	Fix tool call4 (#3094)	2025-03-12 09:28:47 +01:00
test_transformers_llama4	Add llama4 (#3145)	2025-04-06 10:20:22 +02:00
test_transformers_olmo	Making sure Olmo (transformers backend) works. (#3074)	2025-03-05 17:46:47 +01:00
test.py	Auto max prefill (#2797)	2024-12-06 05:52:00 +01:00