text-generation-inference/integration-tests/models/__snapshots__
Daniël de Kok 84ab88d843
Support flashinfer for Gemma3 prefill (#3167)
* launcher: ensure correct detection of Gemma 3 head size

* Support flashinfer for Gemma3 prefill

Gemma3 uses bidirectional attention for image tokens. Flashinfer
supports custom masks, so hook the mask up with flashinfer so that we
do not have to fall back to the slower SDPA implementation for prefills
that contain images.

* Update Gemma3 test outputs

* Fix unused import
2025-04-17 18:07:41 +02:00
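The masking scheme described in the commit above can be sketched as follows. This is a hypothetical illustration, not TGI's actual implementation: text tokens attend causally, while tokens belonging to the same image also attend to each other bidirectionally, and the resulting boolean mask is what would be handed to flashinfer's custom-mask path. The `build_prefill_mask` helper and the `image_ids` encoding (-1 for text, image index otherwise) are assumptions for this sketch.

```python
def build_prefill_mask(image_ids):
    """Build a prefill attention mask for mixed text/image sequences.

    image_ids[i] is -1 for a text token, or an image index for image tokens.
    Returns a nested list where mask[q][k] is True if query position q may
    attend to key position k.
    """
    n = len(image_ids)
    # Start from a standard causal (lower-triangular) mask.
    mask = [[k <= q for k in range(n)] for q in range(n)]
    # Let tokens of the same image attend to each other in both directions.
    for q in range(n):
        for k in range(n):
            if image_ids[q] >= 0 and image_ids[q] == image_ids[k]:
                mask[q][k] = True
    return mask


# Example: two text tokens, a three-token image, then one text token.
ids = [-1, -1, 0, 0, 0, -1]
m = build_prefill_mask(ids)
```

With a mask like this, an image token (e.g. position 2) can attend forward to a later token of the same image (position 4), while text tokens remain strictly causal.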
test_bloom_560m All integration tests back everywhere (too many failed CI). (#2428) 2024-08-16 21:19:46 +02:00
test_bloom_560m_sharded fix: adjust test snapshots and small refactors (#2323) 2024-07-29 11:38:38 -04:00
test_chat_llama Lots of improvements (Still 2 allocators) (#2449) 2024-08-29 16:29:01 +02:00
test_completion_prompts Pr 3003 ci branch (#3007) 2025-03-10 17:56:19 +01:00
test_compressed_tensors_w8a8_int Basic flashinfer 0.2 support (#2862) 2025-01-09 16:25:00 +01:00
test_compressed_tensors_w8a8_int_dynamic_weight Improve qwen vl impl (#2943) 2025-02-04 12:44:18 -05:00
test_compressed_tensors_w8an_fp Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_compressed_tensors_wna16_int Basic flashinfer 0.2 support (#2862) 2025-01-09 16:25:00 +01:00
test_compressed_tensors_wna16_int_24 Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_continue_final_message Support continue final message (#2733) 2024-11-27 19:13:30 -05:00
test_flash_awq Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_awq_sharded Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_deepseek_v2 Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_falcon Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_gemma Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_gemma2 Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_gemma3 Support flashinfer for Gemma3 prefill (#3167) 2025-04-17 18:07:41 +02:00
test_flash_gemma_gptq Basic flashinfer 0.2 support (#2862) 2025-01-09 16:25:00 +01:00
test_flash_gpt2 Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_grammar_llama Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_llama Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_llama_exl2 Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_llama_fp8 Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_llama_fp8_kv_cache Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_llama_gptq Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_llama_marlin Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_llama_marlin_24 Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_llama_prefix Fix truffle (#2514) 2024-09-11 22:45:19 +02:00
test_flash_llama_prefix_flashdecoding Adding a test for FD. (#2516) 2024-09-16 17:00:54 +02:00
test_flash_medusa Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_mistral Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_mixtral Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_mixtral_awq Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_mixtral_gptq Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_neox Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_neox_sharded Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_pali_gemma All integration tests back everywhere (too many failed CI). (#2428) 2024-08-16 21:19:46 +02:00
test_flash_pali_gemma2 Enable paligemma2 (#2807) 2024-12-06 14:41:49 -05:00
test_flash_phi Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_phi35_moe Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_qwen2 Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_qwen2_5_vl feat: add initial qwen2.5-vl model and test (#2971) 2025-02-19 12:38:20 +01:00
test_flash_qwen2_vl Improve qwen vl impl (#2943) 2025-02-04 12:44:18 -05:00
test_flash_santacoder Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_starcoder Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_flash_starcoder2 Basic flashinfer 0.2 support (#2862) 2025-01-09 16:25:00 +01:00
test_flash_starcoder2_lora feat: improve star coder to support multi lora layers (#2883) 2025-01-16 16:23:55 -05:00
test_flash_starcoder_gptq Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_grammar_llama Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_grammar_response_format_llama Move JSON grammar -> regex grammar conversion to the router (#2772) 2024-11-25 18:47:34 +01:00
test_idefics Support different image sizes in prefill in VLMs (#2065) 2024-06-17 10:49:41 +02:00
test_idefics2 Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_idefics3 Improve vlm support (add idefics3 support) (#2437) 2025-01-09 10:35:32 -05:00
test_llava_next Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_lora_mistral feat: simple mistral lora integration tests (#2180) 2024-07-15 09:16:15 -04:00
test_mamba All integration tests back everywhere (too many failed CI). (#2428) 2024-08-16 21:19:46 +02:00
test_mllama Update the flaky mllama test. (#3015) 2025-02-12 12:26:52 +01:00
test_mpt feat(server): Add Non flash MPT. (#514) 2023-07-03 13:01:46 +02:00
test_mt0_base Fixing mt0 test. (#2692) 2024-10-25 09:46:39 +02:00
test_neox Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_neox_sharded Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_server_gptq_quantized Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00
test_smolvlm Improve vlm support (add idefics3 support) (#2437) 2025-01-09 10:35:32 -05:00
test_t5_sharded feat(server): support fp16 for t5 (#360) 2023-05-23 18:16:48 +02:00
test_tools_llama Fix tool call4 (#3094) 2025-03-12 09:28:47 +01:00
test_transformers_llama4 Add llama4 (#3145) 2025-04-06 10:20:22 +02:00
test_transformers_olmo Making sure Olmo (transformers backend) works. (#3074) 2025-03-05 17:46:47 +01:00
test.py Auto max prefill (#2797) 2024-12-06 05:52:00 +01:00