Directory listing: text-generation-inference/server/text_generation_server/models/custom_modeling
Latest commit 84ab88d843 by Daniël de Kok: Support flashinfer for Gemma3 prefill (#3167)
* launcher: ensure correct detection of Gemma 3 head size

* Support flashinfer for Gemma3 prefill

Gemma3 uses bidirectional attention for images, and flashinfer
supports custom masks. Hook the mask up with flashinfer so that we do
not have to fall back to the slower SDPA implementation for prefills with images.

* Update Gemma3 test outputs

* Fix unused import
2025-04-17 18:07:41 +02:00
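
For context on what the commit wires up, here is a minimal, hypothetical sketch of the masking rule it describes: text tokens attend causally, while tokens belonging to the same contiguous image block attend to each other bidirectionally. The helper name `build_gemma3_prefill_mask`, the single-sequence scope, and the dense `seq_len x seq_len` tensor are illustration-only assumptions; this is not the code from `flash_gemma3_modeling.py`.

```python
import torch


def build_gemma3_prefill_mask(
    input_ids: torch.Tensor, image_token_id: int
) -> torch.Tensor:
    """Boolean attention mask for one prefill sequence: causal for text,
    bidirectional within each contiguous run of image tokens.

    Hypothetical sketch of the masking rule from the commit message,
    not the TGI implementation.
    """
    seq_len = input_ids.shape[0]
    # Start from a standard causal (lower-triangular) mask.
    mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    # Find contiguous runs of image tokens and let each run attend
    # to itself bidirectionally.
    is_image = input_ids == image_token_id
    start = None
    for i in range(seq_len + 1):
        inside = i < seq_len and bool(is_image[i])
        if inside and start is None:
            start = i
        elif not inside and start is not None:
            # Full attention inside the image block.
            mask[start:i, start:i] = True
            start = None
    return mask


# Example: token id 9 stands in for a hypothetical image token id.
ids = torch.tensor([5, 9, 9, 9, 7])
mask = build_gemma3_prefill_mask(ids, image_token_id=9)
```

flashinfer's batch prefill wrappers can take such a mask in flattened boolean form (a `custom_mask` argument at plan time, per the "Flashinfer supports custom masks" note above); routing the mask through that path is what lets image prefills stay on flashinfer instead of dropping to SDPA.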
Name | Last commit | Date
gemma3 | Bug Fix: Sliding Window Attention (#3112) | 2025-03-18 10:37:33 +01:00
__init__.py | feat(server): flash santacoder (#153) | 2023-04-03 19:06:42 +02:00
bloom_modeling.py | Fixing auto bloom test. (#2699) | 2024-10-28 06:14:11 +01:00
clip.py | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00
flash_cohere_modeling.py | Use rotary kernel from the Hub (#3041) | 2025-02-21 13:55:31 +01:00
flash_dbrx_modeling.py | Use kernels from the kernel hub (#2988) | 2025-02-10 19:19:25 +01:00
flash_deepseek_v2_modeling.py | Update vllm kernels for ROCM (#2826) | 2024-12-18 12:44:42 +01:00
flash_deepseek_v3_modeling.py | Add deepseekv3 (#2968) | 2025-01-30 16:40:25 +01:00
flash_gemma2_modeling.py | Bug Fix: Sliding Window Attention (#3112) | 2025-03-18 10:37:33 +01:00
flash_gemma3_modeling.py | Support flashinfer for Gemma3 prefill (#3167) | 2025-04-17 18:07:41 +02:00
flash_gemma_modeling.py | Add support for FP8 KV cache scales (#2628) | 2024-10-24 16:36:18 +02:00
flash_gpt2_modeling.py | Add support for FP8 KV cache scales (#2628) | 2024-10-24 16:36:18 +02:00
flash_gptj_modeling.py | Use rotary kernel from the Hub (#3041) | 2025-02-21 13:55:31 +01:00
flash_llama_modeling.py | fix the crash of meta-llama/Llama-3.2-1B (#2918) | 2025-01-17 15:50:58 +01:00
flash_mistral_modeling.py | Bug Fix: Sliding Window Attention (#3112) | 2025-03-18 10:37:33 +01:00
flash_mixtral_modeling.py | Bug Fix: Sliding Window Attention (#3112) | 2025-03-18 10:37:33 +01:00
flash_neox_modeling.py | Add support for FP8 KV cache scales (#2628) | 2024-10-24 16:36:18 +02:00
flash_pali_gemma_modeling.py | Bug Fix: Sliding Window Attention (#3112) | 2025-03-18 10:37:33 +01:00
flash_phi_modeling.py | Add support for FP8 KV cache scales (#2628) | 2024-10-24 16:36:18 +02:00
flash_phi_moe_modeling.py | feat: support phi3.5 moe (#2479) | 2024-09-30 11:15:09 +02:00
flash_qwen2_modeling.py | Bug Fix: Sliding Window Attention (#3112) | 2025-03-18 10:37:33 +01:00
flash_rw_modeling.py | Using both value from config as they might not be correct. (#2817) | 2024-12-10 19:37:09 +01:00
flash_santacoder_modeling.py | Add support for FP8 KV cache scales (#2628) | 2024-10-24 16:36:18 +02:00
flash_starcoder2_modeling.py | Bug Fix: Sliding Window Attention (#3112) | 2025-03-18 10:37:33 +01:00
idefics2.py | Support qwen2 vl (#2689) | 2024-10-30 12:40:51 -04:00
idefics3.py | Improve vlm support (add idefics3 support) (#2437) | 2025-01-09 10:35:32 -05:00
idefics_config.py | chore: add pre-commit (#1569) | 2024-02-16 11:58:58 +01:00
idefics_image_processing.py | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00
idefics_modeling.py | Update vllm kernels for ROCM (#2826) | 2024-12-18 12:44:42 +01:00
idefics_perceiver.py | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00
idefics_processing.py | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00
idefics_vision.py | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00
llava_next.py | Support qwen2 vl (#2689) | 2024-10-30 12:40:51 -04:00
mamba_modeling.py | Fix: Change embeddings to embedding (#2738) | 2024-11-15 13:16:15 +01:00
mllama.py | feat: prefill chunking (#2600) | 2024-10-16 12:49:33 +02:00
mpt_modeling.py | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00
neox_modeling.py | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00
opt_modeling.py | Fixup opt to reduce the amount of odd if statements. (#2833) | 2024-12-12 18:20:13 +01:00
phi_modeling.py | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00
qwen2_5_vl.py | Bug Fix: Sliding Window Attention (#3112) | 2025-03-18 10:37:33 +01:00
qwen2_vl.py | Fix tool call3 (#3086) | 2025-03-12 09:22:53 +01:00
siglip.py | Fix: don't apply post layernorm in SiglipVisionTransformer (#2459) | 2024-08-26 17:04:46 -04:00
t5_modeling.py | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00
vlm.py | Add gemma3 model (#3099) | 2025-03-12 09:25:51 +01:00