text-generation-inference/server/text_generation_server/models
Daniël de Kok 84ab88d843
Support flashinfer for Gemma3 prefill (#3167)
* launcher: ensure correct detection of Gemma 3 head size

* Support flashinfer for Gemma3 prefill

Gemma3 uses bidirectional attention for images. Flashinfer
supports custom masks. Hook up the mask with flashinfer, so that we do
not have to use the slower SDPA implementation for prefills with images.

* Update Gemma3 test outputs

* Fixed unused import
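
A minimal sketch of the masking idea behind the Gemma3 prefill change above: attention is causal for text tokens, but tokens that belong to the same image attend to each other in both directions. This is not the code from the commit. The function name `build_prefill_mask` and the `image_ids` encoding are hypothetical, chosen only for illustration, and the sketch stops at building the boolean mask that would then be passed to flashinfer's custom mask support.

```python
from typing import List, Optional


def build_prefill_mask(image_ids: List[Optional[int]]) -> List[List[bool]]:
    """Return mask[q][k] == True where query position q may attend to key k.

    image_ids[i] is None for a text token, or an integer identifying the
    image that token i belongs to (a hypothetical encoding for this sketch).
    """
    n = len(image_ids)
    # Causal baseline: every position sees itself and all earlier positions.
    mask = [[k <= q for k in range(n)] for q in range(n)]
    for q in range(n):
        if image_ids[q] is None:
            continue  # text tokens stay strictly causal
        for k in range(q + 1, n):
            # Tokens of the same image may also see later tokens of that image.
            if image_ids[k] == image_ids[q]:
                mask[q][k] = True
    return mask


# Two text tokens, a three-token image, then one trailing text token.
example_mask = build_prefill_mask([None, None, 0, 0, 0, None])
```

In `example_mask`, the first image token (position 2) can attend to the last image token (position 4), while the text token at position 1 still cannot see any image token that follows it.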
2025-04-17 18:07:41 +02:00
custom_modeling Support flashinfer for Gemma3 prefill (#3167) 2025-04-17 18:07:41 +02:00
__init__.py transformers flash llm/vlm enabling in ipex (#3152) 2025-04-15 11:08:01 +02:00
bloom.py Refactor dead code - Removing all flash_xxx.py files. (#2166) 2024-07-05 10:29:56 +02:00
causal_lm.py Sync (most) server dependencies with Nix (#2782) 2024-12-03 04:04:06 +01:00
flash_causal_lm.py Support flashinfer for Gemma3 prefill (#3167) 2025-04-17 18:07:41 +02:00
galactica.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
globals.py Fixing the oom maybe with 2.5.1 change. (#2958) 2025-01-28 10:30:28 +01:00
idefics_causal_lm.py feat: prefill chunking (#2600) 2024-10-16 12:49:33 +02:00
mamba.py Choosing input/total tokens automatically based on available VRAM? (#2673) 2024-10-28 04:59:49 +01:00
metadata_kernels.py feat: add payload limit (#2726) 2024-11-21 18:20:15 +00:00
mllama_causal_lm.py Update transformers to 4.51 (#3148) 2025-04-07 12:55:43 +02:00
model.py Bug Fix: Sliding Window Attention (#3112) 2025-03-18 10:37:33 +01:00
pali_gemma.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
seq2seq_lm.py feat: prefill chunking (#2600) 2024-10-16 12:49:33 +02:00
transformers_flash_causal_lm.py transformers flash llm/vlm enabling in ipex (#3152) 2025-04-15 11:08:01 +02:00
transformers_flash_vlm.py transformers flash llm/vlm enabling in ipex (#3152) 2025-04-15 11:08:01 +02:00
types.py feat: prefill chunking (#2600) 2024-10-16 12:49:33 +02:00
vlm_causal_lm.py Support flashinfer for Gemma3 prefill (#3167) 2025-04-17 18:07:41 +02:00