text-generation-inference/server/text_generation_server/layers/attention
Latest commit 6652d6e6e0 (Daniël de Kok, 2025-04-11 18:20:54 +00:00): Support flashinfer for Gemma3 prefill

Gemma3 uses bidirectional attention for image tokens. Flashinfer supports custom attention masks, so hook the mask up with flashinfer instead of falling back to the slower SDPA implementation for prefills that contain images.
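
For illustration, a minimal sketch (not code from this repository) of the kind of mask the commit describes: text tokens attend causally, while each contiguous block of image tokens attends to itself bidirectionally. The `image_token_id` marker and the helper name are hypothetical placeholders.

```python
import torch

def gemma3_style_prefill_mask(token_ids: torch.Tensor, image_token_id: int) -> torch.Tensor:
    """Boolean prefill mask: causal for text, bidirectional inside image blocks.

    `image_token_id` is a hypothetical placeholder for however image
    positions are marked; True means "query may attend to key".
    """
    n = token_ids.shape[0]
    # Start from the usual causal (lower-triangular) mask.
    mask = torch.tril(torch.ones(n, n, dtype=torch.bool))
    is_image = token_ids == image_token_id
    start = None
    for i in range(n + 1):
        inside = i < n and bool(is_image[i])
        if inside and start is None:
            start = i  # an image block begins
        elif not inside and start is not None:
            mask[start:i, start:i] = True  # the block attends to itself both ways
            start = None
    return mask
```

A mask of this shape is what flashinfer's custom-mask support can consume during prefill, per the commit, rather than forcing the SDPA fallback.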
File                   Last commit                                    Date
__init__.py            Add support for FP8 KV cache scales (#2628)    2024-10-24 16:36:18 +02:00
common.py              feat: prefill chunking (#2600)                 2024-10-16 12:49:33 +02:00
cuda.py                Bug Fix: Sliding Window Attention (#3112)      2025-03-18 10:37:33 +01:00
flash_attn_triton.py   feat: prefill chunking (#2600)                 2024-10-16 12:49:33 +02:00
flashinfer.py          Support flashinfer for Gemma3 prefill          2025-04-11 18:20:54 +00:00
ipex.py                Bug Fix: Sliding Window Attention (#3112)      2025-03-18 10:37:33 +01:00
kv_cache.py            Use kernels from the kernel hub (#2988)        2025-02-10 19:19:25 +01:00
rocm.py                Bug Fix: Sliding Window Attention (#3112)      2025-03-18 10:37:33 +01:00
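
As a point of comparison for the "slower SDPA implementation" the commit mentions, a hedged sketch of the fallback path: the same causal-plus-bidirectional-block mask passed to PyTorch's scaled_dot_product_attention. The shapes and the image-block positions are illustrative, not taken from this repository.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: batch 1, 8 heads, 16 tokens, head_dim 64.
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)

# Causal mask with one bidirectional image block at positions 2..4,
# mirroring the mask sketched after the commit message above.
mask = torch.tril(torch.ones(16, 16, dtype=torch.bool))
mask[2:5, 2:5] = True

# With a boolean attn_mask, True means "may attend". This general-mask
# SDPA call is the slower prefill path that the flashinfer custom-mask
# hookup avoids.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```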