text-generation-inference

mirror of https://github.com/huggingface/text-generation-inference.git synced 2025-04-22 15:32:08 +00:00

Daniël de Kok 84ab88d843 Support flashinfer for Gemma3 prefill (#3167)	2025-04-17 18:07:41 +02:00

* launcher: ensure correct detection of Gemma 3 head size
* Support flashinfer for Gemma3 prefill
  Gemma3 uses bidirectional attention for images. Flashinfer supports custom masks. Hook up the mask with flashinfer, so that we do not have to use the slower SDPA implementation for prefills with images.
* Update Gemma3 test outputs
* Fixed unused import
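
The commit message above mentions a custom mask that lets image tokens attend bidirectionally during prefill. As a rough illustration only, the sketch below builds such a boolean mask in plain Python. It assumes that text tokens attend causally and that tokens within one contiguous image span attend to each other in both directions. The names `build_prefill_mask` and `is_image` are hypothetical and do not come from the repository or from flashinfer; how the mask is actually passed to flashinfer is not shown.

```python
def build_prefill_mask(is_image):
    """Return mask[i][j] == True when query token i may attend to key token j.

    is_image: list of bools, True where the token belongs to an image.
    Assumption: each contiguous run of image tokens is one image.
    """
    n = len(is_image)

    # Give every contiguous run of image tokens its own span id; text gets -1.
    span_ids = []
    current = -1
    prev_image = False
    for flag in is_image:
        if flag and not prev_image:
            current += 1
        span_ids.append(current if flag else -1)
        prev_image = flag

    mask = []
    for i in range(n):
        row = []
        for j in range(n):
            causal = j <= i  # standard causal rule
            same_image = span_ids[i] != -1 and span_ids[i] == span_ids[j]
            row.append(causal or same_image)  # image spans see both directions
        mask.append(row)
    return mask


# Sequence layout: text, image, image, text
mask = build_prefill_mask([False, True, True, False])
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

In this layout the first image token can attend to the second one even though it comes later, while the text tokens remain strictly causal.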
adapters	Bug Fix: Sliding Window Attention (#3112)	2025-03-18 10:37:33 +01:00
layers	Support flashinfer for Gemma3 prefill (#3167)	2025-04-17 18:07:41 +02:00
models	Support flashinfer for Gemma3 prefill (#3167)	2025-04-17 18:07:41 +02:00
pb	chore: add pre-commit (#1569)	2024-02-16 11:58:58 +01:00
utils	transformers flash llm/vlm enabling in ipex (#3152)	2025-04-15 11:08:01 +02:00
__init__.py	feat(clients): Python client (#103)	2023-03-07 18:52:22 +01:00
cache.py	fix(server): decrease memory fragmentation (#557)	2023-07-06 14:28:33 +02:00
cli.py	Fixing TRTLLM dockerfile. (#2922)	2025-01-20 11:13:46 +01:00
interceptor.py	feat: prefill chunking (#2600)	2024-10-16 12:49:33 +02:00
server.py	Tmp tp transformers (#2942)	2025-01-23 18:07:30 +01:00
tracing.py	Add OTLP Service Name Environment Variable (#2076)	2024-06-25 09:33:01 +02:00