text-generation-inference/server/text_generation_server
Daniël de Kok 6652d6e6e0 Support flashinfer for Gemma3 prefill
Gemma3 uses bidirectional attention for images. Flashinfer
supports custom masks. Hook up the mask with flashinfer, so that we do
not have to use the slower SDPA implementation for prefills with images.
2025-04-11 18:20:54 +00:00
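The masking idea in the commit message above can be sketched in plain Python. This is an illustration only, not the code from this commit. The function name `build_prefill_mask` and the `is_image` flag list are hypothetical. The sketch also assumes that bidirectional attention applies within each contiguous run of image tokens, which the commit message does not spell out. How the resulting mask is handed to flashinfer is not shown here.

```python
from typing import List


def build_prefill_mask(is_image: List[bool]) -> List[List[bool]]:
    """Return mask[q][k], True when query position q may attend to key position k.

    Hypothetical helper: text tokens attend causally, while tokens inside the
    same contiguous image span may also attend to later tokens of that span
    (an assumption about how "bidirectional attention for images" is scoped).
    """
    n = len(is_image)

    # Give each contiguous run of image tokens its own span id; text tokens get -1.
    span_id = [-1] * n
    current = -1
    for i, img in enumerate(is_image):
        if img:
            if i == 0 or not is_image[i - 1]:
                current += 1
            span_id[i] = current

    # Start from an ordinary causal (lower-triangular) mask.
    mask = [[k <= q for k in range(n)] for q in range(n)]

    # Open up the upper triangle inside each image span.
    for q in range(n):
        for k in range(q + 1, n):
            if span_id[q] != -1 and span_id[q] == span_id[k]:
                mask[q][k] = True
    return mask


if __name__ == "__main__":
    # One text token, a two-token image, then one more text token.
    for row in build_prefill_mask([False, True, True, False]):
        print(["x" if allowed else "." for allowed in row])
```

A dense boolean mask like this costs memory quadratic in the prompt length, so it is only a way to see the attention pattern, not a recommendation for how to store it.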
adapters Bug Fix: Sliding Window Attention (#3112) 2025-03-18 10:37:33 +01:00
layers Support flashinfer for Gemma3 prefill 2025-04-11 18:20:54 +00:00
models Support flashinfer for Gemma3 prefill 2025-04-11 18:20:54 +00:00
pb chore: add pre-commit (#1569) 2024-02-16 11:58:58 +01:00
utils xpu 2.6 update (#3051) 2025-03-17 13:48:48 +01:00
__init__.py feat(clients): Python client (#103) 2023-03-07 18:52:22 +01:00
cache.py fix(server): decrease memory fragmentation (#557) 2023-07-06 14:28:33 +02:00
cli.py Fixing TRTLLM dockerfile. (#2922) 2025-01-20 11:13:46 +01:00
interceptor.py feat: prefill chunking (#2600) 2024-10-16 12:49:33 +02:00
server.py Tmp tp transformers (#2942) 2025-01-23 18:07:30 +01:00
tracing.py Add OTLP Service Name Environment Variable (#2076) 2024-06-25 09:33:01 +02:00