text-generation-inference

mirror of https://github.com/huggingface/text-generation-inference.git synced 2025-04-25 12:02:08 +00:00

Daniël de Kok 6652d6e6e0 Support flashinfer for Gemma3 prefill	2025-04-11 18:20:54 +00:00

Gemma3 uses bidirectional attention for images. Flashinfer supports custom masks. Hook up the mask with flashinfer, so that we do not have to use the slower SDPA implementation for prefills with images.
attention	Support flashinfer for Gemma3 prefill	2025-04-11 18:20:54 +00:00
awq	fix incorrect output of Qwen2-7B-Instruct-GPTQ-Int4 and Qwen2-7B-Inst… (#2717)	2024-11-04 16:07:51 +01:00
compressed_tensors	Use kernels from the kernel hub (#2988)	2025-02-10 19:19:25 +01:00
gptq	Small test and typing fixes (#3078)	2025-03-10 15:08:23 +01:00
marlin	Use kernels from the kernel hub (#2988)	2025-02-10 19:19:25 +01:00
moe	some minor fix (#3048)	2025-02-25 12:07:55 +01:00
__init__.py	feat: add ruff and resolve issue (#2262)	2024-07-26 10:29:09 -04:00
bnb.py	feat: add ruff and resolve issue (#2262)	2024-07-26 10:29:09 -04:00
conv.py	Refactor layers. (#1866)	2024-05-13 12:44:30 +02:00
eetq.py	Use eetq kernel from the hub (#3029)	2025-02-18 10:03:53 +01:00
exl2.py	Add support for Deepseek V2 (#2224)	2024-07-19 17:23:20 +02:00
fp8.py	Use kernels from the kernel hub (#2988)	2025-02-10 19:19:25 +01:00
layernorm.py	Update vllm kernels for ROCM (#2826)	2024-12-18 12:44:42 +01:00
linear.py	Update vllm kernels for ROCM (#2826)	2024-12-18 12:44:42 +01:00
lora.py	feat: add ruff and resolve issue (#2262)	2024-07-26 10:29:09 -04:00
medusa.py	Prefix caching (#2402)	2024-08-20 11:15:30 +02:00
mlp.py	Tied embeddings in MLP speculator. (#2473)	2024-08-29 17:44:54 +02:00
rotary.py	Use `rotary` kernel from the Hub (#3041)	2025-02-21 13:55:31 +01:00
speculative.py	feat: add ruff and resolve issue (#2262)	2024-07-26 10:29:09 -04:00
tensor_parallel.py	feat: add ruff and resolve issue (#2262)	2024-07-26 10:29:09 -04:00
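
The latest commit message mentions a custom mask that lets image tokens attend bidirectionally during prefill. The sketch below illustrates what such a mask could look like. It is not code from this repository and does not call flashinfer; the function name `build_prefill_mask` and the `is_image` input are hypothetical, and it assumes that each contiguous run of image tokens forms one image (two images placed back to back would be merged, which a real implementation would need to handle).

```python
from typing import List


def build_prefill_mask(is_image: List[bool]) -> List[List[bool]]:
    """Return mask[q][k] == True when query position q may attend to key k.

    Text tokens use ordinary causal attention. Tokens inside the same
    contiguous image span may additionally attend to each other in both
    directions. (Illustrative assumption, not the repository's logic.)
    """
    # Give every contiguous run of image tokens its own span id; text gets -1.
    span_ids: List[int] = []
    current = -1
    previous_was_image = False
    for flag in is_image:
        if flag and not previous_was_image:
            current += 1
        span_ids.append(current if flag else -1)
        previous_was_image = flag

    n = len(is_image)
    mask: List[List[bool]] = []
    for q in range(n):
        row = []
        for k in range(n):
            causal = k <= q
            same_image = span_ids[q] != -1 and span_ids[q] == span_ids[k]
            row.append(causal or same_image)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # One text token, a two-token image, then one text token.
    for row in build_prefill_mask([False, True, True, False]):
        print("".join("1" if allowed else "0" for allowed in row))
```

A boolean matrix of this kind is the general shape of input that an attention kernel with custom-mask support would consume in place of its built-in causal mask; the exact layout a given kernel expects may differ.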