# Text Generation Inference Python gRPC Server

A Python gRPC server for Text Generation Inference.

## Install

```shell
make install
```

## Run

```shell
make run-dev
```
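
For local debugging, you can also call the `text-generation-server` CLI that `make install` puts on your PATH. The model id and flags below are illustrative placeholders, not the exact command that `make run-dev` runs:

```shell
# Fetch the model weights ahead of time (model id is a placeholder)
text-generation-server download-weights bigscience/bloom-560m

# Start the gRPC server for that model; extra flags such as --sharded
# depend on your hardware and setup
text-generation-server serve bigscience/bloom-560m
```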