text-generation-inference/server

Latest commit: 32d50c2ea7 "Add support for scalar FP8 weight scales (#2550)" by Daniël de Kok, 2024-10-25 09:01:04 +00:00

* Add support for scalar FP8 weight scales

* Support LLM Compressor FP8 checkpoints on H100

  On H100, we use fbgemm-gpu, which requires bfloat16 as the input dtype.
  Previously, FP8 quantization was not picked up for models quantized with
  LLM Compressor. This change adds enough parsing to detect whether a model
  has FP8-quantized weights.

* Remove stray debug print
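The detection described in the commit message can be illustrated with a small sketch: given a checkpoint's `config.json`, decide whether its weights are FP8-quantized. The field names below follow the LLM Compressor / compressed-tensors convention and are assumptions for illustration; the actual parsing in `text_generation_server` may differ.

```python
import json

def has_fp8_weights(config: dict) -> bool:
    """Return True if the model config declares FP8-quantized weights.

    Looks for an 8-bit float weight scheme in the quantization_config,
    following the LLM Compressor / compressed-tensors layout (an assumption;
    see the real config.json of an FP8 checkpoint for the exact schema).
    """
    quant = config.get("quantization_config")
    if quant is None:
        return False
    for group in quant.get("config_groups", {}).values():
        weights = group.get("weights", {})
        if weights.get("type") == "float" and weights.get("num_bits") == 8:
            return True
    return False

# Example config fragment in the shape LLM Compressor emits (illustrative).
example = json.loads("""{
  "model_type": "llama",
  "quantization_config": {
    "quant_method": "compressed-tensors",
    "config_groups": {
      "group_0": {"weights": {"type": "float", "num_bits": 8}}
    }
  }
}""")
print(has_fp8_weights(example))  # prints True
```

With a check like this, the loader can route FP8 checkpoints to the FP8 code path (fbgemm-gpu on H100) even when the quantization method was not passed explicitly on the command line.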
| Name | Last commit | Date |
| --- | --- | --- |
| custom_kernels | All integration tests back everywhere (too many failed CI). (#2428) | 2024-09-25 06:10:59 +00:00 |
| exllama_kernels | MI300 compatibility (#1764) | 2024-07-17 05:36:58 +00:00 |
| exllamav2_kernels | chore: add pre-commit (#1569) | 2024-04-24 15:32:02 +03:00 |
| tests | Fix tokenization yi (#2507) | 2024-09-25 06:15:35 +00:00 |
| text_generation_server | Add support for scalar FP8 weight scales (#2550) | 2024-10-25 09:01:04 +00:00 |
| .gitignore | Impl simple mamba model (#1480) | 2024-04-23 11:45:11 +03:00 |
| dill-0.3.7-patch.sh | Make Gaudi adapt to the tgi 2.3.0 | 2024-09-26 06:04:55 +00:00 |
| dill-0.3.8-patch.sh | Make Gaudi adapt to the tgi 2.3.0 | 2024-09-26 06:04:55 +00:00 |
| Makefile | Make Gaudi adapt to the tgi 2.3.0 | 2024-09-26 06:04:55 +00:00 |
| Makefile-awq | chore: add pre-commit (#1569) | 2024-04-24 15:32:02 +03:00 |
| Makefile-eetq | Upgrade EETQ (Fixes the cuda graphs). (#1729) | 2024-04-25 17:58:27 +03:00 |
| Makefile-exllamav2 | Upgrading exl2. (#2415) | 2024-09-25 06:07:40 +00:00 |
| Makefile-fbgemm | Add Directory Check to Prevent Redundant Cloning in Build Process (#2486) | 2024-09-25 06:14:07 +00:00 |
| Makefile-flash-att | Hotfixing make install. (#2008) | 2024-09-24 03:29:29 +00:00 |
| Makefile-flash-att-v2 | Softcapping for gemma2. (#2273) | 2024-09-25 05:31:08 +00:00 |
| Makefile-flashinfer | Prefix test - Different kind of load test to trigger prefix test bugs. (#2490) | 2024-09-25 06:14:07 +00:00 |
| Makefile-lorax-punica | Enable multiple LoRa adapters (#2010) | 2024-09-24 03:55:04 +00:00 |
| Makefile-selective-scan | chore: add pre-commit (#1569) | 2024-04-24 15:32:02 +03:00 |
| Makefile-vllm | Add support for Deepseek V2 (#2224) | 2024-09-25 05:27:40 +00:00 |
| poetry.lock | Update to moe-kenels 0.3.1 (#2535) | 2024-09-25 06:19:20 +00:00 |
| pyproject.toml | Make Gaudi adapt to the tgi 2.3.0 | 2024-09-26 06:04:55 +00:00 |
| README.md | chore: add pre-commit (#1569) | 2024-04-24 15:32:02 +03:00 |
| requirements_cuda.txt | hotfix: add syrupy to the right subproject (#2499) | 2024-09-25 06:13:36 +00:00 |
| requirements_intel.txt | hotfix: add syrupy to the right subproject (#2499) | 2024-09-25 06:13:36 +00:00 |
| requirements_rocm.txt | hotfix: add syrupy to the right subproject (#2499) | 2024-09-25 06:13:36 +00:00 |
| requirements.txt | Make Gaudi adapt to the tgi 2.3.0 | 2024-09-26 06:04:55 +00:00 |

# Text Generation Inference Python gRPC Server

A Python gRPC server for Text Generation Inference.

## Install

    make install

## Run

    make run-dev