text-generation-inference/server/text_generation_server
Daniël de Kok 32d50c2ea7 Add support for scalar FP8 weight scales (#2550)
* Add support for scalar FP8 weight scales
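A minimal sketch, assuming PyTorch FP8 tensors, of what a scalar (per-tensor) weight scale means in practice. TGI's real code paths fuse the scale into the quantized matmul (fbgemm-gpu on H100); `dequantize_fp8` below is a hypothetical helper, not TGI's API:

```python
import torch

def dequantize_fp8(weight: torch.Tensor, weight_scale: torch.Tensor) -> torch.Tensor:
    # Upcast the FP8 weight (e.g. torch.float8_e4m3fn) to bfloat16,
    # then apply its scale.
    upcast = weight.to(torch.bfloat16)
    scale = weight_scale.to(torch.bfloat16)
    if scale.numel() == 1:
        # Scalar (per-tensor) scale: one scale for the whole tensor,
        # the case this commit adds support for.
        return upcast * scale
    # Per-row (per-output-channel) scale: reshape so it broadcasts
    # across the input dimension.
    return upcast * scale.view(-1, 1)
```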

* Support LLM Compressor FP8 checkpoints on H100

On H100, we use fbgemm-gpu, which requires bfloat16 as the input dtype.
However, we previously wouldn't pick up FP8 quantization for models
quantized with LLM Compressor. This change adds enough config parsing to
detect whether a model has FP8-quantized weights.
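
As a rough illustration of the parsing involved, here is a sketch that inspects a checkpoint's `config.json` for the `quantization_config` block that LLM Compressor emits in the compressed-tensors format. The key names follow that schema as far as I know, and the `is_fp8_compressed_tensors` helper is an assumption, not TGI's actual detection code:

```python
import json

def is_fp8_compressed_tensors(config_path: str) -> bool:
    # Hypothetical check: does this checkpoint declare FP8 weight
    # quantization in the compressed-tensors style?
    with open(config_path) as f:
        config = json.load(f)
    quant = config.get("quantization_config")
    if quant is None or quant.get("quant_method") != "compressed-tensors":
        return False
    # Each config group describes how one set of modules is quantized;
    # FP8 weights appear as 8-bit float weight quantization.
    for group in quant.get("config_groups", {}).values():
        weights = group.get("weights") or {}
        if weights.get("type") == "float" and weights.get("num_bits") == 8:
            return True
    return False
```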

* Remove stray debug print
2024-10-25 09:01:04 +00:00
Name                          Last commit                                                              Date
adapters                      feat: add ruff and resolve issue (#2262)                                 2024-09-25 05:46:24 +00:00
layers                        Add support for scalar FP8 weight scales (#2550)                         2024-10-25 09:01:04 +00:00
models                        Add missing import package                                               2024-10-25 08:52:24 +00:00
pb                            chore: add pre-commit (#1569)                                            2024-04-24 15:32:02 +03:00
utils                         Micro cleanup. (#2555)                                                   2024-10-25 08:53:47 +00:00
__init__.py                   feat(clients): Python client (#103)                                      2023-03-07 18:52:22 +01:00
cache.py                      fix(server): decrease memory fragmentation (#557)                       2023-07-06 14:28:33 +02:00
cli.py                        Pass the max_batch_total_tokens to causal_lm                             2024-10-23 08:28:26 +00:00
habana_quantization_env.py    Remove all references to habana_quantization_toolkit for 1.18 (#229)    2024-10-18 10:59:59 +02:00
interceptor.py                Make Gaudi adapt to the tgi 2.3.0                                        2024-09-26 06:04:55 +00:00
server.py                     Make Gaudi adapt to the tgi 2.3.0                                        2024-09-26 06:04:55 +00:00
tgi_service.py                Make Gaudi adapt to the tgi 2.3.0                                        2024-09-26 06:04:55 +00:00
tracing.py                    Add OTLP Service Name Environment Variable (#2076)                       2024-09-24 03:51:26 +00:00