text-generation-inference/server/text_generation_server/models

Latest commit: Daniël de Kok, c29dc89c18: Add support for scalar FP8 weight scales (#2550)
* Add support for scalar FP8 weight scales

* Support LLM compressor FP8 checkpoints on H100

On H100, we use fbgemm-gpu, which requires bfloat16 as the input dtype.
However, FP8 quantization was not picked up for models quantized with
LLM Compressor. This change adds enough checkpoint parsing to detect
whether a model has FP8-quantized weights.

* Remove stray debug print
2024-09-24 13:57:40 +02:00
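The detection described above can be sketched as a small config check. This is a hypothetical, simplified illustration, not the actual TGI implementation: the function name `is_fp8_quantized` and the exact handling are assumptions, based on the `quantization_config` layout that compressed-tensors style (LLM Compressor) checkpoints store in `config.json`.

```python
def is_fp8_quantized(quantization_config: dict) -> bool:
    """Heuristically detect FP8 weight quantization in a
    compressed-tensors style ``quantization_config`` dict.

    Hypothetical sketch: looks for any config group whose weights
    are 8-bit floats, which is how LLM Compressor FP8 checkpoints
    typically describe their weight format.
    """
    if quantization_config.get("quant_method") != "compressed-tensors":
        return False
    for group in quantization_config.get("config_groups", {}).values():
        weights = group.get("weights") or {}
        if weights.get("type") == "float" and weights.get("num_bits") == 8:
            return True
    return False


# Example resembling the quantization_config of an FP8 checkpoint
example = {
    "quant_method": "compressed-tensors",
    "config_groups": {
        "group_0": {
            "weights": {"num_bits": 8, "type": "float", "strategy": "tensor"}
        }
    },
}
print(is_fp8_quantized(example))  # True
```

A "tensor" strategy here corresponds to a single scalar scale per weight tensor, which is the case the commit adds support for.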
Name                  Last commit message                                                            Last commit date
custom_modeling       hotfix: ipex fails since cuda moe kernel is not supported (#2532)              2024-09-20 10:02:55 +02:00
__init__.py           Add support for scalar FP8 weight scales (#2550)                               2024-09-24 13:57:40 +02:00
bloom.py              Refactor dead code - Removing all flash_xxx.py files. (#2166)                  2024-07-05 10:29:56 +02:00
causal_lm.py          Fixing exl2 and other quanize tests again. (#2419)                             2024-08-15 11:12:51 +02:00
flash_causal_lm.py    hotfix : enable intel ipex cpu and xpu in python3.11 (#2517)                   2024-09-12 17:23:49 +02:00
galactica.py          feat: add ruff and resolve issue (#2262)                                       2024-07-26 10:29:09 -04:00
globals.py            Lots of improvements (Still 2 allocators) (#2449)                              2024-08-29 16:29:01 +02:00
idefics_causal_lm.py  Upgrading exl2. (#2415)                                                        2024-08-14 11:58:08 +02:00
idefics.py            Upgrading exl2. (#2415)                                                        2024-08-14 11:58:08 +02:00
mamba.py              Fixing exl2 and other quanize tests again. (#2419)                             2024-08-15 11:12:51 +02:00
model.py              feat: add ruff and resolve issue (#2262)                                       2024-07-26 10:29:09 -04:00
pali_gemma.py         feat: add ruff and resolve issue (#2262)                                       2024-07-26 10:29:09 -04:00
seq2seq_lm.py         Fixing exl2 and other quanize tests again. (#2419)                             2024-08-15 11:12:51 +02:00
types.py              feat: add ruff and resolve issue (#2262)                                       2024-07-26 10:29:09 -04:00
vlm_causal_lm.py      Prefix test - Different kind of load test to trigger prefix test bugs. (#2490) 2024-09-11 18:10:40 +02:00