text-generation-inference/server/text_generation_server/layers
Daniël de Kok c29dc89c18
Add support for scalar FP8 weight scales (#2550)
* Add support for scalar FP8 weight scales

* Support LLM compressor FP8 checkpoints on H100

On H100, we use fbgemm-gpu, which requires bfloat16 as the input dtype.
However, FP8 quantization was not being picked up for models quantized
with LLM Compressor. This change adds enough parsing to detect whether a
model has FP8-quantized weights.

* Remove stray debug print
2024-09-24 13:57:40 +02:00
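The two changes in the commit above can be sketched roughly as follows. This is a minimal illustration, not TGI's actual loader code: the config keys are assumptions modelled on the compressed-tensors / LLM Compressor checkpoint format, and both helper names (`has_fp8_weights`, `normalize_scale`) are hypothetical.

```python
def has_fp8_weights(quantization_config: dict) -> bool:
    """Best-effort check for FP8-quantized weights in a checkpoint config.

    Sketch only: the nesting below (config_groups -> weights -> type/num_bits)
    is an assumption based on the compressed-tensors config layout.
    """
    for group in quantization_config.get("config_groups", {}).values():
        weights = group.get("weights") or {}
        # FP8 shows up as an 8-bit float weight type in these configs.
        if weights.get("type") == "float" and weights.get("num_bits") == 8:
            return True
    return False


def normalize_scale(scale, out_features: int):
    """Broadcast a scalar (per-tensor) weight scale to per-channel form.

    Hypothetical helper: expanding a scalar scale lets the rest of the
    loading path treat per-tensor and per-channel scales uniformly.
    Shown with plain lists; real code would operate on tensors.
    """
    if isinstance(scale, (int, float)):
        return [float(scale)] * out_features
    return scale
```

For example, a per-tensor FP8 config such as `{"config_groups": {"group_0": {"weights": {"type": "float", "num_bits": 8, "strategy": "tensor"}}}}` would be detected, and its single scale value would be expanded to one entry per output channel.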
Name | Last commit | Date
attention | hotfix : enable intel ipex cpu and xpu in python3.11 (#2517) | 2024-09-12 17:23:49 +02:00
awq | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00
gptq | Upgrading exl2. (#2415) | 2024-08-14 11:58:08 +02:00
marlin | Handle GPTQ-Marlin loading in GPTQMarlinWeightLoader (#2300) | 2024-07-31 13:08:41 +02:00
moe | Move to moe-kernels package and switch to common MoE layer (#2511) | 2024-09-17 18:08:58 +02:00
__init__.py | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00
bnb.py | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00
conv.py | Refactor layers. (#1866) | 2024-05-13 12:44:30 +02:00
eetq.py | feat(fp8): use fbgemm kernels and load fp8 weights directly (#2248) | 2024-07-20 19:02:04 +02:00
exl2.py | Add support for Deepseek V2 (#2224) | 2024-07-19 17:23:20 +02:00
fp8.py | Add support for scalar FP8 weight scales (#2550) | 2024-09-24 13:57:40 +02:00
layernorm.py | Removing IPEX_AVAIL. (#2115) | 2024-06-25 13:20:57 +02:00
linear.py | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00
lora.py | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00
medusa.py | Prefix caching (#2402) | 2024-08-20 11:15:30 +02:00
mlp.py | Tied embeddings in MLP speculator. (#2473) | 2024-08-29 17:44:54 +02:00
rotary.py | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00
speculative.py | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00
tensor_parallel.py | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00