text-generation-inference/server/text_generation_server/layers/gptq
Latest commit: 53ec0b790b by OlivierDehaene, 2024-07-20 19:02:04 +02:00
feat(fp8): use fbgemm kernels and load fp8 weights directly (#2248)

* feat(fp8): add support for fbgemm
* allow loading fp8 weights directly
* update outlines
* fix makefile
* build fbgemm
* avoid circular import and fix dockerfile
* add default dtype
* refactored weights loader
* fix auto conversion
* fix quantization config parsing
* force new nccl on install
* missing get_weights implementation
* increase timeout
File                Last commit                                                           Date
__init__.py         Add support for Deepseek V2 (#2224)                                   2024-07-19 17:23:20 +02:00
custom_autotune.py  Refactor layers. (#1866)                                              2024-05-13 12:44:30 +02:00
exllama.py          Fix GPTQWeight import (#2020)                                         2024-06-05 14:49:15 +02:00
exllamav2.py        feat(fp8): use fbgemm kernels and load fp8 weights directly (#2248)   2024-07-20 19:02:04 +02:00
quant_linear.py     Refactor layers. (#1866)                                              2024-05-13 12:44:30 +02:00
quantize.py         Use symmetric quantization in the quantize subcommand (#2120)         2024-07-12 12:20:12 +02:00