text-generation-inference/server/text_generation_server/utils
Daniël de Kok f1f28404e7 Add support for GPTQ Marlin (#2052)
Add support for GPTQ Marlin kernels

GPTQ Marlin extends the Marlin kernels to support common GPTQ
configurations:

- bits: 4 or 8
- groupsize: -1, 32, 64, or 128
- desc_act: true/false

Using the GPTQ Marlin kernels requires repacking the parameters into the
Marlin quantizer format.
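
As a rough illustration of the supported configurations listed above, the
sketch below validates a GPTQ config against the Marlin constraints. The
names (GPTQConfig, can_use_gptq_marlin) are hypothetical and do not mirror
the actual checks in weights.py:

```python
from dataclasses import dataclass

# Constraints taken from the list above; purely illustrative.
GPTQ_MARLIN_BITS = {4, 8}
GPTQ_MARLIN_GROUP_SIZES = {-1, 32, 64, 128}


@dataclass
class GPTQConfig:
    bits: int
    groupsize: int
    desc_act: bool


def can_use_gptq_marlin(config: GPTQConfig) -> bool:
    """Return True if a checkpoint with this GPTQ config could be
    repacked for the GPTQ Marlin kernels (per the constraints above)."""
    return (
        config.bits in GPTQ_MARLIN_BITS
        and config.groupsize in GPTQ_MARLIN_GROUP_SIZES
    )


# Example: a 4-bit, groupsize-128, act-order checkpoint is eligible.
assert can_use_gptq_marlin(GPTQConfig(bits=4, groupsize=128, desc_act=True))
```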

The kernels were contributed by Neural Magic to vLLM. We vendor them
here for convenience.
2024-09-24 03:43:30 +00:00
__init__.py Align the source code with main branch 2.0.4 2024-09-24 03:06:55 +00:00
chunks.py server: use chunked inputs 2024-09-24 03:42:29 +00:00
convert.py Force weights_only (before fully breaking pickle files anyway). (#1710) 2024-04-25 15:10:53 +03:00
dist.py Align the source code with main branch 2.0.4 2024-09-24 03:06:55 +00:00
hub.py Fixing the download strategy for ibm-fms (#1917) 2024-07-17 05:36:58 +00:00
import_utils.py Purely refactors paged/attention into layers/attention and makes hardware differences more obvious with one file per hardware. (#1986) 2024-09-24 03:19:39 +00:00
log.py v1.3.4 2024-04-22 09:08:34 +03:00
logits_process.py Align the source code with main branch 2.0.4 2024-09-24 03:06:55 +00:00
peft.py fix: fix local loading for .bin models (#1419) 2024-04-22 09:17:52 +03:00
speculate.py chore: formatting 2024-04-18 16:26:00 +03:00
tokens.py Align the source code with main branch 2.0.4 2024-09-24 03:06:55 +00:00
watermark.py Align the source code with main branch 2.0.4 2024-09-24 03:06:55 +00:00
weights.py Add support for GPTQ Marlin (#2052) 2024-09-24 03:43:30 +00:00