Mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-11-18 23:15:59 +00:00)
Add support for GPTQ Marlin kernels

GPTQ Marlin extends the Marlin kernels to support common GPTQ configurations:

- bits: 4 or 8
- groupsize: -1, 32, 64, or 128
- desc_act: true/false

Using the GPTQ Marlin kernels requires repacking the parameters in the Marlin quantizer format.

The kernels were contributed by Neural Magic to vLLM. We vendor them here for convenience.
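The supported configurations above can be sketched as a small compatibility check. This is a hypothetical helper for illustration only; the function and constant names are assumptions, not the actual text-generation-inference API.

```python
# Illustrative sketch of the GPTQ Marlin compatibility rules described
# above; names are hypothetical, not taken from the TGI codebase.

GPTQ_MARLIN_BITS = {4, 8}
GPTQ_MARLIN_GROUP_SIZES = {-1, 32, 64, 128}


def can_use_gptq_marlin(bits: int, groupsize: int, desc_act: bool) -> bool:
    """Return True when a GPTQ config is in the range the GPTQ Marlin
    kernels support: 4 or 8 bits, groupsize -1/32/64/128; desc_act may
    be either true or false."""
    return bits in GPTQ_MARLIN_BITS and groupsize in GPTQ_MARLIN_GROUP_SIZES


print(can_use_gptq_marlin(4, 128, desc_act=True))   # → True
print(can_use_gptq_marlin(3, 128, desc_act=False))  # → False (3-bit unsupported)
```

A check like this would gate the repacking step: only configurations that pass it can be repacked into the Marlin quantizer format and dispatched to the Marlin kernels.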
| File |
|---|
| __init__.pyi |
| ext.cpp |
| ext.hh |
| gptq_marlin_dtypes.cuh |
| gptq_marlin_repack.cu |
| gptq_marlin.cu |
| gptq_marlin.cuh |
| marlin_cuda_kernel.cu |
| py.typed |