Mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-11-18 15:05:58 +00:00)
Add support for GPTQ Marlin kernels

GPTQ Marlin extends the Marlin kernels to support common GPTQ configurations:

- bits: 4 or 8
- groupsize: -1, 32, 64, or 128
- desc_act: true/false

Using the GPTQ Marlin kernels requires repacking the parameters into the Marlin quantizer format.

The kernels were contributed by Neural Magic to vLLM. We vendor them here for convenience.
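The supported configurations listed above amount to a simple compatibility check. The sketch below is a minimal illustration, assuming a hypothetical `GPTQConfig` container and a hypothetical `is_gptq_marlin_supported` helper; neither is TGI's actual API. Because both `desc_act` values are supported, only `bits` and `groupsize` are tested.

```python
from dataclasses import dataclass

# Bit widths listed as supported by GPTQ Marlin (hypothetical constant name).
SUPPORTED_BITS = frozenset({4, 8})
# Group sizes listed as supported. In GPTQ configs, -1 conventionally means
# "no grouping": a single group spans the whole input dimension.
SUPPORTED_GROUPSIZES = frozenset({-1, 32, 64, 128})


@dataclass(frozen=True)
class GPTQConfig:
    """Hypothetical container for the GPTQ settings relevant to Marlin."""
    bits: int
    groupsize: int
    desc_act: bool


def is_gptq_marlin_supported(cfg: GPTQConfig) -> bool:
    """Return True if cfg is one of the configurations GPTQ Marlin handles.

    desc_act may be either True or False, so it is not checked here.
    """
    return cfg.bits in SUPPORTED_BITS and cfg.groupsize in SUPPORTED_GROUPSIZES


if __name__ == "__main__":
    print(is_gptq_marlin_supported(GPTQConfig(bits=4, groupsize=128, desc_act=True)))
    print(is_gptq_marlin_supported(GPTQConfig(bits=3, groupsize=128, desc_act=False)))
    print(is_gptq_marlin_supported(GPTQConfig(bits=8, groupsize=16, desc_act=False)))
```

A loader could run a check like this before attempting to repack weights into the Marlin format and fall back to another GPTQ kernel otherwise; that fallback behavior is an assumption for illustration, not something the commit describes.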

| Name |
|---|
| attention |
| awq |
| gptq |
| __init__.py |
| bnb.py |
| conv.py |
| eetq.py |
| exl2.py |
| fp8.py |
| layernorm.py |
| linear.py |
| marlin.py |
| medusa.py |
| mlp.py |
| rotary.py |
| speculative.py |
| tensor_parallel.py |