mirror of
https://github.com/huggingface/text-generation-inference.git
synced 2025-04-21 23:12:07 +00:00
This change adds support for 2:4 sparsity when using Marlin quantization. The 2:4 kernel is used when:

* the quantizer is `marlin`;
* the quantizer checkpoint format is `marlin_24`.

Fixes #2098.
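The selection rule above is a simple conjunction: both the quantization method and the checkpoint format must match before the sparse kernel is dispatched (2:4 sparsity means at most two nonzero values in every group of four weights). A minimal sketch of that check, using hypothetical names (`MarlinConfig`, `use_sparse_24_kernel`) rather than the actual loader code in text-generation-inference:

```python
from dataclasses import dataclass


@dataclass
class MarlinConfig:
    # Hypothetical config holder; field names are illustrative only.
    quantize: str            # quantization method, e.g. "marlin"
    checkpoint_format: str   # e.g. "marlin" or "marlin_24"


def use_sparse_24_kernel(cfg: MarlinConfig) -> bool:
    # The 2:4 sparse kernel is chosen only when the quantizer is
    # `marlin` AND the checkpoint was exported in `marlin_24` format;
    # any other combination falls back to the dense Marlin path.
    return cfg.quantize == "marlin" and cfg.checkpoint_format == "marlin_24"


print(use_sparse_24_kernel(MarlinConfig("marlin", "marlin_24")))  # True
print(use_sparse_24_kernel(MarlinConfig("marlin", "marlin")))     # False
```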
* sparse/
* __init__.pyi
* ext.cpp
* ext.hh
* gptq_marlin_dtypes.cuh
* gptq_marlin_repack.cu
* gptq_marlin.cu
* gptq_marlin.cuh
* marlin_cuda_kernel.cu
* py.typed