Mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-04-20 14:22:08 +00:00)
This change adds support for 2:4 sparsity when using Marlin quantization. The 2:4 kernel is used when:

* the quantizer is `marlin`;
* the quantizer checkpoint format is `marlin_24`.

Fixes #2098.

Name | ||
---|---|---|
merges | ||
__init__.py | ||
adapter.py | ||
chunks.py | ||
convert.py | ||
dist.py | ||
hub.py | ||
import_utils.py | ||
log.py | ||
logits_process.py | ||
peft.py | ||
segments.py | ||
sgmv.py | ||
speculate.py | ||
tokens.py | ||
watermark.py | ||
weights.py |
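
The two conditions named in the commit message can be sketched as a small predicate, together with a check of what 2:4 sparsity means (at most two non-zero values in every group of four consecutive weights). This is only an illustrative sketch: `uses_marlin_24_kernel`, `is_2_4_sparse`, and their parameter names are hypothetical and are not taken from the repository's code.

```python
from typing import Sequence


def uses_marlin_24_kernel(quantizer: str, checkpoint_format: str) -> bool:
    """Hypothetical helper mirroring the two conditions in the commit message."""
    return quantizer == "marlin" and checkpoint_format == "marlin_24"


def is_2_4_sparse(row: Sequence[float]) -> bool:
    """Return True if every group of 4 consecutive values has at most 2 non-zeros."""
    if len(row) % 4 != 0:
        # Assumption for this sketch: rows must split evenly into groups of 4.
        return False
    return all(
        sum(1 for value in row[i : i + 4] if value != 0) <= 2
        for i in range(0, len(row), 4)
    )


if __name__ == "__main__":
    print(uses_marlin_24_kernel("marlin", "marlin_24"))
    print(is_2_4_sparse([0.5, 0.0, 0.0, -1.0, 0.0, 0.0, 0.0, 2.0]))
```

Both conditions must hold for the predicate to return true; a `marlin` quantizer with any other checkpoint format would fall back to whatever non-sparse path applies.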