Mirror of https://github.com/huggingface/text-generation-inference.git, synced 2025-11-18 23:15:59 +00:00.

This change adds support for 2:4 sparsity when using Marlin quantization. The 2:4 kernel is used when:

* the quantizer is `marlin`;
* the quantizer checkpoint format is `marlin_24`.

Fixes #2098.

| Name |
|---|
| merges |
| __init__.py |
| adapter.py |
| chunks.py |
| convert.py |
| dist.py |
| hub.py |
| import_utils.py |
| log.py |
| logits_process.py |
| peft.py |
| segments.py |
| sgmv.py |
| speculate.py |
| tokens.py |
| watermark.py |
| weights.py |
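
The commit message above names two conditions for selecting the 2:4 kernel. The following is a minimal sketch of that selection logic, not the repository's actual code: the function names `uses_marlin_24_kernel` and `is_2_4_sparse` and their parameters are hypothetical, introduced only for illustration. The second helper assumes the usual meaning of 2:4 structured sparsity, namely that every aligned group of four values contains at most two non-zero entries.

```python
def uses_marlin_24_kernel(quantize: str, checkpoint_format: str) -> bool:
    """Hypothetical helper mirroring the two conditions in the commit message."""
    # Both conditions must hold: the quantizer is `marlin` and the
    # checkpoint format is `marlin_24`.
    return quantize == "marlin" and checkpoint_format == "marlin_24"


def is_2_4_sparse(row: list[float]) -> bool:
    """Check that each aligned group of 4 values has at most 2 non-zeros.

    Assumption: this is the standard 2:4 structured-sparsity pattern.
    Rows whose length is not a multiple of 4 are rejected.
    """
    if len(row) % 4 != 0:
        return False
    return all(
        sum(1 for value in row[i:i + 4] if value != 0) <= 2
        for i in range(0, len(row), 4)
    )
```

For example, `uses_marlin_24_kernel("marlin", "marlin_24")` is true, while any other quantizer or checkpoint format yields false, so those configurations would fall back to whatever kernel the rest of the code selects.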