Mirror of https://github.com/huggingface/text-generation-inference.git, synced 2025-10-09 06:55:24 +00:00
This change adds support for 2:4 sparsity when using Marlin quantization. The 2:4 kernel is used when:

* the quantizer is `marlin`;
* the quantizer checkpoint format is `marlin_24`.

Fixes #2098.
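The selection rule above can be sketched as a small predicate. This is a minimal illustration, not TGI's actual dispatch code: the function name `uses_marlin_24_kernel` and its parameters are hypothetical, standing in for however the loader inspects the quantizer and checkpoint format.

```python
from typing import Optional


def uses_marlin_24_kernel(quantize: str, checkpoint_format: Optional[str]) -> bool:
    # Hypothetical helper: per the description above, the 2:4 sparse Marlin
    # kernel is selected only when BOTH conditions hold -- the quantizer is
    # "marlin" AND the checkpoint was packed in the "marlin_24" format.
    return quantize == "marlin" and checkpoint_format == "marlin_24"


# A dense Marlin checkpoint keeps using the regular (dense) Marlin kernel:
print(uses_marlin_24_kernel("marlin", "marlin"))     # dense path
print(uses_marlin_24_kernel("marlin", "marlin_24"))  # 2:4 sparse path
```

Keeping the format check separate from the quantizer check means a `marlin_24` checkpoint loaded under a different quantizer falls through to that quantizer's own handling rather than silently using the sparse kernel.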