Mirror of https://github.com/huggingface/text-generation-inference.git
Packing of asymmetric quantization is broken: all (q)zeros values of `0` get reset to `1`, resulting in a loss of accuracy. Use symmetric quantization instead. To distinguish models with symmetric and asymmetric quantization, a new config tensor `gptq_sym` is added. If this tensor is not present, we assume `sym=False`.
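As a minimal sketch of what this detection could look like, the snippet below reads the `gptq_sym` marker tensor from a safetensors checkpoint and falls back to `sym=False` when it is absent. This is illustrative only, not the repository's actual loader code: the function name `is_gptq_symmetric` and the assumption that the marker lives in the same safetensors file as the quantized weights are hypothetical.

```python
from safetensors import safe_open


def is_gptq_symmetric(checkpoint_path: str) -> bool:
    """Return True when the checkpoint was quantized with sym=True.

    Hypothetical helper: assumes the `gptq_sym` marker tensor is stored
    alongside the quantized weights in a single safetensors file.
    """
    with safe_open(checkpoint_path, framework="pt", device="cpu") as f:
        if "gptq_sym" in f.keys():
            # The marker tensor holds a single boolean-like value.
            return bool(f.get_tensor("gptq_sym").item())
    # No marker tensor: assume an older, asymmetric (sym=False) checkpoint.
    return False
```

Defaulting to `sym=False` when the tensor is missing keeps previously quantized asymmetric checkpoints loading exactly as before; only newly quantized symmetric models carry the marker.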
attention/
awq/
gptq/
__init__.py
bnb.py
conv.py
eetq.py
exl2.py
fp8.py
layernorm.py
linear.py
lora.py
marlin.py
medusa.py
mlp.py
rotary.py
speculative.py
tensor_parallel.py