mirror of
https://github.com/huggingface/text-generation-inference.git
synced 2025-04-23 16:02:10 +00:00
- The code is relatively easy (just disable the checks on Embedding and Head). This cannot be done in the same easy fashion for hidden_dim/head_dim. It's relatively easy on some models (classic MHA), but it would make the other models (MQA) much more complex, and GPTQ quantization another quite hairy piece of code. |
||
---|---|---|
gptq | ||
__init__.py | ||
convert.py | ||
dist.py | ||
hub.py | ||
layers.py | ||
logits_process.py | ||
tokens.py | ||
watermark.py | ||
weights.py |
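
The commit note above says that sharding Embedding and Head only requires dropping the divisibility checks. A minimal sketch of why that works for the vocabulary dimension: when the size is not divisible by the number of shards, the last shard can simply be shorter. The helper name `shard_bounds`, the ceil-division block size, and the example vocabulary size are assumptions made for illustration; this is not code from this repository.

```python
from typing import List, Tuple


def shard_bounds(size: int, rank: int, world_size: int) -> Tuple[int, int]:
    """Return the [start, stop) slice of a dimension owned by `rank`.

    Hypothetical helper: it does not require `size % world_size == 0`.
    """
    if world_size <= 0 or not 0 <= rank < world_size:
        raise ValueError("rank must satisfy 0 <= rank < world_size")
    # Ceil division: each rank owns at most `block` rows and no row is dropped.
    block = (size + world_size - 1) // world_size
    start = min(rank * block, size)
    stop = min(start + block, size)
    return start, stop


# Example: a vocabulary of 50257 rows split over 4 ranks (illustrative numbers).
vocab_size = 50257
bounds: List[Tuple[int, int]] = [shard_bounds(vocab_size, r, 4) for r in range(4)]
for rank, (start, stop) in enumerate(bounds):
    print(f"rank {rank}: rows [{start}, {stop}) -> {stop - start} rows")
```

The same trick is harder for hidden_dim/head_dim, as the note says, because an uneven split there would cut through attention heads rather than through independent vocabulary rows.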