Mirror of https://github.com/huggingface/text-generation-inference.git, synced 2025-09-10 20:04:52 +00:00
Desperate attempt to fix latex
This commit is contained in:
parent
b581eb7151
commit
2363e9a482
@@ -8,7 +8,7 @@ GPTQ is a post-training quantization method to make the model smaller. It quanti
Given a layer \(l\) with weight matrix \(W_{l}\) and layer input \(X_{l}\), find the quantized weight \(\hat{W}_{l}\):
- \({\hat{W}{l}}^{*} = argmin{\hat{W_{l}}} |W_{l}X-\hat{W}{l}X|^{2}{2}\)
+ $$\hat{W}_{l}^{*} = \underset{\hat{W}_{l}}{\operatorname{argmin}} \; \lVert W_{l}X - \hat{W}_{l}X \rVert_{2}^{2}$$
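Reading the objective: the squared norm can be taken element-wise over the output matrix, the Frobenius reading below. This is an interpretation consistent with the GPTQ paper, not text from this diff:

$$\lVert W_{l}X - \hat{W}_{l}X \rVert_{2}^{2} \;=\; \sum_{i,j} \big( W_{l}X - \hat{W}_{l}X \big)_{ij}^{2}$$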
TGI lets you both run an already GPTQ-quantized model (see the available models [here](https://huggingface.co/models?search=gptq)) and quantize a model of your choice with the quantization script, simply by passing `--quantize` as shown below 👇
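A minimal sketch of the full flow: quantize once with `text-generation-server quantize`, then serve the result with `text-generation-launcher --quantize gptq`. The `tiiuae/falcon-40b` source model and `--num-shard 2` are assumptions for illustration (the hunk below truncates after `--num-`); check `text-generation-server quantize --help` for the exact arguments.

```shell
# Quantize a model of your choice into a local directory
# (the source model id is an assumption for this sketch)
text-generation-server quantize tiiuae/falcon-40b /data/falcon-40b-gptq

# Serve the quantized weights; --num-shard 2 is assumed, since the
# hunk context below truncates after `--num-`
text-generation-launcher --model-id /data/falcon-40b-gptq/ --sharded true --num-shard 2 --quantize gptq
```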
@@ -34,4 +34,4 @@ text-generation-launcher --model-id /data/falcon-40b-gptq/ --sharded true --num-
You can learn more about the quantization options by running `text-generation-server quantize --help`.
If you wish to do more with GPTQ models (e.g., train an adapter on top), you can read about the Transformers GPTQ integration [here](https://huggingface.co/blog/gptq-integration).
You can learn more about GPTQ from the [paper](https://arxiv.org/pdf/2210.17323.pdf).