mirror of https://github.com/huggingface/text-generation-inference.git
synced 2025-09-10 20:04:52 +00:00
Clarified flag
parent 7f48a61bce
commit 5a5b4ef954
@@ -11,7 +11,7 @@ Given a layer \\(l\\) with weight matrix \\(W_{l}\\) and layer input \\(X_{l}\\)
 $$({\hat{W}_{l}}^{*} = argmin_{\hat{W_{l}}} ||W_{l}X-\hat{W}_{l}X||^{2}_{2})$$
 
 
-TGI allows you to both run an already GPTQ quantized model (see available models [here](https://huggingface.co/models?search=gptq)) or quantize a model of your choice using quantization script by simply passing --quantize like below 👇
+TGI allows you to both run an already GPTQ quantized model (see available models [here](https://huggingface.co/models?search=gptq)) or quantize a model of your choice using quantization script. You can run a quantized model by simply passing --quantize like below 👇
 
 ```bash
 docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model --quantize gptq
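The per-layer objective in the first hunk can be unpacked with a short derivation. Reading the squared norm of a matrix as the squared Frobenius norm, and writing \\(X_{l}\\) for the layer input that the context line abbreviates as \\(X\\), the reconstruction error splits into per-row terms that all share one matrix:

$$\|W_{l}X_{l}-\hat{W}_{l}X_{l}\|^{2}_{F} = \mathrm{tr}\left((W_{l}-\hat{W}_{l})\,X_{l}X_{l}^{\top}\,(W_{l}-\hat{W}_{l})^{\top}\right) = \sum_{i}(w_{i}-\hat{w}_{i})\,X_{l}X_{l}^{\top}\,(w_{i}-\hat{w}_{i})^{\top}$$

Here \\(w_{i}\\) and \\(\hat{w}_{i}\\) denote the \\(i\\)-th rows of \\(W_{l}\\) and \\(\hat{W}_{l}\\). Each row term is a quadratic in \\(\hat{w}_{i}\\) with Hessian \\(H = 2X_{l}X_{l}^{\top}\\), so the calibration dataset enters the objective only through \\(X_{l}X_{l}^{\top}\\). This is the Hessian the GPTQ paper builds on; how TGI accumulates it is not shown in this diff.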
@@ -19,7 +19,7 @@ docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingf
 
 Note that TGI's GPTQ implementation is different than [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ).
 
-To run quantization only with a calibration dataset, simply run
+To quantize a given model using GPTQ with a calibration dataset, simply run
 
 ```bash
 text-generation-server quantize tiiuae/falcon-40b /data/falcon-40b-gptq
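The two commands in this diff can be chained: quantize once, then serve the output directory with `--quantize gptq`. Below is a minimal sketch, not taken from the commit, assuming the TGI image has the `text-generation-server` CLI on its PATH (hence `--entrypoint`) and that `--model-id` accepts a local path under the mounted `/data` volume. The model name and output path come from the diff; the `run` helper, the `DRY_RUN` variable, and the `quantize_model`/`serve_model` function names are hypothetical, added only so the commands can be previewed.

```bash
#!/usr/bin/env bash
# Sketch: GPTQ-quantize a model, then serve the quantized weights with TGI.
# Assumptions (not from the commit): `text-generation-server` is on the image's
# PATH, and --model-id accepts a local path inside the container.

# Example values; adjust to your setup.
model=tiiuae/falcon-40b
volume="$PWD/data"                  # host directory mounted at /data in the container
quantized=/data/falcon-40b-gptq     # output path as seen from inside the container
image=ghcr.io/huggingface/text-generation-inference:latest

# Hypothetical helper: print the shell-quoted command when DRY_RUN is set,
# otherwise execute it.
run() {
  if [[ -n "${DRY_RUN:-}" ]]; then
    printf '%q ' "$@"
    printf '\n'
  else
    "$@"
  fi
}

# Step 1: quantize with a calibration dataset, writing to $quantized.
quantize_model() {
  run docker run --gpus all --shm-size 1g -v "$volume:/data" \
    --entrypoint text-generation-server "$image" \
    quantize "$model" "$quantized"
}

# Step 2: serve the quantized checkpoint from the mounted volume.
serve_model() {
  run docker run --gpus all --shm-size 1g -p 8080:80 -v "$volume:/data" \
    "$image" --model-id "$quantized" --quantize gptq
}
```

Source the script, preview the commands with `DRY_RUN=1 quantize_model` and `DRY_RUN=1 serve_model`, then run `quantize_model && serve_model` on a GPU host. Both steps mount the same host directory, so the weights written in step 1 are visible at `/data` in step 2.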