From 8f251c7c3a08894e97fe62194b6448c9e7c14b1b Mon Sep 17 00:00:00 2001
From: Merve Noyan
Date: Thu, 24 Aug 2023 13:57:00 +0300
Subject: [PATCH] Update quantization.md

---
 docs/source/conceptual/quantization.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/source/conceptual/quantization.md b/docs/source/conceptual/quantization.md
index 24ad1413..ef79d232 100644
--- a/docs/source/conceptual/quantization.md
+++ b/docs/source/conceptual/quantization.md
@@ -8,7 +8,8 @@ GPTQ is a post-training quantization method to make the model smaller. It quanti
 
 Given a layer \(l\) with weight matrix \(W_{l}\) and layer input \(X_{l}\), find quantized weight \(\hat{W}_{l}\):
 
-$$\text{\hat{W}{l}}^{*} = argmin{\hat{W_{l}}} |W_{l}X-\hat{W}{l}X|^{2}{2}\) \right\}$$
+$$\hat{W}_{l}^{*} = \text{argmin}_{\hat{W}_{l}} \Vert W_{l}X - \hat{W}_{l}X \Vert_{2}^{2}$$
+
 TGI allows you to both run an already GPTQ quantized model (see available models [here](https://huggingface.co/models?search=gptq)) or quantize a model of your choice using quantization script by simply passing --quantize like below 👇
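
Note (after the diff, not part of the applied change): the corrected formula above is the per-layer objective GPTQ minimizes. The sketch below is a minimal NumPy illustration of evaluating that objective, using naive round-to-nearest as a stand-in quantizer; the variable names, shapes, and 4-bit grid are illustrative assumptions, not TGI's or GPTQ's actual implementation.

import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))   # weight matrix W_l
X = rng.standard_normal((8, 16))  # layer input X_l

def quantize_rtn(w: np.ndarray, bits: int = 4) -> np.ndarray:
    # Naive round-to-nearest onto a symmetric integer grid
    # (a stand-in quantizer, not GPTQ's solver).
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / levels
    return np.round(w / scale) * scale

W_hat = quantize_rtn(W)
# The objective from the docs: ||W_l X - W_hat_l X||_2^2
error = np.linalg.norm(W @ X - W_hat @ X) ** 2
print(f"squared output error: {error:.4f}")

GPTQ improves on this round-to-nearest baseline by using second-order (Hessian) information about the layer input to compensate quantization error as it proceeds; the sketch only evaluates the objective, it does not solve the argmin.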