From 2363e9a4829f41b82726b6e9ed839cfeeee6f7b2 Mon Sep 17 00:00:00 2001
From: Merve Noyan
Date: Thu, 24 Aug 2023 12:20:40 +0300
Subject: [PATCH] Desperate attempt to fix latex

---
 docs/source/conceptual/quantization.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/conceptual/quantization.md b/docs/source/conceptual/quantization.md
index f70da03f..24ad1413 100644
--- a/docs/source/conceptual/quantization.md
+++ b/docs/source/conceptual/quantization.md
@@ -8,7 +8,7 @@ GPTQ is a post-training quantization method to make the model smaller. It quanti
 
 Given a layer \(l\) with weight matrix \(W_{l}\) and layer input \(X_{l}\), find quantized weight \(\hat{W}_{l}\):
 
-\({\hat{W}{l}}^{*} = argmin{\hat{W_{l}}} |W_{l}X-\hat{W}{l}X|^{2}{2}\)
+$$\text{\hat{W}{l}}^{*} = argmin{\hat{W_{l}}} |W_{l}X-\hat{W}{l}X|^{2}{2}\) \right\}$$
 
 
 TGI allows you to both run an already GPTQ quantized model (see available models [here](https://huggingface.co/models?search=gptq)) or quantize a model of your choice using quantization script by simply passing --quantize like below 👇
@@ -34,4 +34,4 @@ text-generation-launcher --model-id /data/falcon-40b-gptq/ --sharded true --num-
 You can learn more about the quantization options by running `text-generation-server quantize --help`.
 
 If you wish to do more with GPTQ models (e.g. train an adapter on top), you can read about transformers GPTQ integration [here](https://huggingface.co/blog/gptq-integration).
-You can learn more about GPTQ from the [paper](https://arxiv.org/pdf/2210.17323.pdf).
\ No newline at end of file
+You can learn more about GPTQ from the [paper](https://arxiv.org/pdf/2210.17323.pdf).
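
Review note on the first hunk: both the removed and the added line appear to be attempts at rendering the GPTQ layer-wise objective introduced by the "Given a layer" context line. As a point of comparison, here is a minimal LaTeX sketch of that objective. It assumes the standard squared-error form from the linked paper, and it assumes the layer input carries the same \(l\) subscript as in the surrounding definitions (the patch itself writes plain \(X\)). It is not necessarily what the author intended to commit.

```latex
% Sketch only: the subscripted X_{l} and \operatorname* are assumptions,
% not taken from the patch.
\hat{W}_{l}^{*} = \operatorname*{argmin}_{\hat{W}_{l}}
    \left\| W_{l} X_{l} - \hat{W}_{l} X_{l} \right\|_{2}^{2}
```

Compared with the added line, this drops the `\text{...}` wrapper, the stray `\)` and the unmatched `\right\}`, and restores the `_` subscripts and the double bars of the norm. Which delimiters the docs renderer expects (`$$ ... $$` or `\( ... \)`) is not settled by this patch, so the sketch leaves them out.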