From 704cd18402dabac03f49ecd56c2c8d00c0e10c93 Mon Sep 17 00:00:00 2001
From: Merve Noyan
Date: Fri, 8 Sep 2023 13:01:58 +0200
Subject: [PATCH] Iterated on Pedro's comments

---
 docs/source/conceptual/quantization.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/conceptual/quantization.md b/docs/source/conceptual/quantization.md
index d6f96751..1a44e3c2 100644
--- a/docs/source/conceptual/quantization.md
+++ b/docs/source/conceptual/quantization.md
@@ -4,7 +4,7 @@ TGI offers GPTQ and bits-and-bytes quantization to quantize large language model
 
 ## Quantization with GPTQ
 
-GPTQ is a post-training quantization method to make the model smaller. It quantizes each weight by finding a compressed version of that weight, that will yield a minimum mean squared error like below 👇
+GPTQ is a post-training quantization method to make the model smaller. It quantizes the layers by finding a compressed version of each layer's weights that will yield a minimum mean squared error, like below 👇
 
 Given a layer \\(l\\) with weight matrix \\(W_{l}\\) and layer input \\(X_{l}\\), find quantized weight \\(\\hat{W}_{l}\\):
 
@@ -17,7 +17,7 @@ TGI allows you to both run an already GPTQ quantized model (see available models
 docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id $model --quantize gptq
 ```
 
-Note that TGI's GPTQ implementation is different than [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ).
+Note that TGI's GPTQ implementation doesn't use [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) under the hood. However, models quantized using AutoGPTQ or Optimum can still be served by TGI.
 
 To quantize a given model using GPTQ with a calibration dataset, simply run
 
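Note for reviewers: the "minimum mean squared error like below 👇" wording refers to an equation that sits just outside the hunk context and is therefore not visible in this patch. As a sketch (written here in standard GPTQ notation, not copied from the file), the layer-wise objective being referenced is:

$$\hat{W}_{l} = \text{argmin}_{\hat{W}_{l}} \lVert W_{l}X_{l} - \hat{W}_{l}X_{l} \rVert_{2}^{2}$$

Similarly, the command that follows "To quantize a given model using GPTQ with a calibration dataset, simply run" lies below the second hunk and is not shown here. A rough sketch of the kind of invocation meant, assuming TGI's `text-generation-server quantize` subcommand and using placeholder model and output paths:

```bash
# Sketch only: model id and output directory are illustrative placeholders,
# not values taken from this patch.
text-generation-server quantize tiiuae/falcon-40b /data/falcon-40b-gptq
```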