From 7388468e26246a0df681b378adc2c3424470e0e6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Adrien=20Gallou=C3=ABt?=
Date: Fri, 14 Feb 2025 18:09:09 +0000
Subject: [PATCH] Update doc
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Adrien Gallouët
---
 docs/source/backends/llamacpp.md | 33 ++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/docs/source/backends/llamacpp.md b/docs/source/backends/llamacpp.md
index c19fd001..0b6f575f 100644
--- a/docs/source/backends/llamacpp.md
+++ b/docs/source/backends/llamacpp.md
@@ -44,20 +44,6 @@ docker build \
 | `--build-arg llamacpp_cuda=ON` | Enables CUDA acceleration        |
 | `--build-arg cuda_arch=ARCH`   | Defines target CUDA architecture |
 
-## Model preparation
-
-Retrieve a GGUF model and store it in a specific directory, for example:
-
-```bash
-mkdir -p ~/models
-cd ~/models
-curl -LOJ "https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-GGUF/resolve/main/qwen2.5-3b-instruct-q4_0.gguf?download=true"
-```
-
-GGUF files are optional as they will be automatically generated at
-startup if not already present in the `models` directory. This means you
-do not need to manually download a GGUF file unless you prefer to do so.
-
 ## Run Docker image
 
 ### CPU-based inference
@@ -84,6 +70,25 @@ docker run \
     --model-id "Qwen/Qwen2.5-3B-Instruct"
 ```
 
+## Using a custom GGUF
+
+GGUF files are optional as they will be automatically generated at
+startup if not already present in the `models` directory. However, if
+the default GGUF generation is not suitable for your use case, you can
+provide your own GGUF file with `--model-gguf`, for example:
+
+```bash
+docker run \
+    -p 3000:3000 \
+    -e "HF_TOKEN=$HF_TOKEN" \
+    -v "$HOME/models:/app/models" \
+    tgi-llamacpp \
+    --model-id "Qwen/Qwen2.5-3B-Instruct" \
+    --model-gguf "models/qwen2.5-3b-instruct-q4_0.gguf"
+```
+
+Note that `--model-id` is still required.
+
 ## Advanced parameters
 
 A full listing of configurable parameters is available in the `--help`: