mirror of https://github.com/huggingface/text-generation-inference.git
synced 2025-05-22 02:02:07 +00:00
Update doc

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

parent 0d01a89f0f, commit 7388468e26
@@ -44,20 +44,6 @@ docker build \
| `--build-arg llamacpp_cuda=ON` | Enables CUDA acceleration        |
| `--build-arg cuda_arch=ARCH`   | Defines target CUDA architecture |
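The `ARCH` value passed to `cuda_arch` is the GPU's CUDA compute capability with the dot removed. As a sketch (the `arch_for_gpu` helper below is hypothetical, not part of the build), a few common mappings:

```shell
# Hypothetical helper, not part of TGI: map a few common NVIDIA GPUs
# to their CUDA compute capability, usable as the cuda_arch build arg.
arch_for_gpu() {
    case "$1" in
        V100)       echo 70 ;;
        T4)         echo 75 ;;
        A100)       echo 80 ;;
        "RTX 3090") echo 86 ;;
        "RTX 4090") echo 89 ;;
        *)          return 1 ;;  # unknown GPU: let the caller decide
    esac
}

# Example:
#   docker build --build-arg llamacpp_cuda=ON \
#       --build-arg cuda_arch="$(arch_for_gpu A100)" ...
```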
## Model preparation
Retrieve a GGUF model and store it in a specific directory, for example:
```bash
mkdir -p ~/models
cd ~/models
curl -LOJ "https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-GGUF/resolve/main/qwen2.5-3b-instruct-q4_0.gguf?download=true"
```
GGUF files are optional as they will be automatically generated at
startup if not already present in the `models` directory. This means you
do not need to manually download a GGUF file unless you prefer to do so.
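A file fetched this way can be truncated or mislabeled, so a quick sanity check may help. As a sketch (the `is_gguf` helper is hypothetical, not part of TGI), every valid GGUF file starts with the 4-byte magic `GGUF`:

```shell
# Hypothetical helper, not part of TGI: check that a file starts with
# the 4-byte "GGUF" magic carried by every valid GGUF model file.
is_gguf() {
    [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}

# Example:
#   is_gguf ~/models/qwen2.5-3b-instruct-q4_0.gguf && echo "looks like GGUF"
```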
## Run Docker image

### CPU-based inference
@@ -84,6 +70,25 @@ docker run \
    --model-id "Qwen/Qwen2.5-3B-Instruct"
```
## Using a custom GGUF

GGUF files are optional as they will be automatically generated at
startup if not already present in the `models` directory. However, if
the default GGUF generation is not suitable for your use case, you can
provide your own GGUF file with `--model-gguf`, for example:
```bash
docker run \
    -p 3000:3000 \
    -e "HF_TOKEN=$HF_TOKEN" \
    -v "$HOME/models:/app/models" \
    tgi-llamacpp \
    --model-id "Qwen/Qwen2.5-3B-Instruct" \
    --model-gguf "models/qwen2.5-3b-instruct-q4_0.gguf"
```
Note that `--model-id` is still required.
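The path given to `--model-gguf` is resolved inside the container, where the `-v` flag mounts `$HOME/models` at `/app/models`. A sketch of that mapping (assuming the container's working directory is `/app`, which is why the relative `models/...` path works):

```shell
# Sketch of the host-to-container path mapping set up by the -v flag.
# Assumption: the container working directory is /app, so the relative
# path "models/<file>" passed to --model-gguf resolves under /app/models.
host_path="$HOME/models/qwen2.5-3b-instruct-q4_0.gguf"
container_path="models/${host_path##*/}"   # strip the host directory
echo "/app/$container_path"                # where the server finds the file
```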
## Advanced parameters

A full listing of configurable parameters is available via `--help`:
|
Loading…
Reference in New Issue
Block a user