From 7388468e26246a0df681b378adc2c3424470e0e6 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Adrien=20Gallou=C3=ABt?=
Date: Fri, 14 Feb 2025 18:09:09 +0000
Subject: [PATCH] Update doc
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Adrien Gallouët
---
 docs/source/backends/llamacpp.md | 33 ++++++++++++++++++--------------
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/docs/source/backends/llamacpp.md b/docs/source/backends/llamacpp.md
index c19fd001..0b6f575f 100644
--- a/docs/source/backends/llamacpp.md
+++ b/docs/source/backends/llamacpp.md
@@ -44,20 +44,6 @@ docker build \
 | `--build-arg llamacpp_cuda=ON` | Enables CUDA acceleration        |
 | `--build-arg cuda_arch=ARCH`   | Defines target CUDA architecture |
 
-## Model preparation
-
-Retrieve a GGUF model and store it in a specific directory, for example:
-
-```bash
-mkdir -p ~/models
-cd ~/models
-curl -LOJ "https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-GGUF/resolve/main/qwen2.5-3b-instruct-q4_0.gguf?download=true"
-```
-
-GGUF files are optional as they will be automatically generated at
-startup if not already present in the `models` directory. This means you
-do not need to manually download a GGUF file unless you prefer to do so.
-
 ## Run Docker image
 
 ### CPU-based inference
@@ -84,6 +70,25 @@ docker run \
     --model-id "Qwen/Qwen2.5-3B-Instruct"
 ```
 
+## Using a custom GGUF
+
+GGUF files are optional as they will be automatically generated at
+startup if not already present in the `models` directory. However, if
+the default GGUF generation is not suitable for your use case, you can
+provide your own GGUF file with `--model-gguf`, for example:
+
+```bash
+docker run \
+    -p 3000:3000 \
+    -e "HF_TOKEN=$HF_TOKEN" \
+    -v "$HOME/models:/app/models" \
+    tgi-llamacpp \
+    --model-id "Qwen/Qwen2.5-3B-Instruct" \
+    --model-gguf "models/qwen2.5-3b-instruct-q4_0.gguf"
+```
+
+Note that `--model-id` is still required.
+
 ## Advanced parameters
 
 A full listing of configurable parameters is available in the `--help`: