Add --build-arg llamacpp_native & llamacpp_cpu_arm_arch
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
commit 3f7369d1c1
parent 8a79cfd077
@@ -2,6 +2,8 @@ FROM nvidia/cuda:12.8.0-cudnn-devel-ubuntu24.04 AS deps
 
 ARG llamacpp_version=b4827
 ARG llamacpp_cuda=OFF
+ARG llamacpp_native=ON
+ARG llamacpp_cpu_arm_arch=native
 ARG cuda_arch=75-real;80-real;86-real;89-real;90-real
 
 WORKDIR /opt/src
@@ -28,6 +30,8 @@ RUN mkdir -p llama.cpp \
     -DCMAKE_CXX_COMPILER=clang++ \
     -DCMAKE_CUDA_ARCHITECTURES=${cuda_arch} \
     -DGGML_CUDA=${llamacpp_cuda} \
+    -DGGML_NATIVE=${llamacpp_native} \
+    -DGGML_CPU_ARM_ARCH=${llamacpp_cpu_arm_arch} \
     -DLLAMA_BUILD_COMMON=OFF \
     -DLLAMA_BUILD_TESTS=OFF \
     -DLLAMA_BUILD_EXAMPLES=OFF \
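For reference, the two new build arguments are passed straight through to GGML's CMake options. Outside of Docker, the equivalent configure step would look roughly like the sketch below; the source and build paths, the parallel build step, and the example ARM target are assumptions for illustration, not part of this commit.

```bash
# Sketch only: configures llama.cpp with the same GGML options that the
# new build args control. Paths and example values are assumptions.
cmake -S llama.cpp -B llama.cpp/build \
    -DGGML_NATIVE=OFF \
    -DGGML_CPU_ARM_ARCH=armv9-a+i8mm \
    -DGGML_CUDA=OFF
cmake --build llama.cpp/build --parallel
```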
@@ -25,9 +25,12 @@ You will find the best models on [Hugging Face][GGUF].
 ## Build Docker image
 
 For optimal performance, the Docker image is compiled with native CPU
-instructions, thus it's highly recommended to execute the container on
-the host used during the build process. Efforts are ongoing to enhance
-portability while maintaining high computational efficiency.
+instructions by default. As a result, it is strongly recommended to run
+the container on the same host architecture used during the build
+process. Efforts are ongoing to improve portability across different
+systems while preserving high computational efficiency.
+
+To build the Docker image, use the following command:
 
 ```bash
 docker build \
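The diff context cuts the build command short here. Based on the Graviton4 example added later in this same commit, the default (native) build presumably looks like:

```bash
# Presumed full command; the tgi-llamacpp tag is taken from the
# Graviton4 example added later in this commit.
docker build \
    -t tgi-llamacpp \
    https://github.com/huggingface/text-generation-inference.git \
    -f Dockerfile_llamacpp
```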
@@ -38,11 +41,25 @@ docker build \
 
 ### Build parameters
 
-| Parameter                            | Description                       |
-| ------------------------------------ | --------------------------------- |
-| `--build-arg llamacpp_version=bXXXX` | Specific version of llama.cpp     |
-| `--build-arg llamacpp_cuda=ON`       | Enables CUDA acceleration         |
-| `--build-arg cuda_arch=ARCH`         | Defines target CUDA architecture  |
+| Parameter (with --build-arg)              | Description                      |
+| ----------------------------------------- | -------------------------------- |
+| `llamacpp_version=bXXXX`                  | Specific version of llama.cpp    |
+| `llamacpp_cuda=ON`                        | Enables CUDA acceleration        |
+| `llamacpp_native=OFF`                     | Disable automatic CPU detection  |
+| `llamacpp_cpu_arm_arch=ARCH[+FEATURE]...` | Specific ARM CPU and features    |
+| `cuda_arch=ARCH`                          | Defines target CUDA architecture |
+
+For example, to target Graviton4 when building on another ARM
+architecture:
+
+```bash
+docker build \
+    -t tgi-llamacpp \
+    --build-arg llamacpp_native=OFF \
+    --build-arg llamacpp_cpu_arm_arch=armv9-a+i8mm \
+    https://github.com/huggingface/text-generation-inference.git \
+    -f Dockerfile_llamacpp
+```
 
 ## Run Docker image
 
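The diff stops at the "Run Docker image" heading. For completeness, a typical invocation of the resulting image would look roughly like the sketch below; the published port, mount path, and model are illustrative assumptions rather than content of this commit, while `--model-id` and the `HF_TOKEN` variable are standard TGI conventions.

```bash
# Illustrative sketch: port, mount path, and model are assumptions.
docker run \
    -p 3000:3000 \
    -e "HF_TOKEN=$HF_TOKEN" \
    -v "$HOME/models:/models" \
    tgi-llamacpp \
    --model-id "Qwen/Qwen2.5-3B-Instruct"
```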