Add --build-arg llamacpp_native & llamacpp_cpu_arm_arch

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Adrien Gallouët 2025-03-05 15:49:35 +00:00
parent 8a79cfd077
commit 3f7369d1c1
2 changed files with 29 additions and 8 deletions

Dockerfile_llamacpp

@@ -2,6 +2,8 @@ FROM nvidia/cuda:12.8.0-cudnn-devel-ubuntu24.04 AS deps
 ARG llamacpp_version=b4827
 ARG llamacpp_cuda=OFF
+ARG llamacpp_native=ON
+ARG llamacpp_cpu_arm_arch=native
 ARG cuda_arch=75-real;80-real;86-real;89-real;90-real
 WORKDIR /opt/src
@@ -28,6 +30,8 @@ RUN mkdir -p llama.cpp \
     -DCMAKE_CXX_COMPILER=clang++ \
     -DCMAKE_CUDA_ARCHITECTURES=${cuda_arch} \
     -DGGML_CUDA=${llamacpp_cuda} \
+    -DGGML_NATIVE=${llamacpp_native} \
+    -DGGML_CPU_ARM_ARCH=${llamacpp_cpu_arm_arch} \
     -DLLAMA_BUILD_COMMON=OFF \
     -DLLAMA_BUILD_TESTS=OFF \
     -DLLAMA_BUILD_EXAMPLES=OFF \
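The two new arguments are passed straight through to ggml's CMake options: `GGML_NATIVE` controls automatic host-CPU tuning, and `GGML_CPU_ARM_ARCH` pins an explicit ARM target when that detection is disabled. As a rough sketch of the equivalent configure step outside Docker (the checkout path and the trimmed flag list are illustrative, not the Dockerfile's full command line):

```bash
# Configure a llama.cpp checkout for a fixed ARM target rather than
# the build host's CPU (example values, not the Dockerfile defaults).
cmake -B build -S llama.cpp \
    -DGGML_NATIVE=OFF \
    -DGGML_CPU_ARM_ARCH=armv9-a+i8mm
cmake --build build --parallel
```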


@@ -25,9 +25,12 @@ You will find the best models on [Hugging Face][GGUF].
 ## Build Docker image
 
 For optimal performance, the Docker image is compiled with native CPU
-instructions, thus it's highly recommended to execute the container on
-the host used during the build process. Efforts are ongoing to enhance
-portability while maintaining high computational efficiency.
+instructions by default. As a result, it is strongly recommended to run
+the container on the same host architecture used during the build
+process. Efforts are ongoing to improve portability across different
+systems while preserving high computational efficiency.
 
 To build the Docker image, use the following command:
 
 ```bash
 docker build \
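Because native CPU tuning is the default, it is worth confirming that the machine running the container matches the build host's architecture before chasing illegal-instruction crashes. A minimal sketch, assuming the `tgi-llamacpp` tag used by the examples in this document (note Docker prints amd64/arm64 where uname prints x86_64/aarch64):

```bash
# Compare the image's target architecture with the current host's.
docker image inspect tgi-llamacpp --format '{{.Architecture}}'
uname -m
```

This only catches whole-architecture mismatches; a binary built with newer ISA features (for example i8mm) can still fail on an older CPU of the same architecture.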
@@ -38,11 +41,25 @@ docker build \
 ### Build parameters
 
-| Parameter                            | Description                      |
-| ------------------------------------ | -------------------------------- |
-| `--build-arg llamacpp_version=bXXXX` | Specific version of llama.cpp    |
-| `--build-arg llamacpp_cuda=ON`       | Enables CUDA acceleration        |
-| `--build-arg cuda_arch=ARCH`         | Defines target CUDA architecture |
+| Parameter (with --build-arg)              | Description                      |
+| ----------------------------------------- | -------------------------------- |
+| `llamacpp_version=bXXXX`                  | Specific version of llama.cpp    |
+| `llamacpp_cuda=ON`                        | Enables CUDA acceleration        |
+| `llamacpp_native=OFF`                     | Disables automatic CPU detection |
+| `llamacpp_cpu_arm_arch=ARCH[+FEATURE]...` | Specific ARM CPU and features    |
+| `cuda_arch=ARCH`                          | Defines target CUDA architecture |
+
+For example, to target Graviton4 when building on another ARM
+architecture:
+
+```bash
+docker build \
+    -t tgi-llamacpp \
+    --build-arg llamacpp_native=OFF \
+    --build-arg llamacpp_cpu_arm_arch=armv9-a+i8mm \
+    https://github.com/huggingface/text-generation-inference.git \
+    -f Dockerfile_llamacpp
+```
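For comparison with the Graviton4 example just above, a hedged sketch of a CUDA-enabled build pinned to one compute capability (the `cuda_arch` value here is illustrative; pick the entry matching your GPU from the default list in the Dockerfile):

```bash
docker build \
    -t tgi-llamacpp \
    --build-arg llamacpp_cuda=ON \
    --build-arg cuda_arch=86-real \
    https://github.com/huggingface/text-generation-inference.git \
    -f Dockerfile_llamacpp
```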
## Run Docker image