Add --build-arg llamacpp_native & llamacpp_cpu_arm_arch

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Author: Adrien Gallouët, 2025-03-05 15:49:35 +00:00
parent 8a79cfd077
commit 3f7369d1c1
2 changed files with 29 additions and 8 deletions

Dockerfile_llamacpp

@@ -2,6 +2,8 @@ FROM nvidia/cuda:12.8.0-cudnn-devel-ubuntu24.04 AS deps
 ARG llamacpp_version=b4827
 ARG llamacpp_cuda=OFF
+ARG llamacpp_native=ON
+ARG llamacpp_cpu_arm_arch=native
 ARG cuda_arch=75-real;80-real;86-real;89-real;90-real
 WORKDIR /opt/src
@@ -28,6 +30,8 @@ RUN mkdir -p llama.cpp \
     -DCMAKE_CXX_COMPILER=clang++ \
     -DCMAKE_CUDA_ARCHITECTURES=${cuda_arch} \
     -DGGML_CUDA=${llamacpp_cuda} \
+    -DGGML_NATIVE=${llamacpp_native} \
+    -DGGML_CPU_ARM_ARCH=${llamacpp_cpu_arm_arch} \
     -DLLAMA_BUILD_COMMON=OFF \
     -DLLAMA_BUILD_TESTS=OFF \
     -DLLAMA_BUILD_EXAMPLES=OFF \
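Side note: `GGML_NATIVE` and `GGML_CPU_ARM_ARCH` are upstream ggml/llama.cpp CMake options rather than TGI-specific flags. As a rough sketch (the checkout path and the target values are illustrative, not taken from this commit), the new build args map onto a standalone configure like this:

```bash
# Sketch: the plain CMake configure these build args feed into, assuming
# a llama.cpp checkout in ./llama.cpp. GGML_NATIVE=ON (the default)
# autodetects the build host's CPU; turning it OFF and setting
# GGML_CPU_ARM_ARCH pins an explicit ARM baseline instead.
cmake -B build llama.cpp \
    -DGGML_NATIVE=OFF \
    -DGGML_CPU_ARM_ARCH=armv9-a+i8mm
cmake --build build --parallel
```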

docs/source/backends/llamacpp.md

@@ -25,9 +25,12 @@ You will find the best models on [Hugging Face][GGUF].
 ## Build Docker image
 
-For optimal performance, the Docker image is compiled with native CPU
-instructions, thus it's highly recommended to execute the container on
-the host used during the build process. Efforts are ongoing to enhance
-portability while maintaining high computational efficiency.
+For optimal performance, the Docker image is compiled with native CPU
+instructions by default. As a result, it is strongly recommended to run
+the container on the same host architecture used during the build
+process. Efforts are ongoing to improve portability across different
+systems while preserving high computational efficiency.
+
+To build the Docker image, use the following command:
 
 ```bash
 docker build \
@@ -38,11 +41,25 @@ docker build \
 ### Build parameters
 
-| Parameter                            | Description                       |
-| ------------------------------------ | --------------------------------- |
-| `--build-arg llamacpp_version=bXXXX` | Specific version of llama.cpp     |
-| `--build-arg llamacpp_cuda=ON`       | Enables CUDA acceleration         |
-| `--build-arg cuda_arch=ARCH`         | Defines target CUDA architecture  |
+| Parameter (with --build-arg)              | Description                      |
+| ----------------------------------------- | -------------------------------- |
+| `llamacpp_version=bXXXX`                  | Specific version of llama.cpp    |
+| `llamacpp_cuda=ON`                        | Enables CUDA acceleration        |
+| `llamacpp_native=OFF`                     | Disables automatic CPU detection |
+| `llamacpp_cpu_arm_arch=ARCH[+FEATURE]...` | Specific ARM CPU and features    |
+| `cuda_arch=ARCH`                          | Defines target CUDA architecture |
+
+For example, to target Graviton4 when building on another ARM
+architecture:
+
+```bash
+docker build \
+    -t tgi-llamacpp \
+    --build-arg llamacpp_native=OFF \
+    --build-arg llamacpp_cpu_arm_arch=armv9-a+i8mm \
+    https://github.com/huggingface/text-generation-inference.git \
+    -f Dockerfile_llamacpp
+```
 ## Run Docker image
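For comparison, the default native build is the same command with the two overrides dropped (this assumes the build host is also the deployment target):

```bash
docker build \
    -t tgi-llamacpp \
    https://github.com/huggingface/text-generation-inference.git \
    -f Dockerfile_llamacpp
```

When pinning an ARM target, it is worth confirming that the requested features exist on the deployment host; on Linux, `grep -m1 -o i8mm /proc/cpuinfo` shows whether `i8mm` is advertised, for example. A mismatch between compiled and available instructions typically fails at startup with an illegal-instruction error (SIGILL).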