Add --build-arg llamacpp_native & llamacpp_cpu_arm_arch
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
commit 3f7369d1c1
parent 8a79cfd077
@@ -2,6 +2,8 @@ FROM nvidia/cuda:12.8.0-cudnn-devel-ubuntu24.04 AS deps
 
 ARG llamacpp_version=b4827
 ARG llamacpp_cuda=OFF
+ARG llamacpp_native=ON
+ARG llamacpp_cpu_arm_arch=native
 ARG cuda_arch=75-real;80-real;86-real;89-real;90-real
 
 WORKDIR /opt/src
@@ -28,6 +30,8 @@ RUN mkdir -p llama.cpp \
     -DCMAKE_CXX_COMPILER=clang++ \
     -DCMAKE_CUDA_ARCHITECTURES=${cuda_arch} \
     -DGGML_CUDA=${llamacpp_cuda} \
+    -DGGML_NATIVE=${llamacpp_native} \
+    -DGGML_CPU_ARM_ARCH=${llamacpp_cpu_arm_arch} \
     -DLLAMA_BUILD_COMMON=OFF \
     -DLLAMA_BUILD_TESTS=OFF \
     -DLLAMA_BUILD_EXAMPLES=OFF \
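For reference, the two new build arguments are passed straight through to GGML's CMake options. Outside of Docker, the equivalent configure step would look roughly like the sketch below; the source and build paths, the parallel build step, and the example ARM target are assumptions for illustration, not part of this commit.

```bash
# Sketch only: configures llama.cpp with the same GGML options that the
# new build args control. Paths and example values are assumptions.
cmake -S llama.cpp -B llama.cpp/build \
    -DGGML_NATIVE=OFF \
    -DGGML_CPU_ARM_ARCH=armv9-a+i8mm \
    -DGGML_CUDA=OFF
cmake --build llama.cpp/build --parallel
```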
@@ -25,9 +25,12 @@ You will find the best models on [Hugging Face][GGUF].
 ## Build Docker image
 
 For optimal performance, the Docker image is compiled with native CPU
-instructions, thus it's highly recommended to execute the container on
-the host used during the build process. Efforts are ongoing to enhance
-portability while maintaining high computational efficiency.
+instructions by default. As a result, it is strongly recommended to run
+the container on the same host architecture used during the build
+process. Efforts are ongoing to improve portability across different
+systems while preserving high computational efficiency.
+
+To build the Docker image, use the following command:
 
 ```bash
 docker build \
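The diff context cuts the build command short here. Based on the Graviton4 example added later in this same commit, the default (native) build presumably looks like:

```bash
# Presumed full command; the tgi-llamacpp tag is taken from the
# Graviton4 example added later in this commit.
docker build \
    -t tgi-llamacpp \
    https://github.com/huggingface/text-generation-inference.git \
    -f Dockerfile_llamacpp
```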
@@ -38,11 +41,25 @@ docker build \
 
 ### Build parameters
 
-| Parameter                            | Description                       |
-| ------------------------------------ | --------------------------------- |
-| `--build-arg llamacpp_version=bXXXX` | Specific version of llama.cpp     |
-| `--build-arg llamacpp_cuda=ON`       | Enables CUDA acceleration         |
-| `--build-arg cuda_arch=ARCH`         | Defines target CUDA architecture  |
+| Parameter (with --build-arg)              | Description                      |
+| ----------------------------------------- | -------------------------------- |
+| `llamacpp_version=bXXXX`                  | Specific version of llama.cpp    |
+| `llamacpp_cuda=ON`                        | Enables CUDA acceleration        |
+| `llamacpp_native=OFF`                     | Disable automatic CPU detection  |
+| `llamacpp_cpu_arm_arch=ARCH[+FEATURE]...` | Specific ARM CPU and features    |
+| `cuda_arch=ARCH`                          | Defines target CUDA architecture |
+
+For example, to target Graviton4 when building on another ARM
+architecture:
+
+```bash
+docker build \
+    -t tgi-llamacpp \
+    --build-arg llamacpp_native=OFF \
+    --build-arg llamacpp_cpu_arm_arch=armv9-a+i8mm \
+    https://github.com/huggingface/text-generation-inference.git \
+    -f Dockerfile_llamacpp
+```
 
 ## Run Docker image
 
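The diff stops at the "Run Docker image" heading. For completeness, a typical invocation of the resulting image would look roughly like the sketch below; the published port, mount path, and model are illustrative assumptions rather than content of this commit, while `--model-id` and the `HF_TOKEN` variable are standard TGI conventions.

```bash
# Illustrative sketch: port, mount path, and model are assumptions.
docker run \
    -p 3000:3000 \
    -e "HF_TOKEN=$HF_TOKEN" \
    -v "$HOME/models:/models" \
    tgi-llamacpp \
    --model-id "Qwen/Qwen2.5-3B-Instruct"
```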