Add --build-arg llamacpp_native & llamacpp_cpu_arm_arch
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
commit 3f7369d1c1 (parent 8a79cfd077)
@@ -2,6 +2,8 @@ FROM nvidia/cuda:12.8.0-cudnn-devel-ubuntu24.04 AS deps
 ARG llamacpp_version=b4827
 ARG llamacpp_cuda=OFF
+ARG llamacpp_native=ON
+ARG llamacpp_cpu_arm_arch=native
 ARG cuda_arch=75-real;80-real;86-real;89-real;90-real
 
 WORKDIR /opt/src
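Note: each `ARG` above can be overridden at build time with `--build-arg`. A minimal sketch (the tag and arch values are illustrative; the semicolon-separated `cuda_arch` list must be quoted so the shell does not split it):

```bash
# Illustrative override of the defaults declared above; the image tag
# "tgi-llamacpp" matches the docs below, the arch list is an example.
docker build \
    -t tgi-llamacpp \
    --build-arg llamacpp_cuda=ON \
    --build-arg cuda_arch="80-real;86-real" \
    https://github.com/huggingface/text-generation-inference.git \
    -f Dockerfile_llamacpp
```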
@@ -28,6 +30,8 @@ RUN mkdir -p llama.cpp \
     -DCMAKE_CXX_COMPILER=clang++ \
     -DCMAKE_CUDA_ARCHITECTURES=${cuda_arch} \
     -DGGML_CUDA=${llamacpp_cuda} \
+    -DGGML_NATIVE=${llamacpp_native} \
+    -DGGML_CPU_ARM_ARCH=${llamacpp_cpu_arm_arch} \
     -DLLAMA_BUILD_COMMON=OFF \
     -DLLAMA_BUILD_TESTS=OFF \
     -DLLAMA_BUILD_EXAMPLES=OFF \
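For context, these build arguments map one-to-one onto llama.cpp's CMake options (`llamacpp_native` → `GGML_NATIVE`, `llamacpp_cpu_arm_arch` → `GGML_CPU_ARM_ARCH`). A hedged sketch of the equivalent configure step outside Docker, assuming llama.cpp sources checked out in `./llama.cpp`:

```bash
# Sketch only: mirrors the Dockerfile's cmake flags for a local build.
# The ARM arch value is an example, not a value taken from this commit.
cmake -S llama.cpp -B llama.cpp/build \
    -DGGML_NATIVE=OFF \
    -DGGML_CPU_ARM_ARCH=armv9-a+i8mm \
    -DLLAMA_BUILD_TESTS=OFF \
    -DLLAMA_BUILD_EXAMPLES=OFF
cmake --build llama.cpp/build --parallel
```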
@@ -25,9 +25,12 @@ You will find the best models on [Hugging Face][GGUF].
 
 ## Build Docker image
 
 For optimal performance, the Docker image is compiled with native CPU
-instructions, thus it's highly recommended to execute the container on
-the host used during the build process. Efforts are ongoing to enhance
-portability while maintaining high computational efficiency.
+instructions by default. As a result, it is strongly recommended to run
+the container on the same host architecture used during the build
+process. Efforts are ongoing to improve portability across different
+systems while preserving high computational efficiency.
 
 To build the Docker image, use the following command:
 
 ```bash
 docker build \
@@ -38,11 +41,25 @@ docker build \
 
 ### Build parameters
 
-| Parameter                            | Description                      |
-| ------------------------------------ | -------------------------------- |
-| `--build-arg llamacpp_version=bXXXX` | Specific version of llama.cpp    |
-| `--build-arg llamacpp_cuda=ON`       | Enables CUDA acceleration        |
-| `--build-arg cuda_arch=ARCH`         | Defines target CUDA architecture |
+| Parameter (with --build-arg)              | Description                      |
+| ----------------------------------------- | -------------------------------- |
+| `llamacpp_version=bXXXX`                  | Specific version of llama.cpp    |
+| `llamacpp_cuda=ON`                        | Enables CUDA acceleration        |
+| `llamacpp_native=OFF`                     | Disables automatic CPU detection |
+| `llamacpp_cpu_arm_arch=ARCH[+FEATURE]...` | Specific ARM CPU and features    |
+| `cuda_arch=ARCH`                          | Defines target CUDA architecture |
+
+For example, to target Graviton4 when building on another ARM
+architecture:
+
+```bash
+docker build \
+    -t tgi-llamacpp \
+    --build-arg llamacpp_native=OFF \
+    --build-arg llamacpp_cpu_arm_arch=armv9-a+i8mm \
+    https://github.com/huggingface/text-generation-inference.git \
+    -f Dockerfile_llamacpp
+```
 
 ## Run Docker image
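When choosing an `ARCH[+FEATURE]` value for a cross-build like the one above, it helps to confirm which features the target CPU actually reports. A small sketch for an ARM64 Linux target, assuming the kernel lists supported extensions such as `i8mm` in its `Features` line:

```bash
# Run on the *target* host: print the CPU feature flags and check for
# the i8mm extension used in the Graviton4 example above.
grep -m1 '^Features' /proc/cpuinfo
if grep -m1 '^Features' /proc/cpuinfo | grep -qw i8mm; then
    echo "i8mm supported"
else
    echo "i8mm not supported"
fi
```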