
Llamacpp backend

If all dependencies, including llama.cpp, are installed at the system level, running cargo build should be sufficient. To experiment with a different version of llama.cpp, build it yourself and install it into a local prefix, then point the TGI build at that prefix.

Install llama.cpp

LLAMACPP_PREFIX=$(pwd)/llama.cpp.out

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build \
    -DCMAKE_INSTALL_PREFIX="$LLAMACPP_PREFIX" \
    -DLLAMA_BUILD_COMMON=OFF \
    -DLLAMA_BUILD_TESTS=OFF \
    -DLLAMA_BUILD_EXAMPLES=OFF \
    -DLLAMA_BUILD_SERVER=OFF
cmake --build build --config Release -j
cmake --install build
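
To keep several llama.cpp versions side by side, one option is to give each checkout its own install prefix. The helper below is only a sketch: the function name llamacpp_prefix_for and the llama.cpp.out/<ref> directory layout are assumptions made for illustration, not part of TGI or llama.cpp.

```shell
# Hypothetical helper (not part of TGI or llama.cpp): derive a per-version
# install prefix so several llama.cpp builds can coexist.
llamacpp_prefix_for() {
    base_dir="$1"   # directory that holds all install prefixes
    ref="$2"        # git tag, branch, or commit of llama.cpp being built
    printf '%s/llama.cpp.out/%s\n' "$base_dir" "$ref"
}
```

For example, after checking out the desired tag or commit in the llama.cpp clone, set LLAMACPP_PREFIX="$(llamacpp_prefix_for "$HOME/src" "$ref")" before running the cmake commands above, where $ref holds the name of the checked-out revision.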

Build TGI

PKG_CONFIG_PATH="$LLAMACPP_PREFIX/lib/pkgconfig" cargo build
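
If cargo build cannot find llama.cpp, it can help to confirm that the install step produced a pkg-config file. The sketch below assumes llama.cpp installs a file named llama.pc under lib/pkgconfig (or lib64/pkgconfig on some systems); the helper name find_llama_pc_dir is hypothetical.

```shell
# Hypothetical helper: print the directory under the given prefix that
# contains llama.pc, or return non-zero if none is found.
find_llama_pc_dir() {
    prefix="$1"
    # Assumption: the .pc file lands in lib/ or lib64/ depending on the platform.
    for libdir in lib lib64; do
        if [ -f "$prefix/$libdir/pkgconfig/llama.pc" ]; then
            printf '%s\n' "$prefix/$libdir/pkgconfig"
            return 0
        fi
    done
    echo "llama.pc not found under $prefix" >&2
    return 1
}
```

A possible use is dir="$(find_llama_pc_dir "$LLAMACPP_PREFIX")" && PKG_CONFIG_PATH="$dir" cargo build, which runs the build only when the file was located.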