diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
index 8fcba516..e073353f 100644
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -52,6 +52,8 @@
   - sections:
     - local: backends/trtllm
       title: TensorRT-LLM
+    - local: backends/llamacpp
+      title: Llamacpp
     title: Backends
   - sections:
     - local: reference/launcher
diff --git a/docs/source/multi_backend_support.md b/docs/source/multi_backend_support.md
index c4df15bc..03d6d30b 100644
--- a/docs/source/multi_backend_support.md
+++ b/docs/source/multi_backend_support.md
@@ -11,3 +11,5 @@ TGI remains consistent across backends, allowing you to switch between them seam
 * **[TGI TRTLLM backend](./backends/trtllm)**: This backend leverages NVIDIA's TensorRT library to accelerate LLM
   inference. It utilizes specialized optimizations and custom kernels for enhanced performance. However, it requires
   a model-specific compilation step for each GPU architecture.
+* **[TGI Llamacpp backend](./backends/llamacpp)**: This backend facilitates the deployment of large language models
+  (LLMs) by integrating [llama.cpp][llama.cpp], an advanced inference engine optimized for both CPU and GPU computation.
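For clarity, the `_toctree.yml` fragment after this patch is applied would read as follows (a sketch reconstructed from the diff hunk above; surrounding entries and exact indentation depend on the rest of the file):

```yaml
# docs/source/_toctree.yml (Backends section, after the patch)
- sections:
  - local: backends/trtllm
    title: TensorRT-LLM
  - local: backends/llamacpp   # new entry added by this diff
    title: Llamacpp
  title: Backends
```

Each `local` key points at a markdown page under `docs/source/` (here, the new `backends/llamacpp.md`), and `title` is the label shown in the rendered navigation tree.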