Mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-09-11 12:24:53 +00:00)
Update docs

Signed-off-by: Adrien Gallouët <angt@huggingface.co>

parent 2b0d99c1cf
commit 8bc10d37ee
@@ -52,6 +52,8 @@
 - sections:
   - local: backends/trtllm
     title: TensorRT-LLM
+  - local: backends/llamacpp
+    title: Llamacpp
   title: Backends
 - sections:
   - local: reference/launcher
@@ -11,3 +11,5 @@ TGI remains consistent across backends, allowing you to switch between them seamlessly.
 * **[TGI TRTLLM backend](./backends/trtllm)**: This backend leverages NVIDIA's TensorRT library to accelerate LLM inference.
   It utilizes specialized optimizations and custom kernels for enhanced performance.
   However, it requires a model-specific compilation step for each GPU architecture.
+* **[TGI Llamacpp backend](./backends/llamacpp)**: This backend facilitates the deployment of large language models
+  (LLMs) by integrating [llama.cpp][llama.cpp], an advanced inference engine optimized for both CPU and GPU computation.