From d59b4fdce9cb4df062ddae04a0f63e839d9f7a3e Mon Sep 17 00:00:00 2001
From: David Corvoysier
Date: Tue, 25 Feb 2025 10:48:17 +0000
Subject: [PATCH] doc(neuron): update links to installation page

---
 docs/source/_toctree.yml               | 2 +-
 docs/source/architecture.md            | 2 +-
 docs/source/installation_inferentia.md | 2 +-
 docs/source/multi_backend_support.md   | 1 +
 4 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
index 39f0ef4b..37b57d6f 100644
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -12,7 +12,7 @@
   - local: installation_gaudi
     title: Using TGI with Intel Gaudi
   - local: installation_inferentia
-    title: Using TGI with AWS Inferentia
+    title: Using TGI with AWS Trainium and Inferentia
   - local: installation_tpu
     title: Using TGI with Google TPUs
   - local: installation_intel
diff --git a/docs/source/architecture.md b/docs/source/architecture.md
index d3a6fa92..b475bb6d 100644
--- a/docs/source/architecture.md
+++ b/docs/source/architecture.md
@@ -107,7 +107,7 @@ Several variants of the model server exist that are actively supported by Huggin
 - A [version optimized for AMD with ROCm](https://huggingface.co/docs/text-generation-inference/installation_amd) is hosted in the main TGI repository. Some model features differ.
 - A [version optimized for Intel GPUs](https://huggingface.co/docs/text-generation-inference/installation_intel) is hosted in the main TGI repository. Some model features differ.
 - The [version for Intel Gaudi](https://huggingface.co/docs/text-generation-inference/installation_gaudi) is maintained on a forked repository, often resynchronized with the main [TGI repository](https://github.com/huggingface/tgi-gaudi).
-- A [version for Neuron (AWS Inferentia2)](https://huggingface.co/docs/text-generation-inference/installation_inferentia) is maintained as part of [Optimum Neuron](https://github.com/huggingface/optimum-neuron/tree/main/text-generation-inference).
+- A [version for Neuron (AWS Inferentia2)](https://huggingface.co/docs/text-generation-inference/installation_inferentia) is maintained in the main TGI repository. Some model features differ.
 - A version for Google TPUs is maintained as part of [Optimum TPU](https://github.com/huggingface/optimum-tpu/tree/main/text-generation-inference).
 
 Not all variants provide the same features, as hardware and middleware capabilities do not provide the same optimizations.
diff --git a/docs/source/installation_inferentia.md b/docs/source/installation_inferentia.md
index 0394e6de..bfd0f657 100644
--- a/docs/source/installation_inferentia.md
+++ b/docs/source/installation_inferentia.md
@@ -1,3 +1,3 @@
 # Using TGI with Inferentia
 
-Check out this [guide](https://github.com/huggingface/optimum-neuron/tree/main/text-generation-inference) on how to serve models with TGI on Inferentia2.
+You can use TGI on AWS Trainium and Inferentia platforms using the [TGI neuron backend](https://huggingface.co/docs/text-generation-inference/backends/neuron).
diff --git a/docs/source/multi_backend_support.md b/docs/source/multi_backend_support.md
index 03d6d30b..997503a4 100644
--- a/docs/source/multi_backend_support.md
+++ b/docs/source/multi_backend_support.md
@@ -13,3 +13,4 @@ TGI remains consistent across backends, allowing you to switch between them seam
   However, it requires a model-specific compilation step for each GPU architecture.
 
 * **[TGI Llamacpp backend](./backends/llamacpp)**: This backend facilitates the deployment of large language models (LLMs) by integrating [llama.cpp][llama.cpp], an advanced inference engine optimized for both CPU and GPU computation.
+* **[TGI Neuron backend](./backends/neuron)**: This backend leverages the [AWS Neuron SDK](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) to allow the deployment of large language models (LLMs) on [AWS Trainium and Inferentia chips](https://aws.amazon.com/ai/machine-learning/trainium/).
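
For readers following the updated links: TGI exposes the same HTTP API regardless of backend, so a Neuron deployment on Trainium or Inferentia can be smoke-tested like any other. Below is a minimal sketch, assuming a TGI server is already listening on localhost:8080; the host, port, prompt, and token budget are illustrative, not part of this patch.

```python
# Minimal sketch: query a running TGI server via its /generate endpoint.
# Assumes a TGI instance (e.g. the Neuron backend on a Trainium/Inferentia
# host) is already listening on localhost:8080 -- adjust the URL as needed.
import requests

response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is AWS Inferentia?",
        "parameters": {"max_new_tokens": 64},
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["generated_text"])
```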