From d59b4fdce9cb4df062ddae04a0f63e839d9f7a3e Mon Sep 17 00:00:00 2001
From: David Corvoysier
Date: Tue, 25 Feb 2025 10:48:17 +0000
Subject: [PATCH] doc(neuron): update links to installation page

---
 docs/source/_toctree.yml               | 2 +-
 docs/source/architecture.md            | 2 +-
 docs/source/installation_inferentia.md | 2 +-
 docs/source/multi_backend_support.md   | 1 +
 4 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
index 39f0ef4b..37b57d6f 100644
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -12,7 +12,7 @@
   - local: installation_gaudi
     title: Using TGI with Intel Gaudi
   - local: installation_inferentia
-    title: Using TGI with AWS Inferentia
+    title: Using TGI with AWS Trainium and Inferentia
   - local: installation_tpu
     title: Using TGI with Google TPUs
   - local: installation_intel
diff --git a/docs/source/architecture.md b/docs/source/architecture.md
index d3a6fa92..b475bb6d 100644
--- a/docs/source/architecture.md
+++ b/docs/source/architecture.md
@@ -107,7 +107,7 @@ Several variants of the model server exist that are actively supported by Huggin
 - A [version optimized for AMD with ROCm](https://huggingface.co/docs/text-generation-inference/installation_amd) is hosted in the main TGI repository. Some model features differ.
 - A [version optimized for Intel GPUs](https://huggingface.co/docs/text-generation-inference/installation_intel) is hosted in the main TGI repository. Some model features differ.
 - The [version for Intel Gaudi](https://huggingface.co/docs/text-generation-inference/installation_gaudi) is maintained on a forked repository, often resynchronized with the main [TGI repository](https://github.com/huggingface/tgi-gaudi).
-- A [version for Neuron (AWS Inferentia2)](https://huggingface.co/docs/text-generation-inference/installation_inferentia) is maintained as part of [Optimum Neuron](https://github.com/huggingface/optimum-neuron/tree/main/text-generation-inference).
+- A [version for Neuron (AWS Inferentia2)](https://huggingface.co/docs/text-generation-inference/installation_inferentia) is maintained in the main TGI repository. Some model features differ.
 - A version for Google TPUs is maintained as part of [Optimum TPU](https://github.com/huggingface/optimum-tpu/tree/main/text-generation-inference).
 
 Not all variants provide the same features, as hardware and middleware capabilities do not provide the same optimizations.
diff --git a/docs/source/installation_inferentia.md b/docs/source/installation_inferentia.md
index 0394e6de..bfd0f657 100644
--- a/docs/source/installation_inferentia.md
+++ b/docs/source/installation_inferentia.md
@@ -1,3 +1,3 @@
 # Using TGI with Inferentia
 
-Check out this [guide](https://github.com/huggingface/optimum-neuron/tree/main/text-generation-inference) on how to serve models with TGI on Inferentia2.
+You can use TGI on AWS Trainium and Inferentia platforms using the [TGI neuron backend](https://huggingface.co/docs/text-generation-inference/backends/neuron).
diff --git a/docs/source/multi_backend_support.md b/docs/source/multi_backend_support.md
index 03d6d30b..997503a4 100644
--- a/docs/source/multi_backend_support.md
+++ b/docs/source/multi_backend_support.md
@@ -13,3 +13,4 @@ TGI remains consistent across backends, allowing you to switch between them seam
   However, it requires a model-specific compilation step for each GPU architecture.
 
 * **[TGI Llamacpp backend](./backends/llamacpp)**: This backend facilitates the deployment of large language models (LLMs) by integrating [llama.cpp][llama.cpp], an advanced inference engine optimized for both CPU and GPU computation.
+* **[TGI Neuron backend](./backends/neuron)**: This backend leverages the [AWS Neuron SDK](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) to allow the deployment of large language models (LLMs) on [AWS Trainium and Inferentia chips](https://aws.amazon.com/ai/machine-learning/trainium/).
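
For readers following the updated links: TGI exposes the same HTTP API regardless of backend, so a Neuron deployment on Trainium or Inferentia can be smoke-tested like any other. Below is a minimal sketch, assuming a TGI server is already listening on localhost:8080; the host, port, prompt, and token budget are illustrative, not part of this patch.

```python
# Minimal sketch: query a running TGI server via its /generate endpoint.
# Assumes a TGI instance (e.g. the Neuron backend on a Trainium/Inferentia
# host) is already listening on localhost:8080 -- adjust the URL as needed.
import requests

response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is AWS Inferentia?",
        "parameters": {"max_new_tokens": 64},
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["generated_text"])
```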