doc(neuron): update links to installation page

David Corvoysier 2025-02-25 10:48:17 +00:00
parent e783f88dc5
commit d59b4fdce9
4 changed files with 4 additions and 3 deletions


@@ -12,7 +12,7 @@
 - local: installation_gaudi
   title: Using TGI with Intel Gaudi
 - local: installation_inferentia
-  title: Using TGI with AWS Inferentia
+  title: Using TGI with AWS Trainium and Inferentia
 - local: installation_tpu
   title: Using TGI with Google TPUs
 - local: installation_intel


@@ -107,7 +107,7 @@ Several variants of the model server exist that are actively supported by Hugging Face:
 - A [version optimized for AMD with ROCm](https://huggingface.co/docs/text-generation-inference/installation_amd) is hosted in the main TGI repository. Some model features differ.
 - A [version optimized for Intel GPUs](https://huggingface.co/docs/text-generation-inference/installation_intel) is hosted in the main TGI repository. Some model features differ.
 - The [version for Intel Gaudi](https://huggingface.co/docs/text-generation-inference/installation_gaudi) is maintained on a forked repository, often resynchronized with the main [TGI repository](https://github.com/huggingface/tgi-gaudi).
-- A [version for Neuron (AWS Inferentia2)](https://huggingface.co/docs/text-generation-inference/installation_inferentia) is maintained as part of [Optimum Neuron](https://github.com/huggingface/optimum-neuron/tree/main/text-generation-inference).
+- A [version for Neuron (AWS Inferentia2)](https://huggingface.co/docs/text-generation-inference/installation_inferentia) is maintained in the main TGI repository. Some model features differ.
 - A version for Google TPUs is maintained as part of [Optimum TPU](https://github.com/huggingface/optimum-tpu/tree/main/text-generation-inference).
 Not all variants provide the same features, as hardware and middleware capabilities do not provide the same optimizations.


@@ -1,3 +1,3 @@
 # Using TGI with Inferentia
 
-Check out this [guide](https://github.com/huggingface/optimum-neuron/tree/main/text-generation-inference) on how to serve models with TGI on Inferentia2.
+You can use TGI on AWS Trainium and Inferentia platforms using the [TGI neuron backend](https://huggingface.co/docs/text-generation-inference/backends/neuron).
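For context, serving on these chips means launching the dedicated TGI Neuron image on a Trainium or Inferentia instance. The sketch below is illustrative only: the image tag, device flag, and model id are assumptions rather than values taken from this commit, so the linked backend page remains the authoritative reference.

```bash
# Hypothetical launch of TGI on an AWS inf2/trn1 host.
# The image tag, --device flag, and model id are assumptions.
docker run -p 8080:80 \
    -v $(pwd)/data:/data \
    --device=/dev/neuron0 \
    -e HF_TOKEN=${HF_TOKEN} \
    ghcr.io/huggingface/text-generation-inference:latest-neuron \
    --model-id Qwen/Qwen2.5-0.5B
```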


@@ -13,3 +13,4 @@ TGI remains consistent across backends, allowing you to switch between them seamlessly.
 However, it requires a model-specific compilation step for each GPU architecture.
 * **[TGI Llamacpp backend](./backends/llamacpp)**: This backend facilitates the deployment of large language models
   (LLMs) by integrating [llama.cpp][llama.cpp], an advanced inference engine optimized for both CPU and GPU computation.
+* **[TGI Neuron backend](./backends/neuron)**: This backend leverages the [AWS Neuron SDK](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) to allow the deployment of large language models (LLMs) on [AWS Trainium and Inferentia chips](https://aws.amazon.com/ai/machine-learning/trainium/).
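Because the server API stays consistent across these backends, a deployment like any of the above can be exercised with the same plain HTTP request. A minimal sketch, assuming a server already listening on localhost port 8080:

```bash
# Query a running TGI server; the same request works regardless of backend.
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```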