mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-04-24 00:12:08 +00:00)
doc(neuron): update links to installation page
This commit is contained in:
parent e783f88dc5
commit d59b4fdce9
@@ -12,7 +12,7 @@
 - local: installation_gaudi
   title: Using TGI with Intel Gaudi
 - local: installation_inferentia
-  title: Using TGI with AWS Inferentia
+  title: Using TGI with AWS Trainium and Inferentia
 - local: installation_tpu
   title: Using TGI with Google TPUs
 - local: installation_intel
@@ -107,7 +107,7 @@ Several variants of the model server exist that are actively supported by Hugging Face:
 - A [version optimized for AMD with ROCm](https://huggingface.co/docs/text-generation-inference/installation_amd) is hosted in the main TGI repository. Some model features differ.
 - A [version optimized for Intel GPUs](https://huggingface.co/docs/text-generation-inference/installation_intel) is hosted in the main TGI repository. Some model features differ.
 - The [version for Intel Gaudi](https://huggingface.co/docs/text-generation-inference/installation_gaudi) is maintained on a forked repository, often resynchronized with the main [TGI repository](https://github.com/huggingface/tgi-gaudi).
-- A [version for Neuron (AWS Inferentia2)](https://huggingface.co/docs/text-generation-inference/installation_inferentia) is maintained as part of [Optimum Neuron](https://github.com/huggingface/optimum-neuron/tree/main/text-generation-inference).
+- A [version for Neuron (AWS Inferentia2)](https://huggingface.co/docs/text-generation-inference/installation_inferentia) is maintained in the main TGI repository. Some model features differ.
 - A version for Google TPUs is maintained as part of [Optimum TPU](https://github.com/huggingface/optimum-tpu/tree/main/text-generation-inference).
 
 Not all variants provide the same features, as hardware and middleware capabilities do not provide the same optimizations.
@@ -1,3 +1,3 @@
 # Using TGI with Inferentia
 
-Check out this [guide](https://github.com/huggingface/optimum-neuron/tree/main/text-generation-inference) on how to serve models with TGI on Inferentia2.
+You can use TGI on AWS Trainium and Inferentia platforms using the [TGI neuron backend](https://huggingface.co/docs/text-generation-inference/backends/neuron).
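The replacement text points readers to the TGI neuron backend rather than an external guide. Whichever backend serves the model, a running TGI instance exposes the same HTTP API, so client code does not change. Below is a minimal sketch using `huggingface_hub`'s `InferenceClient`, assuming a TGI server (Neuron or otherwise) is already listening on `http://localhost:8080`; the endpoint URL, prompt, and generation parameters are illustrative, not taken from the docs.

```python
# Minimal sketch: query a running TGI server; the call is identical whether the
# server uses the CUDA, Gaudi, or Neuron backend.
# Assumption: a TGI instance is already serving a model at http://localhost:8080.
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")

# Text generation against the server via the client helper.
output = client.text_generation(
    "What is Deep Learning?",
    max_new_tokens=64,
    temperature=0.2,
)
print(output)
```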
@@ -13,3 +13,4 @@ TGI remains consistent across backends, allowing you to switch between them seamlessly.
 However, it requires a model-specific compilation step for each GPU architecture.
 * **[TGI Llamacpp backend](./backends/llamacpp)**: This backend facilitates the deployment of large language models
   (LLMs) by integrating [llama.cpp][llama.cpp], an advanced inference engine optimized for both CPU and GPU computation.
+* **[TGI Neuron backend](./backends/neuron)**: This backend leverages the [AWS Neuron SDK](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/) to allow the deployment of large language models (LLMs) on [AWS Trainium and Inferentia chips](https://aws.amazon.com/ai/machine-learning/trainium/).
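Since TGI keeps its serving API consistent across backends, switching to the Neuron backend leaves chat-style client code untouched as well. A minimal sketch follows, again assuming a server at `http://localhost:8080`; the message content and `max_tokens` value are illustrative.

```python
# Minimal sketch: chat-style request to a TGI server; backend-agnostic.
# Assumption: a TGI instance is already serving a model at http://localhost:8080.
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")

response = client.chat_completion(
    messages=[{"role": "user", "content": "Name one AWS accelerator used for LLM inference."}],
    max_tokens=32,
)
print(response.choices[0].message.content)
```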