Mirror of https://github.com/huggingface/text-generation-inference.git
Add section about TGI on Gaudi in README

commit eba543222b (parent 9f18f4c006)

README.md (20 changed lines)
@@ -15,7 +15,7 @@
</a>
</div>

A Rust, Python and gRPC server for text generation inference. Used in production at [HuggingFace](https://huggingface.co)
to power LLM api-inference widgets.

## Table of contents
@@ -135,7 +135,7 @@ The Swagger UI is also available at: [https://huggingface.github.io/text-generat

### Using a private or gated model

You can use the `HUGGING_FACE_HUB_TOKEN` environment variable to configure the token used by
`text-generation-inference`, giving you access to protected resources.

For example, if you want to serve the gated Llama V2 model variants:
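The hunk cuts off before the README's own example; purely as a sketch (assuming a local install where `text-generation-launcher` is on the PATH and access to the gated repository has already been granted on the Hub), the token could be exported before launching:

```shell
# Sketch only: expose the read token so the server can download the gated weights,
# then launch the server for the gated chat variant
export HUGGING_FACE_HUB_TOKEN=<your cli READ token>
text-generation-launcher --model-id meta-llama/Llama-2-7b-chat-hf
```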
@@ -146,7 +146,7 @@ For example, if you want to serve the gated Llama V2 model variants:

or with Docker:

```shell
model=meta-llama/Llama-2-7b-chat-hf
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
token=<your cli READ token>
@@ -195,7 +195,7 @@ Python 3.9, e.g. using `conda`:
```shell
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

conda create -n text-generation-inference python=3.9
conda activate text-generation-inference
```

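Optionally, a quick sanity check after the steps above (nothing TGI-specific, just confirming the toolchain the build expects):

```shell
# Confirm the Rust toolchain installed via rustup and the Python version inside the conda env
rustc --version
cargo --version
python --version  # should report Python 3.9.x
```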
@@ -221,7 +221,7 @@ Then run:

```shell
BUILD_EXTENSIONS=True make install # Install repository and HF/transformer fork with CUDA kernels
make run-falcon-7b-instruct
```

**Note:** on some machines, you may also need the OpenSSL libraries and gcc. On Linux machines, run:
@@ -232,7 +232,7 @@ sudo apt-get install libssl-dev gcc -y

### CUDA Kernels

The custom CUDA kernels are only tested on NVIDIA A100s. If you have any installation or runtime issues, you can remove
the kernels by using the `DISABLE_CUSTOM_KERNELS=True` environment variable.

Be aware that the official Docker image has them enabled by default.
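As an illustrative sketch, the variable can simply be prepended when launching from a source install (re-using the make target shown elsewhere in this README):

```shell
# Fall back to the stock code path if the custom CUDA kernels cause install or runtime issues
DISABLE_CUSTOM_KERNELS=True make run-falcon-7b-instruct
```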
@@ -242,7 +242,7 @@ Be aware that the official Docker image has them enabled by default.
### Run

```shell
make run-falcon-7b-instruct
```

### Quantization
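The body of this section lies outside the hunk; purely as a hedged illustration, the launcher exposes a `--quantize` option that can load weights in 8-bit via bitsandbytes:

```shell
# Illustrative only: serve falcon-7b-instruct with bitsandbytes 8-bit quantization to cut GPU memory use
text-generation-launcher --model-id tiiuae/falcon-7b-instruct --quantize bitsandbytes
```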
@@ -273,3 +273,9 @@ make rust-tests
# integration tests
make integration-tests
```


## Other supported hardware

TGI is also supported on the following AI hardware accelerators:
- *Habana first-gen Gaudi and Gaudi2:* check out [here](https://github.com/huggingface/optimum-habana/tree/main/text-generation-inference) how to serve models with TGI on Gaudi and Gaudi2 with [Optimum Habana](https://huggingface.co/docs/optimum/habana/index)