From 4f1657418dfe0d523db09ac7047da7622ab95d43 Mon Sep 17 00:00:00 2001
From: Merve Noyan
Date: Wed, 2 Aug 2023 23:45:09 +0300
Subject: [PATCH] Update docs/source/supported_models.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
---
 docs/source/supported_models.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/supported_models.md b/docs/source/supported_models.md
index a1cd309f..3bef7abb 100644
--- a/docs/source/supported_models.md
+++ b/docs/source/supported_models.md
@@ -33,7 +33,7 @@ For the optimized models above, TGI uses custom CUDA kernels for better inferenc
 TGI optimized models are supported on NVIDIA [A100](https://www.nvidia.com/en-us/data-center/a100/), [A10G](https://www.nvidia.com/en-us/data-center/products/a10-gpu/) and [T4](https://www.nvidia.com/en-us/data-center/tesla-t4/) GPUs with CUDA 11.8+. Note that you have to install [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) to use it. For other hardware, continuous batching will still apply, but some operations like flash attention and paged attention will not be executed.
 
 TGI is also supported on the following AI hardware accelerators:
-- *Habana first-gen Gaudi and Gaudi2:* checkout [here](https://github.com/huggingface/optimum-habana/tree/main/text-generation-inference) how to serve models with TGI on Gaudi and Gaudi2 with [Optimum Habana](https://huggingface.co/docs/optimum/habana/index)
+- *Habana first-gen Gaudi and Gaudi2:* check out this [example](https://github.com/huggingface/optimum-habana/tree/main/text-generation-inference) of how to serve models with TGI on Gaudi and Gaudi2 using [Optimum Habana](https://huggingface.co/docs/optimum/habana/index)
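
As context for the docs text this patch touches: once a TGI server is up (on any of the supported GPUs, or on Gaudi via Optimum Habana), it can be queried over HTTP. A minimal sketch using the `text-generation` Python client, assuming a server is already listening on `http://127.0.0.1:8080` and the client is installed with `pip install text-generation`; the prompt and generation parameters are illustrative.

```python
# Minimal sketch: query a running TGI server with the `text-generation`
# Python client. Assumes a server is already serving a model on
# http://127.0.0.1:8080 (e.g. launched from the TGI Docker image) and that
# the client is installed via `pip install text-generation`.
from text_generation import Client

client = Client("http://127.0.0.1:8080")

# Single-shot generation; the prompt and max_new_tokens are illustrative.
response = client.generate("What is Deep Learning?", max_new_tokens=20)
print(response.generated_text)
```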