From 29754ced5aee521ff107e1aa6242642bdc13e6b0 Mon Sep 17 00:00:00 2001
From: Merve Noyan
Date: Thu, 24 Aug 2023 11:32:39 +0300
Subject: [PATCH] Update docs/source/basic_tutorials/non_core_models.md

Co-authored-by: Omar Sanseviero
---
 docs/source/basic_tutorials/non_core_models.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/basic_tutorials/non_core_models.md b/docs/source/basic_tutorials/non_core_models.md
index cc01a8d3..623285a5 100644
--- a/docs/source/basic_tutorials/non_core_models.md
+++ b/docs/source/basic_tutorials/non_core_models.md
@@ -1,6 +1,6 @@
 # Non-core Model Serving
 
-TGI supports various LLM architectures (see full list [here](./supported_models)). If you wish to serve a model that is not one of the supported models, TGI will fallback to transformers implementation of that model. This means you will be unable to use some of the features introduced by TGI, such as tensor-parallel sharding or flash attention. However, you can still get many benefits of TGI, such as continuous batching or streaming outputs.
+TGI supports various LLM architectures (see full list [here](./supported_models)). If you wish to serve a model that is not one of the supported models, TGI will fallback to the `transformers` implementation of that model. This means you will be unable to use some of the features introduced by TGI, such as tensor-parallel sharding or flash attention. However, you can still get many benefits of TGI, such as continuous batching or streaming outputs.
 
 You can serve these models using Docker like below 👇
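For context on the docs page this patch edits: the line "You can serve these models using Docker like below 👇" leads into a `docker run` invocation in the TGI documentation. A minimal sketch of that kind of command is shown here for reference; the model id (`gpt2`) and the `$volume` path are placeholder assumptions, not part of this patch.

```shell
# Sketch of serving a non-core (transformers-fallback) model with the TGI container.
# $volume is an assumed host directory used to cache model weights between runs.
volume=$PWD/data

docker run --gpus all --shm-size 1g -p 8080:80 \
    -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id gpt2
```

Once the container is up, the server listens on port 8080 of the host and can be queried over the usual TGI HTTP endpoints.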