diff --git a/docs/source/basic_tutorials/non_core_models.md b/docs/source/basic_tutorials/non_core_models.md
index f6a8dc8e..9ae5e443 100644
--- a/docs/source/basic_tutorials/non_core_models.md
+++ b/docs/source/basic_tutorials/non_core_models.md
@@ -12,7 +12,7 @@
 AutoModelForCausalLM.from_pretrained(<model>, device_map="auto")``
 AutoModelForSeq2SeqLM.from_pretrained(<model>, device_map="auto")
 ```

-This means, you will be unable to use some of the features introduced by TGI, such as tensor-parallel sharding or flash attention. However, you can still get many benefits of TGI, such as continuous batching, or streaming outputs.
+This means you will be unable to use some of the features introduced by TGI, such as tensor-parallel sharding or flash attention. However, you can still get many benefits of TGI, such as continuous batching or streaming outputs.

 You can serve these models using docker like below 👇
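
For context, the `transformers` fallback referenced in the hunk's context lines amounts to a plain auto-class load. A minimal sketch is below; the model id `gpt2` and the prompt are only illustrative and are not taken from the diff:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The non-core fallback loads the model with transformers' auto classes,
# using device_map="auto" (requires `accelerate` to be installed).
model_id = "gpt2"  # illustrative model id, not from the diff
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Quick generation to show the model behaves like any transformers model.
inputs = tokenizer(
    "TGI falls back to transformers for non-core models because",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because this path runs the stock `transformers` implementation rather than TGI's custom kernels, tensor-parallel sharding and flash attention are unavailable, while serving-layer features such as continuous batching and streaming still apply, as the rewritten sentence in the diff states.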