Restructure (commit cf0453182e, parent bdf36659a6)

# Non-core Model Serving

TGI supports various LLM architectures (see the full list [here](./supported_models)). If you wish to serve a model that is not one of the supported models, TGI will fall back to the transformers implementation of that model. This means you will be unable to use some of the features introduced by TGI, such as tensor-parallel sharding or flash attention. However, you can still get many of the benefits of TGI, such as continuous batching or streaming outputs.

These models can be loaded as follows:

```python
from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM

# Decoder-only (causal) models:
AutoModelForCausalLM.from_pretrained(<model>, device_map="auto")

# Encoder-decoder (seq2seq) models:
AutoModelForSeq2SeqLM.from_pretrained(<model>, device_map="auto")
```
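
For a concrete picture of what the fallback path gives you, here is a minimal, self-contained sketch of loading and generating with the same transformers classes. It uses gpt2 purely as a hypothetical stand-in for `<model>`; any causal LM on the Hub works the same way:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # hypothetical stand-in for <model>
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Tokenize a prompt and generate a short continuation.
inputs = tokenizer("Deep learning is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The `device_map="auto"` argument lets accelerate place the weights across the available GPUs and CPU automatically, which is why it appears in the loading calls above.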

You can serve these models using Docker as shown below 👇

```bash
# A typical TGI invocation; substitute your own model ID and volume path.
model=<model>
volume=$PWD/data  # share a volume with the container so weights are not re-downloaded each run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id $model
```
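
Once the container is up, you can query it over TGI's HTTP API. Here is a minimal sketch using Python's requests library; the `/generate` route and the JSON payload shape follow TGI's documented API, and port 8080 matches the mapping above:

```python
import requests

# Query the /generate endpoint of the locally running TGI container.
response = requests.post(
    "http://127.0.0.1:8080/generate",
    json={
        "inputs": "What is deep learning?",
        "parameters": {"max_new_tokens": 20},
    },
)
response.raise_for_status()
print(response.json()["generated_text"])
```

For token-by-token streaming (one of the TGI benefits mentioned above), the same payload can be sent to the `/generate_stream` route, which returns server-sent events.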