diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
index f52fa2ec..7aaf4bb2 100644
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -16,7 +16,7 @@
   - local: installation
     title: Installation from source
   - local: supported_models
-    title: Supported Models and Hardware
+    title: Supported Models
   - local: architecture
     title: Internal Architecture
   - local: usage_statistics
diff --git a/docs/source/supported_models.md b/docs/source/supported_models.md
index 832f88ef..34849b22 100644
--- a/docs/source/supported_models.md
+++ b/docs/source/supported_models.md
@@ -1,9 +1,7 @@
-# Supported Models and Hardware
+# Supported Models
 
-Text Generation Inference enables serving optimized models on specific hardware for the highest performance. The following sections list which models (VLMs & LLMs) are supported.
-
-## Supported Models
+Text Generation Inference enables serving optimized models. The following sections list which models (VLMs & LLMs) are supported.
 
 - [Deepseek V2](https://huggingface.co/deepseek-ai/DeepSeek-V2)
 - [Idefics 2](https://huggingface.co/HuggingFaceM4/idefics2-8b) (Multimodal)
@@ -36,17 +34,4 @@ Text Generation Inference enables serving optimized models on specific hardware
 - [Idefics](https://huggingface.co/HuggingFaceM4/idefics-9b) (Multimodal)
 
-If the above list lacks the model you would like to serve, depending on the model's pipeline type, you can try to initialize and serve the model anyways to see how well it performs, but performance isn't guaranteed for non-optimized models:
-
-```python
-# for causal LMs/text-generation models
-AutoModelForCausalLM.from_pretrained(<model>, device_map="auto")`
-# or, for text-to-text generation models
-AutoModelForSeq2SeqLM.from_pretrained(<model>, device_map="auto")
-```
-
-If you wish to serve a supported model that already exists on a local folder, just point to the local folder.
-
-```bash
-text-generation-launcher --model-id <PATH-TO-LOCAL-BLOOM>
-```
+If the above list lacks the model you would like to serve, depending on the model's pipeline type, you can try to initialize and serve the model anyways to see how well it performs, but performance isn't guaranteed for non-optimized models. Read more about [Non-core Model Serving](../basic_tutorials/non_core_models).
\ No newline at end of file
diff --git a/update_doc.py b/update_doc.py
index 3fb0d314..e37746f9 100644
--- a/update_doc.py
+++ b/update_doc.py
@@ -5,13 +5,10 @@ import json
 import os
 
 TEMPLATE = """
-# Supported Models and Hardware
+# Supported Models
 
-Text Generation Inference enables serving optimized models on specific hardware for the highest performance. The following sections list which models (VLMs & LLMs) are supported.
+Text Generation Inference enables serving optimized models. The following sections list which models (VLMs & LLMs) are supported.
 
-## Supported Models
-
-SUPPORTED_MODELS
 
 If the above list lacks the model you would like to serve, depending on the model's pipeline type, you can try to initialize and serve the model anyways to see how well it performs, but performance isn't guaranteed for non-optimized models:
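
Note: `update_doc.py` regenerates `docs/source/supported_models.md` from the `TEMPLATE` string edited above, and the substitution step itself is truncated out of this excerpt. The sketch below shows the general mechanism only under the assumption that the script splices a generated bullet list into the `SUPPORTED_MODELS` placeholder; the `MODELS` data and `render_supported_models` name are hypothetical, not the script's actual API.

```python
# Minimal sketch of the template-substitution step, assuming update_doc.py
# replaces the SUPPORTED_MODELS token with a markdown bullet list.
# The real script derives the list elsewhere; MODELS is a stand-in.
TEMPLATE = """
# Supported Models

Text Generation Inference enables serving optimized models. The following sections list which models (VLMs & LLMs) are supported.

SUPPORTED_MODELS
"""

# Hypothetical (name, URL) pairs standing in for the generated model list.
MODELS = [
    ("Deepseek V2", "https://huggingface.co/deepseek-ai/DeepSeek-V2"),
    ("Idefics 2", "https://huggingface.co/HuggingFaceM4/idefics2-8b"),
]


def render_supported_models() -> str:
    # Build the markdown bullet list and splice it into the template.
    bullets = "\n".join(f"- [{name}]({url})" for name, url in MODELS)
    return TEMPLATE.replace("SUPPORTED_MODELS", bullets)


if __name__ == "__main__":
    with open("docs/source/supported_models.md", "w") as f:
        f.write(render_supported_models())
```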