From e07d0ebc06a7f50441683d2d1599e738a32e50ae Mon Sep 17 00:00:00 2001
From: drbh
Date: Mon, 29 Apr 2024 20:20:30 +0000
Subject: [PATCH] fix: rename header

---
 docs/source/basic_tutorials/visual_language_models.md | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/docs/source/basic_tutorials/visual_language_models.md b/docs/source/basic_tutorials/visual_language_models.md
index 20bf5759..e804ef09 100644
--- a/docs/source/basic_tutorials/visual_language_models.md
+++ b/docs/source/basic_tutorials/visual_language_models.md
@@ -1,4 +1,4 @@
-# Vision Language Models (VLM)
+# Vision Language Model Inference in TGI
 
 Visual Language Model (VLM) are models that consume both image and text inputs to generate text.
 
@@ -17,7 +17,7 @@ Below are couple of common use cases for vision language models:
 
 ### Hugging Face Hub Python Library
 
-To infer with vision language models through Python, you can use the [`huggingface_hub`](https://pypi.org/project/huggingface-hub/) library. The `InferenceClient` class provides a simple way to interact with the [Inference API](https://huggingface.co/docs/api-inference/index)
+To infer with vision language models through Python, you can use the [`huggingface_hub`](https://pypi.org/project/huggingface-hub/) library. The `InferenceClient` class provides a simple way to interact with the [Inference API](https://huggingface.co/docs/api-inference/index). Images can be passed as URLs or base64-encoded strings. The `InferenceClient` will automatically detect the image format.
 
 ```python
 from huggingface_hub import InferenceClient
 
 client = InferenceClient("http://127.0.0.1:3000")
 image = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/rabbit.png"
 prompt = f"![]({image})What is this a picture of?\n\n"
 
 for token in client.text_generation(prompt, max_new_tokens=16, stream=True):
     print(token)
 
 # This is a picture of an anthropomorphic rabbit in a space suit.
 ```
 
@@ -31,8 +31,6 @@ for token in client.text_generation(prompt, max_new_tokens=16, stream=True):
-Images can be passed as URLs or base64-encoded strings. The `InferenceClient` will automatically detect the image format.
-
 ```python
 from huggingface_hub import InferenceClient
 import base64
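
The final hunk's trailing context cuts the second example off right after `import base64`, so the base64 flow that the reworded paragraph promises is not visible in the diff itself. Below is a minimal sketch of how that flow can look; it is illustrative, not part of this commit, and it assumes a local TGI endpoint at `http://127.0.0.1:3000` and a hypothetical local image file `rabbit.png`:

```python
from huggingface_hub import InferenceClient
import base64

# Illustrative sketch, not part of this patch. Assumes a TGI server is
# running at http://127.0.0.1:3000 and that "rabbit.png" exists on disk.
client = InferenceClient("http://127.0.0.1:3000")

# Read the local image and base64-encode it, then wrap it in a data URI;
# the server detects the image format from the MIME prefix, per the
# paragraph this patch moves into the intro sentence.
with open("rabbit.png", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

image = f"data:image/png;base64,{encoded}"
prompt = f"![]({image})What is this a picture of?\n\n"

# Stream the generated tokens, mirroring the URL-based example above.
for token in client.text_generation(prompt, max_new_tokens=16, stream=True):
    print(token)
```

Note that the markdown-style `![](...)` image reference in the prompt is the same convention the URL-based example in the diff uses; only the image source changes from a URL to a data URI.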