diff --git a/docs/source/basic_tutorials/visual_language_models.md b/docs/source/basic_tutorials/visual_language_models.md
index 1c32f67b..20bf5759 100644
--- a/docs/source/basic_tutorials/visual_language_models.md
+++ b/docs/source/basic_tutorials/visual_language_models.md
@@ -10,7 +10,6 @@ Below are couple of common use cases for vision language models:
 
 - **Image Captioning**: Given an image, generate a caption that describes the image.
 - **Visual Question Answering (VQA)**: Given an image and a question about the image, generate an answer to the question.
-- **Visual Dialog**: Given an image and a dialog history, generate a response to the dialog.
 - **Mulimodal Dialog**: Generate response to multiple turns of images and conversations.
 - **Image Information Retrieval**: Given an image, retrieve information from the image.
 