diff --git a/docs/source/basic_tutorials/visual_language_models.md b/docs/source/basic_tutorials/visual_language_models.md
index 1c32f67b..20bf5759 100644
--- a/docs/source/basic_tutorials/visual_language_models.md
+++ b/docs/source/basic_tutorials/visual_language_models.md
@@ -10,7 +10,6 @@ Below are couple of common use cases for vision language models:
 
 - **Image Captioning**: Given an image, generate a caption that describes the image.
 - **Visual Question Answering (VQA)**: Given an image and a question about the image, generate an answer to the question.
-- **Visual Dialog**: Given an image and a dialog history, generate a response to the dialog.
 - **Mulimodal Dialog**: Generate response to multiple turns of images and conversations.
 - **Image Information Retrieval**: Given an image, retrieve information from the image.