mirror of
https://github.com/huggingface/text-generation-inference.git
synced 2025-09-10 20:04:52 +00:00
Update consuming_tgi.md
This commit is contained in:
parent: bce5e22444
commit: 452f8f3c2b
@ -75,6 +75,44 @@ To serve both ChatUI and TGI in same environment, simply add your own endpoints
## Gradio
Gradio has a `ChatInterface` class to create neat UIs for chatbots. Let's take a look at how to create a chatbot with streaming mode using TGI and Gradio. Assume you are serving your model on port 8080.
```python
import gradio as gr
from huggingface_hub import InferenceClient

# initialize InferenceClient
client = InferenceClient(model="http://127.0.0.1:8080")

# query client using streaming mode
def inference(message, history):
    partial_message = ""
    for token in client.text_generation(message, max_new_tokens=20, stream=True):
        partial_message += token
        yield partial_message

gr.ChatInterface(
    inference,
    chatbot=gr.Chatbot(height=300),
    textbox=gr.Textbox(placeholder="Chat with me!", container=False, scale=7),
    description="This is the demo for Gradio UI consuming TGI endpoint with Falcon model.",
    title="Gradio 🤝 TGI",
    examples=["Are tomatoes vegetables?"],
    retry_btn=None,
    undo_btn="Undo",
    clear_btn="Clear",
).queue().launch()
```
The UI looks like this 👇

You can disable streaming mode by using `return` instead of `yield` in your inference function.
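For reference, a minimal sketch of the non-streaming variant might look like this (it assumes the same client setup as the example above):

```python
from huggingface_hub import InferenceClient

# same assumed local TGI endpoint as in the streaming example
client = InferenceClient(model="http://127.0.0.1:8080")

def inference(message, history):
    # without stream=True, text_generation returns the complete string,
    # and `return` replaces `yield`, so the UI updates once at the end
    return client.text_generation(message, max_new_tokens=20)
```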
You can read more about how to customize a `ChatInterface` [here](https://www.gradio.app/guides/creating-a-chatbot-fast).
## API documentation
You can consult the OpenAPI documentation of the `text-generation-inference` REST API using the `/docs` route. The Swagger UI is also available [here](https://huggingface.github.io/text-generation-inference).
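As a quick illustration, the `/generate` route described in that OpenAPI spec can be called with any HTTP client. Here is a minimal sketch using only the standard library; the endpoint URL matches the local setup above, and the payload shape follows the `/generate` schema in the OpenAPI docs:

```python
import json
from urllib.request import Request, urlopen

# assumed local TGI endpoint, as in the Gradio example above
TGI_URL = "http://127.0.0.1:8080"

def build_payload(prompt: str, max_new_tokens: int = 20) -> dict:
    # request body shape per the /generate schema in the OpenAPI docs
    return {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}

def generate(prompt: str, max_new_tokens: int = 20) -> str:
    req = Request(
        f"{TGI_URL}/generate",
        data=json.dumps(build_payload(prompt, max_new_tokens)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]
```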