From db2dd5229b60cf8c3bf8a04fe29aa5f91646e64e Mon Sep 17 00:00:00 2001
From: Merve Noyan
Date: Fri, 11 Aug 2023 16:21:11 +0300
Subject: [PATCH] Added streaming

---
 docs/source/basic_tutorials/consuming_tgi.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/source/basic_tutorials/consuming_tgi.md b/docs/source/basic_tutorials/consuming_tgi.md
index 7fb74719..f608c90f 100644
--- a/docs/source/basic_tutorials/consuming_tgi.md
+++ b/docs/source/basic_tutorials/consuming_tgi.md
@@ -33,6 +33,8 @@ client = InferenceClient(model=URL_TO_ENDPOINT_SERVING_TGI)
 client.text_generation(prompt="Write a code for snake game", model=URL_TO_ENDPOINT_SERVING_TGI)
 ```
 
+To stream tokens with `InferenceClient`, simply pass `stream=True`. Another parameter you can use with the TGI backend is `details`. You can get more details on the generation (tokens, probabilities, etc.) by setting `details` to `True`. By default, `details` is set to `False`, and `text_generation` returns only the text output. If you set both `details` and `stream` to `True`, `text_generation` will return a `TextGenerationStreamResponse`, which consists of the generated token, the generated text, and the details.
+
 You can check out the details of the function [here](https://huggingface.co/docs/huggingface_hub/main/en/package_reference/inference_client#huggingface_hub.InferenceClient.text_generation).
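A rough sketch of how the added paragraph plays out in code, combining `stream=True` and `details=True` (the endpoint URL is the same placeholder used in the doc; the response fields assume the `huggingface_hub` API as of this patch):

```python
from huggingface_hub import InferenceClient

# Placeholder, as in the surrounding doc: point this at a running TGI endpoint.
URL_TO_ENDPOINT_SERVING_TGI = "http://127.0.0.1:8080"

client = InferenceClient(model=URL_TO_ENDPOINT_SERVING_TGI)

# stream=True yields results token by token as they are generated;
# details=True attaches token-level information (ids, logprobs, etc.).
for response in client.text_generation(
    "Write a code for snake game",
    stream=True,
    details=True,
):
    # Each item is a TextGenerationStreamResponse: response.token holds the
    # newly generated token, while response.generated_text and
    # response.details are populated on the final item.
    print(response.token.text, end="")
```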