From a27b31c34ab9fa61760dce082330f9a9c13001bc Mon Sep 17 00:00:00 2001
From: Vaibhav Srivastav
Date: Wed, 14 Aug 2024 11:18:45 +0200
Subject: [PATCH] Up.

---
 docs/source/basic_tutorials/consuming_tgi.md | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/docs/source/basic_tutorials/consuming_tgi.md b/docs/source/basic_tutorials/consuming_tgi.md
index 6e562226..60df0b6a 100644
--- a/docs/source/basic_tutorials/consuming_tgi.md
+++ b/docs/source/basic_tutorials/consuming_tgi.md
@@ -1,6 +1,6 @@
 # Consuming Text Generation Inference
 
-There are many ways to consume Text Generation Inference (TGI) server in your applications. After launching the server, you can use the [Messages API](https://huggingface.co/docs/text-generation-inference/en/messages_api) `/v1/chat/completions` route and make a `POST` request to get results from the server. You can also pass `"stream": true` to the call if you want TGI to return a stream of tokens. While `/generate` and `/generate_stream` are still available, the Messages API is recommended as it automatically applies the chat template.
+There are many ways to consume Text Generation Inference (TGI) server in your applications. After launching the server, you can use the [Messages API](https://huggingface.co/docs/text-generation-inference/en/messages_api) `/v1/chat/completions` route and make a `POST` request to get results from the server. You can also pass `"stream": true` to the call if you want TGI to return a stream of tokens.
 
 For more information on the API, consult the OpenAPI documentation of `text-generation-inference` available [here](https://huggingface.github.io/text-generation-inference).
 
@@ -31,8 +31,6 @@ curl -N localhost:3000/v1/chat/completions \
     -H 'Content-Type: application/json'
 ```
 
-You can set the `stream` parameter to `false` to get a non-streaming response.
-
 ## Python
 
 ### OpenAI Client
@@ -86,8 +84,7 @@ You can now use `InferenceClient` the exact same way you would use `OpenAI` clie
 
 - client = OpenAI(
 + client = InferenceClient(
-    base_url=...,
-    api_key=...,
+    base_url="http://localhost:3000/v1/",
 )
 ```
 
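Note for anyone trying the change locally: below is a minimal sketch (not part of the patch) of the usage the patched docs describe, calling TGI's Messages API through `huggingface_hub`'s OpenAI-compatible client syntax. It assumes a TGI server is already serving a model on `localhost:3000` (as in the docs' curl example) and that a recent `huggingface_hub` release with the `chat.completions.create` syntax is installed; the `model="tgi"` placeholder and the prompt are illustrative only.

```python
# Minimal sketch: exercise /v1/chat/completions on a locally running TGI
# server using huggingface_hub's OpenAI-style client.
from huggingface_hub import InferenceClient

# base_url matches the value the patch adds to the docs' example.
client = InferenceClient(base_url="http://localhost:3000/v1/")

# POST to the Messages API; stream=True makes TGI return tokens as they
# are generated, mirroring `"stream": true` in the curl example.
response = client.chat.completions.create(
    model="tgi",  # placeholder; TGI serves whichever model it was launched with
    messages=[{"role": "user", "content": "What is deep learning?"}],
    stream=True,
)

for chunk in response:
    # Each streamed chunk carries a token delta; the final chunk's content may be None.
    print(chunk.choices[0].delta.content or "", end="")
```

Passing `stream=False` instead returns a single completed response object, which is the non-streaming behavior the removed sentence used to call out.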