# Consuming Text Generation Inference
There are many ways to consume Text Generation Inference (TGI) server in your applications. After launching the server, you can use the [Messages API](https://huggingface.co/docs/text-generation-inference/en/messages_api) `/v1/chat/completions` route and make a `POST` request to get results from the server. You can also pass `"stream": true` to the call if you want TGI to return a stream of tokens. While `/generate` and `/generate_stream` are still available, the Messages API is recommended as it automatically applies the chat template.
For more information on the API, consult the OpenAPI documentation of `text-generation-inference` available [here](https://huggingface.github.io/text-generation-inference).
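If you prefer to script the request, here is a minimal Python sketch of such a call using `requests`; the server address, the placeholder model name `"tgi"`, and the `max_tokens` value are assumptions for a local deployment:

```python
import requests

# POST a chat request to the Messages API of a local TGI server.
# "tgi" is a placeholder model name; the server routes to its loaded model.
response = requests.post(
    "http://localhost:3000/v1/chat/completions",
    json={
        "model": "tgi",
        "messages": [{"role": "user", "content": "What is deep learning?"}],
        "max_tokens": 50,
    },
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```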
## curl

After a successful server launch, you can query the Messages API directly with `curl`; the request body below is a minimal example:

```bash
curl -N localhost:3000/v1/chat/completions \
    -X POST \
    -d '{"model": "tgi", "messages": [{"role": "user", "content": "What is deep learning?"}], "stream": true, "max_tokens": 50}' \
    -H 'Content-Type: application/json'
```
You can set the `stream` parameter to `false` to get a non-streaming response.
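With `stream` left as `true`, the response arrives as server-sent events. Below is a hedged Python sketch of consuming that stream, assuming the OpenAI-compatible `data: {...}` chunk format that the Messages API emits:

```python
import json
import requests

# Stream tokens from a local TGI server; assumes the OpenAI-compatible
# SSE format ("data: {...}" lines, terminated by "data: [DONE]").
with requests.post(
    "http://localhost:3000/v1/chat/completions",
    json={
        "model": "tgi",
        "messages": [{"role": "user", "content": "What is deep learning?"}],
        "stream": True,
        "max_tokens": 50,
    },
    stream=True,
) as response:
    for line in response.iter_lines():
        if not line or not line.startswith(b"data:"):
            continue
        payload = line[len(b"data:"):].strip()
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        # Each chunk carries an incremental delta with newly generated text.
        print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
```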
## Python
### OpenAI Client
You can now use `InferenceClient` the exact same way you would use the `OpenAI` client:

```python
from huggingface_hub import InferenceClient

client = InferenceClient(
    base_url="http://localhost:3000/v1/",
)
```
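As a usage sketch, here is a chat completion call against the client defined above; the message content and `max_tokens` value are illustrative:

```python
chat_completion = client.chat.completions.create(
    model="tgi",  # placeholder name; the TGI server serves a single model
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is deep learning?"},
    ],
    max_tokens=50,
)
print(chat_completion.choices[0].message.content)
```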