Improve the Consuming TGI docs.

Vaibhav Srivastav 2024-08-13 15:47:25 +02:00
parent cd9b15d17f
commit 8de10acdcf
2 changed files with 21 additions and 20 deletions


@@ -819,13 +819,6 @@
             "example": "1.0",
             "nullable": true
           },
-          "guideline": {
-            "type": "string",
-            "description": "A guideline to be used in the chat_template",
-            "default": "null",
-            "example": "null",
-            "nullable": true
-          },
           "logit_bias": {
             "type": "array",
             "items": {
@@ -1824,8 +1817,7 @@
         "type": "object",
         "required": [
           "finish_reason",
-          "generated_tokens",
-          "input_length"
+          "generated_tokens"
         ],
         "properties": {
           "finish_reason": {
@@ -1837,12 +1829,6 @@
             "example": 1,
             "minimum": 0
           },
-          "input_length": {
-            "type": "integer",
-            "format": "int32",
-            "example": 1,
-            "minimum": 0
-          },
           "seed": {
             "type": "integer",
             "format": "int64",


@@ -1,18 +1,33 @@
 # Consuming Text Generation Inference
-There are many ways you can consume Text Generation Inference server in your applications. After launching, you can use the `/generate` route and make a `POST` request to get results from the server. You can also use the `/generate_stream` route if you want TGI to return a stream of tokens. You can make the requests using the tool of your preference, such as curl, Python or TypeScrpt. For a final end-to-end experience, we also open-sourced ChatUI, a chat interface for open-source models.
+There are many ways to consume the Text Generation Inference (TGI) server in your applications. After launching the server, you can use the [Messages API](https://huggingface.co/docs/text-generation-inference/en/messages_api) `/v1/chat/completions` route and make a `POST` request to get results from the server. You can also pass `"stream": true` to the call if you want TGI to return a stream of tokens. You can make the requests using the tool of your preference, such as curl, Python or TypeScript. For a final end-to-end experience, we have also open-sourced ChatUI, a chat interface for open-source models.
 ## curl
-After the launch, you can query the model using either the `/generate` or `/generate_stream` routes:
+After a successful server launch, you can query the model using the `/v1/chat/completions` route to get responses compliant with the OpenAI Chat Completion API spec:
 ```bash
-curl 127.0.0.1:8080/generate \
+curl localhost:3000/v1/chat/completions \
     -X POST \
-    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
+    -d '{
+  "model": "tgi",
+  "messages": [
+    {
+      "role": "system",
+      "content": "You are a helpful assistant."
+    },
+    {
+      "role": "user",
+      "content": "What is deep learning?"
+    }
+  ],
+  "stream": true,
+  "max_tokens": 20
+}' \
     -H 'Content-Type: application/json'
 ```
+You can update the `stream` parameter to `false` to get a non-streaming response.
 ## Inference Client
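For reference, the request in the updated curl example can also be issued from Python. The sketch below is a minimal illustration using the `requests` library, with `stream` set to `false` as described above so that a single JSON body comes back; it assumes a TGI server listening on `localhost:3000`, as in the example (adjust the address and parameters to your deployment):

```python
# Minimal sketch: the same chat completion request issued with `requests`.
# Assumes a TGI server is reachable on localhost:3000; adjust as needed.
import requests

response = requests.post(
    "http://localhost:3000/v1/chat/completions",
    json={
        "model": "tgi",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is deep learning?"},
        ],
        "stream": False,  # request one JSON response instead of a token stream
        "max_tokens": 20,
    },
    headers={"Content-Type": "application/json"},
)
response.raise_for_status()

# A non-streaming response follows the OpenAI Chat Completion shape:
# the generated text lives under choices[0].message.content.
print(response.json()["choices"][0]["message"]["content"])
```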
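When `stream` is left as `true`, the server replies with a sequence of server-sent events rather than a single JSON body. The rough sketch below consumes such a stream under the same assumptions, relying on the OpenAI-style `data: ...` event framing and terminating `[DONE]` marker that chat completion streams conventionally use:

```python
# Rough sketch of consuming the token stream (server-sent events) with `requests`.
# Same server address assumption as above; event framing follows the
# OpenAI-style `data: ...` convention used by chat completion streams.
import json
import requests

with requests.post(
    "http://localhost:3000/v1/chat/completions",
    json={
        "model": "tgi",
        "messages": [{"role": "user", "content": "What is deep learning?"}],
        "stream": True,
        "max_tokens": 20,
    },
    stream=True,  # tell `requests` not to buffer the whole response body
) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line:
            continue  # skip keep-alive blank lines between events
        payload = line.decode("utf-8").removeprefix("data:").strip()
        if payload == "[DONE]":
            break  # OpenAI-style streams conventionally end with [DONE]
        chunk = json.loads(payload)
        # Each chunk carries an incremental delta of the assistant message.
        print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
```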