Improve the Consuming TGI docs.

Vaibhav Srivastav 2024-08-13 15:47:25 +02:00
parent cd9b15d17f
commit 8de10acdcf
2 changed files with 21 additions and 20 deletions


@@ -819,13 +819,6 @@
         "example": "1.0",
         "nullable": true
       },
-      "guideline": {
-        "type": "string",
-        "description": "A guideline to be used in the chat_template",
-        "default": "null",
-        "example": "null",
-        "nullable": true
-      },
       "logit_bias": {
         "type": "array",
         "items": {
@@ -1824,8 +1817,7 @@
     "type": "object",
     "required": [
       "finish_reason",
-      "generated_tokens",
-      "input_length"
+      "generated_tokens"
     ],
     "properties": {
       "finish_reason": {
@@ -1837,12 +1829,6 @@
         "example": 1,
         "minimum": 0
       },
-      "input_length": {
-        "type": "integer",
-        "format": "int32",
-        "example": 1,
-        "minimum": 0
-      },
       "seed": {
         "type": "integer",
         "format": "int64",


@@ -1,18 +1,33 @@
 # Consuming Text Generation Inference
 
-There are many ways you can consume Text Generation Inference server in your applications. After launching, you can use the `/generate` route and make a `POST` request to get results from the server. You can also use the `/generate_stream` route if you want TGI to return a stream of tokens. You can make the requests using the tool of your preference, such as curl, Python or TypeScrpt. For a final end-to-end experience, we also open-sourced ChatUI, a chat interface for open-source models.
+There are many ways to consume the Text Generation Inference (TGI) server in your applications. After launching the server, you can use the [Messages API](https://huggingface.co/docs/text-generation-inference/en/messages_api) `/v1/chat/completions` route and make a `POST` request to get results from the server. You can also pass `"stream": true` to the call if you want TGI to return a stream of tokens. You can make the requests using the tool of your preference, such as curl, Python, or TypeScript. For a final end-to-end experience, we have also open-sourced ChatUI, a chat interface for open-source models.
 
 ## curl
 
-After the launch, you can query the model using either the `/generate` or `/generate_stream` routes:
+After a successful server launch, you can query the model using the `/v1/chat/completions` route to get responses compliant with the OpenAI Chat Completions API spec:
 
 ```bash
-curl 127.0.0.1:8080/generate \
+curl localhost:3000/v1/chat/completions \
     -X POST \
-    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
+    -d '{
+  "model": "tgi",
+  "messages": [
+    {
+      "role": "system",
+      "content": "You are a helpful assistant."
+    },
+    {
+      "role": "user",
+      "content": "What is deep learning?"
+    }
+  ],
+  "stream": true,
+  "max_tokens": 20
+}' \
    -H 'Content-Type: application/json'
 ```
+
+You can set the `stream` parameter to `false` to get a non-streaming response instead.
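For illustration, the same streaming call can be made from Python. The sketch below is an assumption rather than part of the documented API surface: it uses the `requests` library (any HTTP client works) against the same `localhost:3000` server as the curl example, and parses the server-sent events the route emits. The dedicated clients covered in the following sections are the more ergonomic option.

```python
import json

import requests  # assumption: `pip install requests`

# Same request as the curl example above; assumes a TGI server on localhost:3000.
url = "http://localhost:3000/v1/chat/completions"
payload = {
    "model": "tgi",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is deep learning?"},
    ],
    "stream": True,
    "max_tokens": 20,
}

# With "stream": true the route answers with server-sent events,
# one `data: {...}` line per generated chunk.
with requests.post(url, json=payload, stream=True) as response:
    response.raise_for_status()
    for raw_line in response.iter_lines():
        if not raw_line:
            continue  # skip the blank lines that separate SSE events
        event = raw_line.decode("utf-8")
        if not event.startswith("data:"):
            continue
        data = event[len("data:"):].strip()
        if data == "[DONE]":
            break  # OpenAI-style end-of-stream sentinel, if the server sends one
        chunk = json.loads(data)
        # Each chunk mirrors the OpenAI chat-completion-chunk shape.
        delta = chunk["choices"][0]["delta"]
        print(delta.get("content") or "", end="", flush=True)
    print()
```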
 ## Inference Client