Mirror of https://github.com/huggingface/text-generation-inference.git, synced 2025-09-12 04:44:52 +00:00

Apply suggestions from code review

Co-authored-by: Lucain <lucainp@gmail.com>

This commit is contained in: parent cd18ee3ac9 · commit 3ba36590c6
@@ -2,7 +2,7 @@
 
 There are many ways to consume Text Generation Inference (TGI) server in your applications. After launching the server, you can use the [Messages API](https://huggingface.co/docs/text-generation-inference/en/messages_api) `/v1/chat/completions` route and make a `POST` request to get results from the server. You can also pass `"stream": true` to the call if you want TGI to return a stream of tokens. While `/generate` and `/generate_stream` are still available, the Messages API is recommended as it automatically applies the chat template.
 
-For more information on the API, consult the OpenAPI documentation of the `text-generation-inference` available [here](https://huggingface.github.io/text-generation-inference).
+For more information on the API, consult the OpenAPI documentation of `text-generation-inference` available [here](https://huggingface.github.io/text-generation-inference).
 
 You can make the requests using any tool of your preference, such as curl, Python or TypeScript. For an end-to-end experience, we've open-sourced ChatUI, a chat interface for open-source models.
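For orientation (not part of the commit), a streaming request against that route can be as simple as the sketch below; the host, port, and payload mirror the curl snippet this commit touches in the next hunk, and the exact flags are otherwise assumptions.

```bash
# A sketch of a streaming Messages API call; the payload is an assumption
# based on the curl snippet referenced in the next hunk. -N disables curl's
# output buffering so streamed tokens are printed as they arrive.
curl -N localhost:3000/v1/chat/completions \
    -X POST \
    -d '{"model": "tgi", "messages": [{"role": "user", "content": "What is deep learning?"}], "stream": true}' \
    -H 'Content-Type: application/json'
```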
@@ -31,13 +31,13 @@ curl -N localhost:3000/v1/chat/completions \
     -H 'Content-Type: application/json'
 ```
 
-You can update the `stream` parameter to `false` to get a non-streaming response.
+You can set the `stream` parameter to `false` to get a non-streaming response.
 
 ## Python
 
 ### OpenAI Client
 
-You can directly use the OpenAI [Python](https://github.com/openai/openai-python)/ [JS](https://github.com/openai/openai-node) client to interact with TGI.
+You can directly use the OpenAI [Python](https://github.com/openai/openai-python) or [JS](https://github.com/openai/openai-node) clients to interact with TGI.
 
 Install the OpenAI Python package via pip.
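For context, a minimal OpenAI-client call against a local TGI server might look like the sketch below; the `base_url`, the dummy `api_key`, and the placeholder model name are assumptions, since the diff only shows the trailing `for message in chat_completion:` context.

```python
# A sketch, assuming a TGI server on localhost:3000. A local TGI server does
# not validate the API key, so a dummy string works; "tgi" is a placeholder
# model name.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/v1", api_key="-")

chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[{"role": "user", "content": "What is deep learning?"}],
    stream=True,
)

for message in chat_completion:
    print(message.choices[0].delta.content)
```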
@@ -70,15 +70,15 @@ for message in chat_completion:
 
 ### Inference Client
 
-[`huggingface-hub`](https://huggingface.co/docs/huggingface_hub/main/en/index) is a Python library to interact with the Hugging Face Hub, including its endpoints. It provides a high-level class, [`~huggingface_hub.InferenceClient`], which makes it easy to make calls to TGI's Messages API. `InferenceClient` also takes care of parameter validation and provides a simple to-use interface.
+[`huggingface-hub`](https://huggingface.co/docs/huggingface_hub/main/en/index) is a Python library to interact with the Hugging Face Hub, including its endpoints. It provides a high-level class, [`huggingface_hub.InferenceClient`](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient), which makes it easy to make calls to TGI's Messages API. `InferenceClient` also takes care of parameter validation and provides a simple to-use interface.
 
-Install `huggingface-hub` package via pip.
+Install `huggingface_hub` package via pip.
 
 ```bash
-pip install huggingface-hub
+pip install huggingface_hub
 ```
 
-Once you start the TGI server, instantiate `InferenceClient()` with the URL to the endpoint serving the model. You can then call `text_generation()` to hit the endpoint through Python.
+You can now use `InferenceClient` the exact same way you would use `OpenAI` client in Python
 
 ```python
 - from openai import OpenAI
@@ -105,7 +105,9 @@ for chunk in output:
     print(chunk.choices[0].delta.content)
 ```
 
-You can check out more details [here](https://huggingface.co/docs/huggingface_hub/en/guides/inference#openai-compatibility). There is also an async version of the client, `AsyncInferenceClient`, based on `asyncio` and `aiohttp`. You can find docs for it [here](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.AsyncInferenceClient)
+You can check out more details about OpenAI compatibility [here](https://huggingface.co/docs/huggingface_hub/en/guides/inference#openai-compatibility).
+
+There is also an async version of the client, `AsyncInferenceClient`, based on `asyncio` and `aiohttp`. You can find docs for it [here](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.AsyncInferenceClient)
 
 ## UI
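Pieced together from the context lines above (`for chunk in output:` and the `print` call) and from the hunks below, the snippet being edited presumably looks like this sketch; the prompt is an assumption.

```python
# A sketch assembled from the diff context. TGI's Messages API is
# OpenAI-compatible, so InferenceClient mirrors the OpenAI call shape.
from huggingface_hub import InferenceClient

client = InferenceClient("http://127.0.0.1:8080")  # endpoint taken from the hunks below

output = client.chat.completions.create(
    messages=[{"role": "user", "content": "What is deep learning?"}],
    stream=True,
)

for chunk in output:
    print(chunk.choices[0].delta.content)
```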
@@ -50,7 +50,6 @@ from huggingface_hub import InferenceClient
 
 client = InferenceClient("http://127.0.0.1:8080")
 output = client.chat.completions.create(
-    model="tgi",
     messages=[
         {"role": "system", "content": "You are a helpful assistant."},
         {"role": "user", "content": "Count to 10"},
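For reference, the non-streaming form of the example this hunk edits would, after the change, presumably read like the sketch below; `stream=False` and the final `print` line are assumptions, since the hunk cuts off before the response handling.

```python
# A sketch of the synchronous example after model="tgi" is dropped; when the
# client points at a URL, the served model is used by default. The response
# handling is an assumption, as the diff ends before it.
from huggingface_hub import InferenceClient

client = InferenceClient("http://127.0.0.1:8080")
output = client.chat.completions.create(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    stream=False,
)
print(output.choices[0].message.content)
```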
@@ -82,7 +81,6 @@ from huggingface_hub import AsyncInferenceClient
 client = AsyncInferenceClient("http://127.0.0.1:8080")
 async def main():
     stream = await client.chat.completions.create(
-        model="tgi",
         messages=[{"role": "user", "content": "Say this is a test"}],
         stream=True,
     )
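A runnable version of the async snippet above might look like this sketch; the `async for` loop and the `asyncio.run` entry point are assumptions, since the hunk stops at the closing parenthesis.

```python
# A sketch completing the async example; the iteration and entry point are
# assumptions inferred from the visible context.
import asyncio
from huggingface_hub import AsyncInferenceClient

client = AsyncInferenceClient("http://127.0.0.1:8080")

async def main():
    stream = await client.chat.completions.create(
        messages=[{"role": "user", "content": "Say this is a test"}],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content)

asyncio.run(main())
```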