From bb2b93e7a3a3b64031f6d5dc0acbf79cb23c73ba Mon Sep 17 00:00:00 2001
From: Vaibhav Srivastav
Date: Wed, 14 Aug 2024 11:35:50 +0200
Subject: [PATCH] Doc review from Nico.

---
 docs/source/basic_tutorials/consuming_tgi.md | 70 ++++++++++----------
 1 file changed, 35 insertions(+), 35 deletions(-)

diff --git a/docs/source/basic_tutorials/consuming_tgi.md b/docs/source/basic_tutorials/consuming_tgi.md
index 0f87f6aa..81b3a8bf 100644
--- a/docs/source/basic_tutorials/consuming_tgi.md
+++ b/docs/source/basic_tutorials/consuming_tgi.md
@@ -11,7 +11,7 @@ You can make the requests using any tool of your preference, such as curl, Pytho
 After a successful server launch, you can query the model using the `v1/chat/completions` route, to get responses that are compliant to the OpenAI Chat Completion spec:
 
 ```bash
-curl -N localhost:3000/v1/chat/completions \
+curl localhost:8080/v1/chat/completions \
     -X POST \
     -d '{
   "model": "tgi",
   "messages": [
     {"role": "system", "content": "You are a helpful assistant." },
     {"role": "user", "content": "What is deep learning?"}
   ],
   "stream": true
 }' \
     -H 'Content-Type: application/json'
 ```
@@ -33,39 +33,6 @@ curl -N localhost:3000/v1/chat/completions \
 
 ## Python
 
-### OpenAI Client
-
-You can directly use the OpenAI [Python](https://github.com/openai/openai-python) or [JS](https://github.com/openai/openai-node) clients to interact with TGI.
-
-Install the OpenAI Python package via pip.
-
-```bash
-pip install openai
-```
-
-```python
-from openai import OpenAI
-
-# init the client but point it to TGI
-client = OpenAI(
-    base_url="http://localhost:3000/v1/",
-    api_key="-"
-)
-
-chat_completion = client.chat.completions.create(
-    model="tgi",
-    messages=[
-        {"role": "system", "content": "You are a helpful assistant." },
-        {"role": "user", "content": "What is deep learning?"}
-    ],
-    stream=True
-)
-
-# iterate and print stream
-for message in chat_completion:
-    print(message)
-```
-
 ### Inference Client
 
 [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/main/en/index) is a Python library to interact with the Hugging Face Hub, including its endpoints. It provides a high-level class, [`huggingface_hub.InferenceClient`](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient), which makes it easy to make calls to TGI's Messages API. `InferenceClient` also takes care of parameter validation and provides a simple-to-use interface.
@@ -84,7 +51,7 @@ You can now use `InferenceClient` the exact same way you would use `OpenAI` clie
 
 - client = OpenAI(
 + client = InferenceClient(
-    base_url="http://localhost:3000/v1/",
+    base_url="http://localhost:8080/v1/",
 )
 
 output = client.chat.completions.create(
@@ -105,6 +72,39 @@ You can check out more details about OpenAI compatibility [here](https://hugging
 
 There is also an async version of the client, `AsyncInferenceClient`, based on `asyncio` and `aiohttp`. You can find docs for it [here](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.AsyncInferenceClient)
 
+### OpenAI Client
+
+You can directly use the OpenAI [Python](https://github.com/openai/openai-python) or [JS](https://github.com/openai/openai-node) clients to interact with TGI.
+
+Install the OpenAI Python package via pip.
+
+```bash
+pip install openai
+```
+
+```python
+from openai import OpenAI
+
+# init the client but point it to TGI
+client = OpenAI(
+    base_url="http://localhost:8080/v1/",
+    api_key="-"
+)
+
+chat_completion = client.chat.completions.create(
+    model="tgi",
+    messages=[
+        {"role": "system", "content": "You are a helpful assistant." },
+        {"role": "user", "content": "What is deep learning?"}
+    ],
+    stream=True
+)
+
+# iterate and print stream
+for message in chat_completion:
+    print(message)
+```
+
 ## UI
 
 ### Gradio
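
For readers following along, the hunk context above cuts off at `output = client.chat.completions.create(`. Here is a minimal, runnable sketch of that full `InferenceClient` call, assuming TGI is serving on `localhost:8080` as in the updated docs; the snippet is illustrative and not part of the diff:

```python
from huggingface_hub import InferenceClient

# Point the client at the local TGI server, as the updated docs do
client = InferenceClient(base_url="http://localhost:8080/v1/")

output = client.chat.completions.create(
    model="tgi",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is deep learning?"},
    ],
    stream=True,
)

# With stream=True, each chunk carries a delta holding newly generated text
for chunk in output:
    print(chunk.choices[0].delta.content or "", end="")
```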
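The updated docs mention `AsyncInferenceClient` without showing it in use. A sketch of the same streamed chat call with the async client, under the same endpoint assumption, might look like this:

```python
import asyncio

from huggingface_hub import AsyncInferenceClient


async def main():
    # Same endpoint as above; the async client mirrors InferenceClient's interface
    client = AsyncInferenceClient(base_url="http://localhost:8080/v1/")
    output = await client.chat.completions.create(
        model="tgi",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "What is deep learning?"},
        ],
        stream=True,
    )
    # A streamed response is consumed with async iteration
    async for chunk in output:
        print(chunk.choices[0].delta.content or "", end="")


asyncio.run(main())
```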
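The final context lines open the `## UI` / `### Gradio` section, whose body lies outside this patch. As one plausible illustration (the `gr.ChatInterface` wiring, tuple-style history, and system prompt below are assumptions, not content from the patch), TGI's streaming chat can back a Gradio UI like so:

```python
import gradio as gr
from huggingface_hub import InferenceClient

client = InferenceClient(base_url="http://localhost:8080/v1/")


def respond(message, history):
    # Rebuild the conversation in the Messages API format TGI expects
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    for user_msg, assistant_msg in history:
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": assistant_msg})
    messages.append({"role": "user", "content": message})

    # Yield progressively longer strings so the UI streams the reply
    partial = ""
    for chunk in client.chat.completions.create(model="tgi", messages=messages, stream=True):
        partial += chunk.choices[0].delta.content or ""
        yield partial


gr.ChatInterface(respond).launch()
```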