
# Consuming Text Generation Inference

## ChatUI

ChatUI is an open-source interface built for serving large language models. It offers many customization options, such as web search with the SERP API, and more. ChatUI can automatically consume the Text Generation Inference server and even provides an option to switch between different TGI endpoints. You can try it out at Hugging Chat, or use the ChatUI Docker Spaces template to deploy your own Hugging Chat to Spaces.

To serve both ChatUI and TGI in the same environment, simply add your own endpoints to the `MODELS` variable in the `.env.local` file inside the `chat-ui` repository. Provide endpoints pointing to where TGI is served.

```
{
// rest of the model config here
"endpoints": [{"url": "https://HOST:PORT/generate_stream"}]
}
```
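
For context, here is a minimal sketch of what a complete `MODELS` entry in `.env.local` might look like; the model name and local URL are placeholders for your own setup:

```
MODELS=`[
  {
    "name": "my-tgi-model",
    "endpoints": [{"url": "http://127.0.0.1:8080/generate_stream"}]
  }
]`
```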

## Inference Client

`huggingface-hub` is a Python library to interact with and manage repositories and endpoints on the Hugging Face Hub. `InferenceClient` is a class that lets users interact with models on the Hugging Face Hub, as well as with models served by any TGI endpoint. Once you start the TGI server, simply instantiate `InferenceClient()` with the URL of the endpoint serving the model. You can then call `text_generation()` to hit the endpoint through Python.

```python
from huggingface_hub import InferenceClient

client = InferenceClient(model=URL_TO_ENDPOINT_SERVING_TGI)
client.text_generation(prompt="Write a code for snake game")
```
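
Beyond one-shot generation, `text_generation()` can also stream tokens as they are produced. Below is a minimal sketch, assuming TGI is served locally on port 8080; the URL is a placeholder for your own endpoint:

```python
from huggingface_hub import InferenceClient

# Placeholder URL; point this at your running TGI server.
client = InferenceClient(model="http://127.0.0.1:8080")

# stream=True yields generated tokens as they arrive,
# instead of waiting for the full completion.
for token in client.text_generation(
    prompt="Write a code for snake game",
    max_new_tokens=512,
    stream=True,
):
    print(token, end="")
```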

You can check out the details of the `text_generation()` function in the `huggingface_hub` documentation.