# Quick Tour
The easiest way to get started is with the official Docker container. Install Docker by following their installation instructions.

Let's say you want to deploy the Falcon-7B Instruct model with TGI. Here is an example of how to do that:
```shell
model=tiiuae/falcon-7b-instruct
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.0.0 --model-id $model
```
To use GPUs, you need to install the NVIDIA Container Toolkit. We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.
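As a quick sanity check that Docker can access your GPU once the toolkit is installed, you can run `nvidia-smi` inside a CUDA base image. The image tag below is only an example; any CUDA image available to you works.

```shell
# Optional sanity check: the CUDA image tag is only an example
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```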
Once TGI is running, you can use the `generate` endpoint by sending requests. To learn more about how to query the endpoints, check the Consuming TGI section.
```shell
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```
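TGI also exposes a streaming variant of this endpoint that returns tokens as they are generated. The sketch below assumes the `/generate_stream` route with the same request body; responses arrive as server-sent events.

```shell
# Stream tokens as they are generated (assumes the /generate_stream route)
curl 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```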
To see all possible flags and options, you can use the `--help` flag. It's possible to configure the number of shards, quantization, generation parameters, and more.
```shell
docker run ghcr.io/huggingface/text-generation-inference:1.0.0 --help
```
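For example, here is a sketch of sharding a model across two GPUs with quantization enabled. The `--num-shard` and `--quantize` flags are assumed here; confirm the exact names and accepted values in the `--help` output of your version.

```shell
# Sketch: shard the model across 2 GPUs and quantize with bitsandbytes
# (flag names assumed; verify against the --help output of your TGI version)
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:1.0.0 \
    --model-id $model --num-shard 2 --quantize bitsandbytes
```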