# Quick Tour
The easiest way to get started is with the official Docker container. Install Docker by following their installation instructions.

Let's say you want to deploy the Falcon-7B Instruct model with TGI. Here is an example of how to do that:
```shell
model=tiiuae/falcon-7b-instruct
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.0.0 --model-id $model
```
To use GPUs, you need to install the NVIDIA Container Toolkit. We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.
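As a quick sanity check that Docker can access your GPU once the toolkit is installed, you can run `nvidia-smi` inside a CUDA base image. The image tag below is only an example; any CUDA image available to you works.

```shell
# Optional sanity check: the CUDA image tag is only an example
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```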
Once TGI is running, you can use the `generate` endpoint by sending requests. To learn more about how to query the endpoints, check the Consuming TGI section.
```shell
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```
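TGI also exposes a streaming variant of this endpoint that returns tokens as they are generated. The sketch below assumes the `/generate_stream` route with the same request body; responses arrive as server-sent events.

```shell
# Stream tokens as they are generated (assumes the /generate_stream route)
curl 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```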
To see all possible flags and options, you can use the `--help` flag. It's possible to configure the number of shards, quantization, generation parameters, and more.
```shell
docker run ghcr.io/huggingface/text-generation-inference:1.0.0 --help
```
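For example, here is a sketch of sharding a model across two GPUs with quantization enabled. The `--num-shard` and `--quantize` flags are assumed here; confirm the exact names and accepted values in the `--help` output of your version.

```shell
# Sketch: shard the model across 2 GPUs and quantize with bitsandbytes
# (flag names assumed; verify against the --help output of your TGI version)
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:1.0.0 \
    --model-id $model --num-shard 2 --quantize bitsandbytes
```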