# Launching with Docker

The easiest way to get started is with the official Docker container:

```shell
model=tiiuae/falcon-7b-instruct
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.0.0 --model-id $model
```

**Note:** To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.
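
Before pulling the larger text-generation-inference image, you can confirm that Docker can see your GPUs by running `nvidia-smi` inside a minimal CUDA container. A quick sanity check (the CUDA image tag here is illustrative; pick one matching your driver):

```shell
# If this prints your GPU table, the NVIDIA Container Toolkit is working.
# The CUDA image tag is illustrative; choose one matching your driver version.
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```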

You can then query the model using either the `/generate` or `/generate_stream` routes:

```shell
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```
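
The `/generate` route returns a JSON object whose `generated_text` field holds the completion (the same field the Python client exposes below). With `jq` installed on the host, you can extract it directly; a minimal sketch:

```shell
# Assumes jq is available; -s silences curl's progress meter.
curl -s 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json' | jq -r '.generated_text'
```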

```shell
curl 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```
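
`/generate_stream` returns tokens as server-sent events. If the output seems to arrive all at once rather than token by token, curl's output buffering is often the cause; its `-N` (`--no-buffer`) flag prints events as they come in:

```shell
# -N disables curl's output buffering so tokens print as they arrive.
curl -N 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```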

You can also query the model from Python. First, install the `text-generation` client library:

```shell
pip install text-generation
```

```python
from text_generation import Client

client = Client("http://127.0.0.1:8080")
print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)

text = ""
for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
    if not response.token.special:
        text += response.token.text
print(text)
```

To see all the options available to serve your models, check the [launcher code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or use the CLI:

```shell
text-generation-launcher --help
```
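
Launcher flags are appended to the `docker run` command after the image name, exactly like `--model-id` above. A sketch with two illustrative flags (names taken from the launcher's help output; verify them against `--help` for your version):

```shell
# Illustrative: cap prompt length and total sequence length at launch time.
# Verify flag names against text-generation-launcher --help for your version.
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:1.0.0 \
    --model-id $model --max-input-length 1024 --max-total-tokens 2048
```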