Mirror of https://github.com/huggingface/text-generation-inference.git

Commit 15ef2bc082 ("sync changes and refactor"), parent 29129dc660
@@ -1,64 +0,0 @@
# Installing from the Source and Launching TGI

Before you start, you will need to set up your environment and install Text Generation Inference. Text Generation Inference is tested on **Python 3.9+**.

## Local Installation from Source

Text Generation Inference is available on PyPI, conda and GitHub.

To install and launch locally, first [install Rust](https://rustup.rs/) and create a Python virtual environment with at least Python 3.9, e.g. using conda:

```shell
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

conda create -n text-generation-inference python=3.9
conda activate text-generation-inference
```
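
Optionally, you can confirm the toolchain is in place before building; the exact versions reported will depend on your machine:

```shell
# sanity-check the Rust toolchain and the Python environment created above
rustc --version
cargo --version
python --version  # should report 3.9 or newer
```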

You may also need to install Protoc.

On Linux:

```shell
# download protoc 21.12 and install the binary and includes under /usr/local
PROTOC_ZIP=protoc-21.12-linux-x86_64.zip
curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v21.12/$PROTOC_ZIP
sudo unzip -o $PROTOC_ZIP -d /usr/local bin/protoc
sudo unzip -o $PROTOC_ZIP -d /usr/local 'include/*'
rm -f $PROTOC_ZIP
```

On macOS, using Homebrew:

```shell
brew install protobuf
```

Then run the following to install Text Generation Inference:

```shell
BUILD_EXTENSIONS=True make install # Install repository and HF/transformer fork with CUDA kernels
```

<Tip warning={true}>

On some machines, you may also need the OpenSSL libraries and gcc. On Linux machines, run:

```shell
sudo apt-get install libssl-dev gcc -y
```

</Tip>

Once installation is done, simply run:

```shell
make run-falcon-7b-instruct
```

This will serve the Falcon 7B Instruct model on port 8080, which we can then query.
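
For example, assuming the server from the previous step is listening on port 8080, you can send a request to the `/generate` endpoint with `curl`; the prompt and generation parameters below are only placeholders:

```shell
# query the locally served model; adjust the prompt and parameters as needed
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```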

To see all the options to serve your models, check the [codebase](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or the CLI:

```shell
text-generation-launcher --help
```
@@ -4,8 +4,56 @@ This section explains how to install the CLI tool as well as installing TGI from

## Install CLI

You can use TGI's CLI tools to download weights, serve and quantize models, or get information on serving parameters.

To install TGI for use with the CLI, first clone the TGI repository and then, inside the repository, run:

```shell
git clone https://github.com/huggingface/text-generation-inference.git && cd text-generation-inference
make install
```

If you would like to serve models with custom kernels, run:

```shell
BUILD_EXTENSIONS=True make install
```

## Running CLI

After installation, you will be able to use `text-generation-server` and `text-generation-launcher`.

`text-generation-server` lets you download model weights with the `download-weights` command, as shown below 👇

```shell
text-generation-server download-weights MODEL_HUB_ID
```

You can also use it to quantize models, as shown below 👇

```shell
text-generation-server quantize MODEL_HUB_ID OUTPUT_DIR
```
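
The quantized weights written to `OUTPUT_DIR` can then be served with the launcher. A minimal sketch, assuming the quantization produced here is GPTQ and that a local path is accepted as `--model-id` (check `--help` on your version):

```shell
# illustrative only: serve the quantized weights produced by the command above
text-generation-launcher --model-id OUTPUT_DIR --quantize gptq --port 8080
```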

You can use `text-generation-launcher` to serve models:

```shell
text-generation-launcher --model-id MODEL_HUB_ID --port 8080
```

There are many options and parameters you can pass to `text-generation-launcher`. The CLI documentation is kept minimal and relies on self-generated documentation, which you can view by running:

```shell
text-generation-launcher --help
```
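
As an illustration of what those options look like, the sketch below combines a few commonly used flags; the flag names are taken from the `--help` output of recent releases and should be verified against your installed version:

```shell
# illustrative only: shard across two GPUs and cap input/total token counts
text-generation-launcher \
    --model-id MODEL_HUB_ID \
    --port 8080 \
    --num-shard 2 \
    --max-input-length 1024 \
    --max-total-tokens 2048
```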

You can also find it hosted in this [Swagger UI](https://huggingface.github.io/text-generation-inference/).

The same kind of documentation is available for `text-generation-server`:

```shell
text-generation-server --help
```

## Local Installation from Source

@@ -44,7 +92,8 @@ brew install protobuf

Then run the following to install Text Generation Inference:

```shell
git clone https://github.com/huggingface/text-generation-inference.git && cd text-generation-inference
BUILD_EXTENSIONS=True make install
```

<Tip warning={true}>
@@ -64,8 +113,3 @@ make run-falcon-7b-instruct
```

This will serve the Falcon 7B Instruct model on port 8080, which we can then query.

To see all the options to serve your models, check the [codebase](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or the CLI:

```shell
text-generation-launcher --help
```
@@ -1,33 +0,0 @@
# Getting Started

The easiest way of getting started is using the official Docker container.

## Launching with Docker

Install Docker following [their installation instructions](https://docs.docker.com/get-docker/).

Let's say you want to deploy the [Falcon-7B Instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) model with TGI. Here is an example of how to do that:

```shell
model=tiiuae/falcon-7b-instruct
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.0.0 --model-id $model
```

<Tip warning={true}>

To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.

</Tip>
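
Once the container is running, you can query it from the host on port 8080. Below is a minimal sketch of a streaming request to the `/generate_stream` endpoint; the prompt and parameters are placeholders:

```shell
# stream tokens from the container started above
curl 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```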

To see all the options to serve your models, check the [codebase](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or the CLI:

```shell
text-generation-launcher --help
```
@@ -31,4 +31,4 @@ To see all possible flags and options, you can use the `--help` flag. It's possi

docker run ghcr.io/huggingface/text-generation-inference:1.0.0 --help
```

</Tip>