diff --git a/docs/source/basic_tutorials/local_launch.md b/docs/source/basic_tutorials/local_launch.md
deleted file mode 100644
index ed9b5442..00000000
--- a/docs/source/basic_tutorials/local_launch.md
+++ /dev/null
@@ -1,64 +0,0 @@
-# Installing from the Source and Launching TGI
-
-Before you start, you will need to setup your environment, and install Text Generation Inference. Text Generation Inference is tested on **Python 3.9+**.
-
-## Local Installation from Source
-
-Text Generation Inference is available on pypi, conda and GitHub.
-
-To install and launch locally, first [install Rust](https://rustup.rs/) and create a Python virtual environment with at least
-Python 3.9, e.g. using conda:
-
-```shell
-curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
-
-conda create -n text-generation-inference python=3.9
-conda activate text-generation-inference
-```
-
-You may also need to install Protoc.
-
-On Linux:
-
-```shell
-PROTOC_ZIP=protoc-21.12-linux-x86_64.zip
-curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v21.12/$PROTOC_ZIP
-sudo unzip -o $PROTOC_ZIP -d /usr/local bin/protoc
-sudo unzip -o $PROTOC_ZIP -d /usr/local 'include/*'
-rm -f $PROTOC_ZIP
-```
-
-On MacOS, using Homebrew:
-
-```shell
-brew install protobuf
-```
-
-Then run to install Text Generation Inference:
-
-```shell
-BUILD_EXTENSIONS=True make install # Install repository and HF/transformer fork with CUDA kernels
-```
-
-<Tip>
-
-On some machines, you may also need the OpenSSL libraries and gcc. On Linux machines, run:
-
-```shell
-sudo apt-get install libssl-dev gcc -y
-```
-
-</Tip>
-
-Once installation is done, simply run:
-
-```shell
-make run-falcon-7b-instruct
-```
-
-This will serve Falcon 7B Instruct model from the port 8080, which we can query.
-
-To see all options to serve your models, check in the [codebase](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or the CLI:
-```
-text-generation-launcher --help
-```
diff --git a/docs/source/installation.md b/docs/source/installation.md
index 4105acf4..aec59510 100644
--- a/docs/source/installation.md
+++ b/docs/source/installation.md
@@ -4,8 +4,56 @@ This section explains how to install the CLI tool as well as installing TGI from
 
 ## Install CLI
 
-TODO
+You can use TGI's CLI tools to download weights, serve and quantize models, or get information on serving parameters.
+
+To install the CLI tools, clone the TGI repository and then run `make install` inside it:
+
+```shell
+git clone https://github.com/huggingface/text-generation-inference.git && cd text-generation-inference
+make install
+```
+
+If you would like to serve models with custom kernels, run:
+
+```shell
+BUILD_EXTENSIONS=True make install
+```
+
+## Running CLI
+
+After installation, you will be able to use `text-generation-server` and `text-generation-launcher`.
+
+`text-generation-server` lets you download model weights with the `download-weights` command, like below 👇
+
+```shell
+text-generation-server download-weights MODEL_HUB_ID
+```
+
+You can also use it to quantize models, like below 👇
+
+```shell
+text-generation-server quantize MODEL_HUB_ID OUTPUT_DIR
+```
+
+You can use `text-generation-launcher` to serve models:
+
+```shell
+text-generation-launcher --model-id MODEL_HUB_ID --port 8080
+```
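+
+Once the launcher reports that the server is ready, you can query it over HTTP. Here is a minimal sketch of such a request, assuming the server started above is listening on port 8080 and using TGI's `/generate` route:
+
+```shell
+# Send a generation request to the local server (assumes port 8080 and the /generate route)
+curl 127.0.0.1:8080/generate \
+    -X POST \
+    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
+    -H 'Content-Type: application/json'
+```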
+
+There are many options and parameters you can pass to `text-generation-launcher`. The documentation for the CLI is kept minimal and relies on self-generated documentation instead, which can be found by running
+
+```shell
+text-generation-launcher --help
+```
+
+You can also find the API documentation hosted in this [Swagger UI](https://huggingface.github.io/text-generation-inference/).
+
+The same documentation is available for `text-generation-server`:
+
+```shell
+text-generation-server --help
+```
 
 ## Local Installation from Source
 
@@ -44,7 +92,8 @@ brew install protobuf
 ```
 
 Then run to install Text Generation Inference:
 
 ```shell
-BUILD_EXTENSIONS=True make install # Install repository and HF/transformer fork with CUDA kernels
+git clone https://github.com/huggingface/text-generation-inference.git && cd text-generation-inference
+BUILD_EXTENSIONS=True make install
 ```
 
@@ -64,8 +113,3 @@ make run-falcon-7b-instruct
 ```
 
 This will serve Falcon 7B Instruct model from the port 8080, which we can query.
-
-To see all options to serve your models, check in the [codebase](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or the CLI:
-```
-text-generation-launcher --help
-```
diff --git a/docs/source/installation_launch.md b/docs/source/installation_launch.md
deleted file mode 100644
index 60358dd7..00000000
--- a/docs/source/installation_launch.md
+++ /dev/null
@@ -1,33 +0,0 @@
-# Getting Started
-
-The easiest way of getting started is using the official Docker container.
-
-## Launching with Docker
-
-Install Docker following [their installation instructions](https://docs.docker.com/get-docker/).
-
-Let's say you want to deploy [Falcon-7B Instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) model with TGI. Here is an example on how to do that:
-
-```shell
-model=tiiuae/falcon-7b-instruct
-volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
-
-docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.0.0 --model-id $model
-```
-
-<Tip>
-
-To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html) . We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.
-
-</Tip>
-
-To see all options to serve your models, check in the [codebase](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or the CLI:
-
-```shell
-text-generation-launcher --help
-```
-
-
-
-
-
diff --git a/docs/source/quicktour.md b/docs/source/quicktour.md
index 31185a2d..6abf7a82 100644
--- a/docs/source/quicktour.md
+++ b/docs/source/quicktour.md
@@ -31,4 +31,4 @@ To see all possible flags and options, you can use the `--help` flag. It's possi
 docker run ghcr.io/huggingface/text-generation-inference:1.0.0 --help
 ```
 
-</Tip>
\ No newline at end of file
+</Tip>
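+
+Once the container is running, you can also stream tokens as they are generated instead of waiting for the full response. Here is a minimal sketch of a streaming request, assuming the container was started with a `-p 8080:80` port mapping and that the `/generate_stream` route is available:
+
+```shell
+# Stream generated tokens as server-sent events (assumes host port 8080 maps to container port 80)
+curl 127.0.0.1:8080/generate_stream \
+    -X POST \
+    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
+    -H 'Content-Type: application/json'
+```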