diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
index 52e2af02..b6de35cf 100644
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -1,8 +1,8 @@
 - sections:
   - local: index
     title: Text Generation Inference
-  - local: docker_launch
-    title: Launching with Docker
+  - local: installation_launching
+    title: Installation and Launching
   - local: supported_models
     title: Supported Models and Hardware
   title: Getting started
@@ -13,4 +13,6 @@
     title: Consuming TGI
   - local: basic_tutorials/preparing_model
     title: Preparing Model for Serving
+  - local: basic_tutorials/using_cli
+    title: Using TGI through CLI
   title: Tutorials
diff --git a/docs/source/basic_tutorials/preparing_model.md b/docs/source/basic_tutorials/preparing_model.md
index 65a2a197..a1e5f03a 100644
--- a/docs/source/basic_tutorials/preparing_model.md
+++ b/docs/source/basic_tutorials/preparing_model.md
@@ -4,7 +4,8 @@ Text Generation Inference improves the model in several aspects.
 
 ## Quantization
 
-TGI supports [bits-and-bytes](https://github.com/TimDettmers/bitsandbytes#bitsandbytes) and [GPT-Q](https://arxiv.org/abs/2210.17323) quantization. To speed up inference with quantization, simply set `quantize` flag to `bitsandbytes` or `gptq` depending on the quantization technique you wish to use. When using GPT-Q quantization, you need to point to one of the models [here](https://huggingface.co/models?search=gptq).
+TGI supports [bits-and-bytes](https://github.com/TimDettmers/bitsandbytes#bitsandbytes) and [GPT-Q](https://arxiv.org/abs/2210.17323) quantization. To speed up inference with quantization, simply set the `quantize` flag to `bitsandbytes` or `gptq`, depending on the quantization technique you wish to use. When using GPT-Q quantization, you need to point to one of the models [here](https://huggingface.co/models?search=gptq).
+To run quantization with TGI, refer to the [Using TGI through CLI](TODO: ADD INTERNAL REF) section.
 
 ## RoPE Scaling
diff --git a/docs/source/basic_tutorials/using_cli.md b/docs/source/basic_tutorials/using_cli.md
new file mode 100644
index 00000000..d0646701
--- /dev/null
+++ b/docs/source/basic_tutorials/using_cli.md
@@ -0,0 +1,57 @@
+# Using TGI through CLI
+
+You can use TGI's CLI tools to download model weights, serve and quantize models, or get information on serving parameters.
+
+## Installing TGI for CLI
+
+To use TGI through the CLI, first clone the TGI repository and then run the following inside it:
+
+```shell
+make install
+```
+
+If you would like to serve models with custom kernels, run
+
+```shell
+BUILD_EXTENSIONS=True make install
+```
+
+After running this, you will be able to use `text-generation-server` and `text-generation-launcher`.
+
+`text-generation-server` lets you download model weights with the `download-weights` command like below 👇
+
+```shell
+text-generation-server download-weights MODEL_HUB_ID
+```
+
+You can also use it to quantize models like below 👇
+
+```shell
+text-generation-server quantize MODEL_HUB_ID OUTPUT_DIR
+```
+
+You can use `text-generation-launcher` to serve models.
+
+```shell
+text-generation-launcher --model-id MODEL_HUB_ID --port 8080
+```
+
+There are many options and parameters you can pass to `text-generation-launcher`. The CLI documentation is kept minimal and relies on self-generated documentation, which can be viewed by running
+
+```shell
+text-generation-launcher --help
+```
+
+You can also find it hosted in this [Swagger UI](https://huggingface.github.io/text-generation-inference/).
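+
+For instance, one way to serve a quantized model is to pass the `--quantize` flag; run `text-generation-launcher --help` to see the values it accepts. A minimal sketch with `bitsandbytes`:
+
+```shell
+text-generation-launcher --model-id MODEL_HUB_ID --port 8080 --quantize bitsandbytes
+```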
+
+The same documentation can be found for `text-generation-server`.
+
+```shell
+text-generation-server --help
+```
diff --git a/docs/source/docker_launch.md b/docs/source/installation_launching.md
similarity index 85%
rename from docs/source/docker_launch.md
rename to docs/source/installation_launching.md
index 9f1c89fb..60358dd7 100644
--- a/docs/source/docker_launch.md
+++ b/docs/source/installation_launching.md
@@ -1,6 +1,10 @@
-# Launching with Docker
+# Getting Started
 
-The easiest way of getting started is using the official Docker container. Install Docker following [their installation instructions](https://docs.docker.com/get-docker/).
+The easiest way to get started is to use the official Docker container.
+
+## Launching with Docker
+
+Install Docker following [their installation instructions](https://docs.docker.com/get-docker/).
 
 Let's say you want to deploy [Falcon-7B Instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) model with TGI. Here is an example on how to do that:
 
@@ -22,3 +26,12 @@ To see all options to serve your models, check in the [codebase](https://github.
 ```shell
 text-generation-launcher --help
 ```
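+
+Once the container is running, you can send it requests. As a minimal sketch (assuming the container exposes port 8080 as in the example above), a generation request looks like this:
+
+```shell
+curl 127.0.0.1:8080/generate \
+    -X POST \
+    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
+    -H 'Content-Type: application/json'
+```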