From e58ad6dd66413ef34585348cdbac1664da391fa9 Mon Sep 17 00:00:00 2001
From: Merve Noyan
Date: Thu, 10 Aug 2023 16:00:30 +0300
Subject: [PATCH] Added CLI docs (#799)

Added docs for CLI
---
 docs/source/_toctree.yml                 |  2 ++
 docs/source/basic_tutorials/using_cli.md | 35 ++++++++++++++++++++++++
 docs/source/installation.md              | 32 ++++++++++++++++------
 docs/source/quicktour.md                 | 12 ++++++++--
 4 files changed, 71 insertions(+), 10 deletions(-)
 create mode 100644 docs/source/basic_tutorials/using_cli.md

diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
index 8ee20eb0..a161dc28 100644
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -15,4 +15,6 @@
     title: Preparing Model for Serving
   - local: basic_tutorials/gated_model_access
     title: Serving Private & Gated Models
+  - local: basic_tutorials/using_cli
+    title: Using TGI CLI
   title: Tutorials
diff --git a/docs/source/basic_tutorials/using_cli.md b/docs/source/basic_tutorials/using_cli.md
new file mode 100644
index 00000000..82c10e6b
--- /dev/null
+++ b/docs/source/basic_tutorials/using_cli.md
@@ -0,0 +1,35 @@
+# Using TGI CLI
+
+You can use the TGI command-line interface (CLI) to download weights, serve and quantize models, or get information on serving parameters. To install the CLI, please refer to [the installation section](./installation#install-cli).
+
+`text-generation-server` lets you download model weights with the `download-weights` command, like below 👇
+
+```bash
+text-generation-server download-weights MODEL_HUB_ID
+```
+
+You can also use it to quantize models, like below 👇
+
+```bash
+text-generation-server quantize MODEL_HUB_ID OUTPUT_DIR
+```
+
+You can use `text-generation-launcher` to serve models:
+
+```bash
+text-generation-launcher --model-id MODEL_HUB_ID --port 8080
+```
+
+There are many options and parameters you can pass to `text-generation-launcher`. The CLI documentation is kept minimal and relies on the self-generated help text, which you can view by running
+
+```bash
+text-generation-launcher --help
+```
+
+You can also find this documentation hosted in this [Swagger UI](https://huggingface.github.io/text-generation-inference/).
+
+The same documentation is available for `text-generation-server`:
+
+```bash
+text-generation-server --help
+```
diff --git a/docs/source/installation.md b/docs/source/installation.md
index a8e2e751..1301b930 100644
--- a/docs/source/installation.md
+++ b/docs/source/installation.md
@@ -4,8 +4,20 @@ This section explains how to install the CLI tool as well as installing TGI from
 
 ## Install CLI
 
-TODO
+You can use the TGI command-line interface (CLI) to download weights, serve and quantize models, or get information on serving parameters.
 
+To install the CLI, you need to first clone the TGI repository and then run `make`:
+
+```bash
+git clone https://github.com/huggingface/text-generation-inference.git && cd text-generation-inference
+make install
+```
+
+If you would like to serve models with custom kernels, run
+
+```bash
+BUILD_EXTENSIONS=True make install
+```
 
 ## Local Installation from Source
 
@@ -44,7 +56,8 @@ brew install protobuf
 
 Then run to install Text Generation Inference:
 
 ```bash
-BUILD_EXTENSIONS=True make install # Install repository and HF/transformer fork with CUDA kernels
+git clone https://github.com/huggingface/text-generation-inference.git && cd text-generation-inference
+BUILD_EXTENSIONS=True make install
 ```
 
@@ -64,9 +77,12 @@ make run-falcon-7b-instruct
 ```
 
 This will serve Falcon 7B Instruct model from the port 8080, which we can query.
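+
+For example, once the server is up you can query it from another terminal with `curl` (a minimal sketch, assuming the server kept the default `/generate` route and port 8080):
+
+```bash
+curl 127.0.0.1:8080/generate \
+    -X POST \
+    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
+    -H 'Content-Type: application/json'
+```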
-
-To see all options to serve your models, check in the [codebase](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or the CLI:
-
-```bash
-text-generation-launcher --help
-```
diff --git a/docs/source/quicktour.md b/docs/source/quicktour.md
index 77f0a9c5..f447bc19 100644
--- a/docs/source/quicktour.md
+++ b/docs/source/quicktour.md
@@ -4,7 +4,7 @@ The easiest way of getting started is using the official Docker container. Insta
 
 Let's say you want to deploy [Falcon-7B Instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) model with TGI. Here is an example on how to do that:
 
-```shell
+```bash
 model=tiiuae/falcon-7b-instruct
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
 
@@ -31,4 +31,12 @@ To see all possible flags and options, you can use the `--help` flag. It's possi
 docker run ghcr.io/huggingface/text-generation-inference:1.0.0 --help
 ```
 
-</Tip>
\ No newline at end of file
+</Tip>
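+
+Launcher flags can also be passed straight through `docker run`. As a rough sketch, reusing the `$model` and `$volume` variables from above, you could serve the same model quantized with bitsandbytes (an assumption; check `--help` for the quantization options your version supports):
+
+```bash
+docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
+    ghcr.io/huggingface/text-generation-inference:1.0.0 \
+    --model-id $model --quantize bitsandbytes
+```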