Mirror of https://github.com/huggingface/text-generation-inference.git

Commit 15ef2bc082 ("sync changes and refactor"), parent 29129dc660
@@ -1,64 +0,0 @@
# Installing from the Source and Launching TGI

Before you start, you will need to set up your environment and install Text Generation Inference. Text Generation Inference is tested on **Python 3.9+**.

## Local Installation from Source

Text Generation Inference is available on PyPI, conda and GitHub.

To install and launch locally, first [install Rust](https://rustup.rs/) and create a Python virtual environment with at least Python 3.9, e.g. using conda:

```shell
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

conda create -n text-generation-inference python=3.9
conda activate text-generation-inference
```
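
Optionally, you can confirm the toolchain is in place before building; the exact versions reported will depend on your machine:

```shell
# sanity-check the Rust toolchain and the Python environment created above
rustc --version
cargo --version
python --version  # should report 3.9 or newer
```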

You may also need to install Protoc.

On Linux:

```shell
# download protoc 21.12 and install the binary and includes under /usr/local
PROTOC_ZIP=protoc-21.12-linux-x86_64.zip
curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v21.12/$PROTOC_ZIP
sudo unzip -o $PROTOC_ZIP -d /usr/local bin/protoc
sudo unzip -o $PROTOC_ZIP -d /usr/local 'include/*'
rm -f $PROTOC_ZIP
```

On macOS, using Homebrew:

```shell
brew install protobuf
```

Then run the following to install Text Generation Inference:

```shell
BUILD_EXTENSIONS=True make install # Install repository and HF/transformer fork with CUDA kernels
```

<Tip warning={true}>

On some machines, you may also need the OpenSSL libraries and gcc. On Linux machines, run:

```shell
sudo apt-get install libssl-dev gcc -y
```

</Tip>

Once installation is done, simply run:

```shell
make run-falcon-7b-instruct
```

This will serve the Falcon 7B Instruct model on port 8080, which we can then query.
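
For example, assuming the server from the previous step is listening on port 8080, you can send a request to the `/generate` endpoint with `curl`; the prompt and generation parameters below are only placeholders:

```shell
# query the locally served model; adjust the prompt and parameters as needed
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```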

To see all the options to serve your models, check the [codebase](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or the CLI:

```shell
text-generation-launcher --help
```
@@ -4,8 +4,56 @@ This section explains how to install the CLI tool as well as installing TGI from

## Install CLI

You can use TGI's CLI tools to download weights, serve and quantize models, or get information on serving parameters.

To install TGI for use with the CLI, first clone the TGI repository and then, inside the repository, run:

```shell
git clone https://github.com/huggingface/text-generation-inference.git && cd text-generation-inference
make install
```

If you would like to serve models with custom kernels, run:

```shell
BUILD_EXTENSIONS=True make install
```

## Running CLI

After installation, you will be able to use `text-generation-server` and `text-generation-launcher`.

`text-generation-server` lets you download model weights with the `download-weights` command, as shown below 👇

```shell
text-generation-server download-weights MODEL_HUB_ID
```

You can also use it to quantize models, as shown below 👇

```shell
text-generation-server quantize MODEL_HUB_ID OUTPUT_DIR
```
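
The quantized weights written to `OUTPUT_DIR` can then be served with the launcher. A minimal sketch, assuming the quantization produced here is GPTQ and that a local path is accepted as `--model-id` (check `--help` on your version):

```shell
# illustrative only: serve the quantized weights produced by the command above
text-generation-launcher --model-id OUTPUT_DIR --quantize gptq --port 8080
```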

You can use `text-generation-launcher` to serve models:

```shell
text-generation-launcher --model-id MODEL_HUB_ID --port 8080
```

There are many options and parameters you can pass to `text-generation-launcher`. The CLI documentation is kept minimal and relies on self-generated documentation, which you can view by running:

```shell
text-generation-launcher --help
```
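
As an illustration of what those options look like, the sketch below combines a few commonly used flags; the flag names are taken from the `--help` output of recent releases and should be verified against your installed version:

```shell
# illustrative only: shard across two GPUs and cap input/total token counts
text-generation-launcher \
    --model-id MODEL_HUB_ID \
    --port 8080 \
    --num-shard 2 \
    --max-input-length 1024 \
    --max-total-tokens 2048
```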

You can also find it hosted in this [Swagger UI](https://huggingface.github.io/text-generation-inference/).

The same kind of documentation is available for `text-generation-server`:

```shell
text-generation-server --help
```

## Local Installation from Source

@@ -44,7 +92,8 @@ brew install protobuf

Then run the following to install Text Generation Inference:

```shell
git clone https://github.com/huggingface/text-generation-inference.git && cd text-generation-inference
BUILD_EXTENSIONS=True make install
```

<Tip warning={true}>
@@ -64,8 +113,3 @@ make run-falcon-7b-instruct
```

This will serve the Falcon 7B Instruct model on port 8080, which we can then query.

To see all the options to serve your models, check the [codebase](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or the CLI:

```shell
text-generation-launcher --help
```
@@ -1,33 +0,0 @@
# Getting Started

The easiest way of getting started is using the official Docker container.

## Launching with Docker

Install Docker following [their installation instructions](https://docs.docker.com/get-docker/).

Let's say you want to deploy the [Falcon-7B Instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) model with TGI. Here is an example of how to do that:

```shell
model=tiiuae/falcon-7b-instruct
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.0.0 --model-id $model
```

<Tip warning={true}>

To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.

</Tip>
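
Once the container is running, you can query it from the host on port 8080. Below is a minimal sketch of a streaming request to the `/generate_stream` endpoint; the prompt and parameters are placeholders:

```shell
# stream tokens from the container started above
curl 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```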

To see all the options to serve your models, check the [codebase](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or the CLI:

```shell
text-generation-launcher --help
```
@@ -31,4 +31,4 @@ To see all possible flags and options, you can use the `--help` flag. It's possi

docker run ghcr.io/huggingface/text-generation-inference:1.0.0 --help
```

</Tip>