Added CLI docs and rename docker launch

commit c4c7e6d80d (parent ec592d550e)
docs/source/_toctree.yml

@@ -1,8 +1,8 @@
 - sections:
   - local: index
     title: Text Generation Inference
-  - local: docker_launch
-    title: Launching with Docker
+  - local: installation_launching
+    title: Installation and Launching
   - local: supported_models
     title: Supported Models and Hardware
   title: Getting started
@@ -13,4 +13,6 @@
     title: Consuming TGI
   - local: basic_tutorials/preparing_model
     title: Preparing Model for Serving
+  - local: basic_tutorials/using_cli
+    title: Using TGI through CLI
   title: Tutorials
docs/source/basic_tutorials/preparing_model.md

@@ -4,7 +4,8 @@ Text Generation Inference improves the model in several aspects.
 
 ## Quantization
 
 TGI supports [bits-and-bytes](https://github.com/TimDettmers/bitsandbytes#bitsandbytes) and [GPT-Q](https://arxiv.org/abs/2210.17323) quantization. To speed up inference with quantization, simply set the `quantize` flag to `bitsandbytes` or `gptq` depending on the quantization technique you wish to use. When using GPT-Q quantization, you need to point to one of the models [here](https://huggingface.co/models?search=gptq).
 
+To run quantization with TGI, refer to the [`Using TGI through CLI`](TODO: ADD INTERNAL REF) section.
 
 ## RoPE Scaling
docs/source/basic_tutorials/using_cli.md (new file, 53 lines)
# Using TGI through CLI

You can use TGI's CLI tools to download weights, serve and quantize models, or get information on serving parameters.

## Installing TGI for CLI

To install TGI for use through the CLI, first clone the TGI repository, then run the following inside it:

```shell
make install
```

If you would like to serve models with custom kernels, run

```shell
BUILD_EXTENSIONS=True make install
```

After running this, you will be able to use `text-generation-server` and `text-generation-launcher`.

`text-generation-server` lets you download the model with the `download-weights` command like below 👇

```shell
text-generation-server download-weights MODEL_HUB_ID
```
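
For example, to pull the weights for [Falcon-7B Instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) (the model ID is only an illustration; any Hub model supported by TGI works):

```shell
# Download Falcon-7B Instruct weights to the local cache.
text-generation-server download-weights tiiuae/falcon-7b-instruct
```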

You can also use it to quantize models like below 👇

```shell
text-generation-server quantize MODEL_HUB_ID OUTPUT_DIR
```
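
The quantized (GPTQ) weights are written to `OUTPUT_DIR`. A minimal sketch of serving them afterwards, assuming `--model-id` accepts a local path and using the launcher's `--quantize gptq` flag:

```shell
# Serve the freshly quantized weights from the output directory.
# OUTPUT_DIR is assumed to come from the quantize step above.
text-generation-launcher --model-id OUTPUT_DIR --quantize gptq --port 8080
```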

You can use `text-generation-launcher` to serve models.

```shell
text-generation-launcher --model-id MODEL_HUB_ID --port 8080
```
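
Once the server is up, you can query it over HTTP. A minimal sketch against TGI's `/generate` endpoint (the prompt and generation parameters are illustrative):

```shell
# Request a short completion from the running server.
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is deep learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```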

There are many options and parameters you can pass to `text-generation-launcher`. The documentation for the CLI is kept minimal and relies on self-generated documentation, which can be found by running

```shell
text-generation-launcher --help
```

You can also find it hosted in this [Swagger UI](https://huggingface.github.io/text-generation-inference/).

The same documentation can be found for `text-generation-server`.

```shell
text-generation-server --help
```
docs/source/installation_launching.md (renamed from docs/source/docker_launch.md)

@@ -1,6 +1,10 @@
-# Launching with Docker
+# Getting Started
 
-The easiest way of getting started is using the official Docker container. Install Docker following [their installation instructions](https://docs.docker.com/get-docker/).
+The easiest way of getting started is using the official Docker container.
+
+## Launching with Docker
+
+Install Docker following [their installation instructions](https://docs.docker.com/get-docker/).
 
 Let's say you want to deploy [Falcon-7B Instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) model with TGI. Here is an example on how to do that:
 
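The Docker command itself sits in the unchanged lines of this file. A minimal sketch of such a launch, assuming the official `ghcr.io/huggingface/text-generation-inference` image and an illustrative volume path:

```shell
# Serve Falcon-7B Instruct with the official TGI container.
# Image tag and volume path are illustrative; adjust to your setup.
model=tiiuae/falcon-7b-instruct
volume=$PWD/data  # shared volume so weights are not re-downloaded on each run
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:latest --model-id $model
```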
@@ -22,3 +26,8 @@ To see all options to serve your models, check in the [codebase](https://github.
 ```shell
 text-generation-launcher --help
 ```