adjust toctree and shell to bash

commit 3fb7803ed9
parent 6fa3d9963c
@@ -16,5 +16,5 @@
   - local: basic_tutorials/gated_model_access
     title: Serving Private & Gated Models
   - local: basic_tutorials/using_cli
-    title: Using TGI through CLI
+    title: Using TGI CLI
   title: Tutorials
@@ -30,6 +30,6 @@ You can also find it hosted in this [Swagger UI](https://huggingface.github.io/t
 
 Same documentation can be found for `text-generation-server`.
 
-```shell
+```bash
 text-generation-server --help
 ```
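Note: beyond `--help`, the `text-generation-server` CLI also exposes subcommands such as `download-weights`. As a hedged illustration (the subcommand comes from the TGI codebase, not from this hunk, and the model id is only an example), pre-fetching weights looks roughly like:

```bash
# Illustrative only: download-weights is a text-generation-server subcommand;
# the model id below is an example, not something this diff prescribes.
text-generation-server download-weights tiiuae/falcon-7b-instruct
```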
@@ -8,14 +8,14 @@ You can use TGI command-line interface (CLI) to download weights, serve and quan
 
 To install the CLI, you need to first clone the TGI repository and then run `make`.
 
-```shell
+```bash
 git clone https://github.com/huggingface/text-generation-inference.git && cd text-generation-inference
 make install
 ```
 
 If you would like to serve models with custom kernels, run
 
-```shell
+```bash
 BUILD_EXTENSIONS=True make install
 ```
 
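Note: once installed this way, the serving entry point is `text-generation-launcher`. A minimal sketch (the `--model-id` flag is taken from the launcher's own help output, not from this hunk, and the model is illustrative):

```bash
# Minimal sketch: launch the server for a given model id after `make install`.
# --model-id is a text-generation-launcher flag; the model chosen here is only an example.
text-generation-launcher --model-id tiiuae/falcon-7b-instruct
```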
@@ -28,7 +28,7 @@ Text Generation Inference is available on pypi, conda and GitHub.
 To install and launch locally, first [install Rust](https://rustup.rs/) and create a Python virtual environment with at least
 Python 3.9, e.g. using conda:
 
-```shell
+```bash
 curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
 
 conda create -n text-generation-inference python=3.9
@@ -39,7 +39,7 @@ You may also need to install Protoc.
 
 On Linux:
 
-```shell
+```bash
 PROTOC_ZIP=protoc-21.12-linux-x86_64.zip
 curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v21.12/$PROTOC_ZIP
 sudo unzip -o $PROTOC_ZIP -d /usr/local bin/protoc
@@ -49,13 +49,13 @@ rm -f $PROTOC_ZIP
 
 On MacOS, using Homebrew:
 
-```shell
+```bash
 brew install protobuf
 ```
 
 Then run to install Text Generation Inference:
 
-```shell
+```bash
 git clone https://github.com/huggingface/text-generation-inference.git && cd text-generation-inference
 BUILD_EXTENSIONS=True make install
 ```
@@ -64,7 +64,7 @@ BUILD_EXTENSIONS=True make install
 
 On some machines, you may also need the OpenSSL libraries and gcc. On Linux machines, run:
 
-```shell
+```bash
 sudo apt-get install libssl-dev gcc -y
 ```
 
@@ -72,7 +72,7 @@ sudo apt-get install libssl-dev gcc -y
 
 Once installation is done, simply run:
 
-```shell
+```bash
 make run-falcon-7b-instruct
 ```
 
@@ -4,7 +4,7 @@ The easiest way of getting started is using the official Docker container. Insta
 
 Let's say you want to deploy [Falcon-7B Instruct](https://huggingface.co/tiiuae/falcon-7b-instruct) model with TGI. Here is an example on how to do that:
 
-```shell
+```bash
 model=tiiuae/falcon-7b-instruct
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
 
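Note: the hunk is cut off before the `docker run` line that consumes these variables. As a rough sketch of the surrounding quicktour command (assumed from the rest of that page, not shown in this diff), it mounts the volume and passes the model id:

```bash
# Assumed continuation of the quicktour snippet (not part of this hunk):
# mount $volume into the container and serve $model on local port 8080.
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:1.0.0 --model-id $model
```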
@@ -19,7 +19,7 @@ To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvi
 
 Once TGI is running, you can use the `generate` endpoint by doing requests. To learn more about how to query the endpoints, check the [Consuming TGI](./basic_tutorials/consuming_tgi) section.
 
-```shell
+```bash
 curl 127.0.0.1:8080/generate -X POST -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' -H 'Content-Type: application/json'
 ```
 
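Note: for token-by-token output, TGI also exposes a `generate_stream` endpoint that returns server-sent events. A hedged variant of the request above (the endpoint name comes from TGI's API documentation, not from this hunk):

```bash
# Same payload sent to the streaming endpoint; tokens arrive as server-sent events.
curl 127.0.0.1:8080/generate_stream \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```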
@@ -27,7 +27,7 @@ curl 127.0.0.1:8080/generate -X POST -d '{"inputs":"What is Deep Learning?","par
 
 To see all possible flags and options, you can use the `--help` flag. It's possible to configure the number of shards, quantization, generation parameters, and more.
 
-```shell
+```bash
 docker run ghcr.io/huggingface/text-generation-inference:1.0.0 --help
 ```
 
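Note: to make the kinds of options concrete, a hypothetical invocation combining a few of the flags that `--help` lists (flag names are assumptions taken from the launcher's help output; verify against the actual `--help` before use) might look like:

```bash
# Hypothetical example: shard across 2 GPUs and quantize with bitsandbytes.
# Flag names are assumptions from the launcher's --help, not prescribed by this diff.
docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD/data:/data \
    ghcr.io/huggingface/text-generation-inference:1.0.0 \
    --model-id tiiuae/falcon-7b-instruct --num-shard 2 --quantize bitsandbytes
```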