mirror of https://github.com/huggingface/text-generation-inference.git
synced 2025-09-09 19:34:53 +00:00

Update README

parent 37194a5b9a
commit a0e5fc4189

49  README.md
@@ -21,22 +21,24 @@ to power LLMs api-inference widgets.
 
 ## Table of contents
 
-- [Features](#features)
-- [Optimized Architectures](#optimized-architectures)
-- [Get Started](#get-started)
-- [Docker](#docker)
-- [API Documentation](#api-documentation)
-- [A note on Shared Memory](#a-note-on-shared-memory-shm)
-- [Distributed Tracing](#distributed-tracing)
-- [Local Install](#local-install)
-- [CUDA Kernels](#cuda-kernels)
-- [Run BLOOM](#run-bloom)
-- [Download](#download)
-- [Run](#run)
-- [Quantization](#quantization)
-- [Develop](#develop)
-- [Testing](#testing)
+- [Text Generation Inference](#text-generation-inference)
+- [Table of contents](#table-of-contents)
+- [Features](#features)
+- [Optimized architectures](#optimized-architectures)
+- [Get started](#get-started)
+- [Docker](#docker)
+- [API documentation](#api-documentation)
+- [Distributed Tracing](#distributed-tracing)
+- [A note on Shared Memory (shm)](#a-note-on-shared-memory-shm)
+- [Local install](#local-install)
+- [CUDA Kernels](#cuda-kernels)
+- [Run BLOOM](#run-bloom)
+- [Download](#download)
+- [Run](#run)
+- [Quantization](#quantization)
+- [Develop](#develop)
+- [Testing](#testing)
 
 ## Features
 
 - Serve the most popular Large Language Models with a simple launcher
@@ -131,7 +133,7 @@ by setting the address to an OTLP collector with the `--otlp-endpoint` argument.
 
 ### A note on Shared Memory (shm)
 
 [`NCCL`](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/index.html) is a communication framework used by
 `PyTorch` to do distributed training/inference. `text-generation-inference` make
 use of `NCCL` to enable Tensor Parallelism to dramatically speed up inference for large language models.
 
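Containers typically default to a 64 MB `/dev/shm`, which is too small for NCCL's shared-memory transport. A quick way to check the current size on Linux (standard tooling, not part of this commit) is:

```shell
# /dev/shm is a tmpfs mount; df reports its total size and current usage
df -h /dev/shm
```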
@@ -152,14 +154,14 @@ creating a volume with:
 
 and mounting it to `/dev/shm`.
 
 Finally, you can also disable SHM sharing by using the `NCCL_SHM_DISABLE=1` environment variable. However, note that
 this will impact performance.
 
 ### Local install
 
 You can also opt to install `text-generation-inference` locally.
 
 First [install Rust](https://rustup.rs/) and create a Python virtual environment with at least
 Python 3.9, e.g. using `conda`:
 
 ```shell
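The shared-memory options described above can be combined into a single `docker run` sketch. The image tag, volume name, model id, and shard count here are illustrative assumptions, not values taken from this commit:

```shell
# Give the container 1 GB of /dev/shm so NCCL can use shared memory
# for tensor-parallel communication across GPUs
docker run --gpus all --shm-size 1g -p 8080:80 \
  -v tgi-data:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id bigscience/bloom-560m --num-shard 2

# Or trade performance for simplicity by disabling SHM use in NCCL:
# docker run --gpus all -e NCCL_SHM_DISABLE=1 -p 8080:80 ...
```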
@@ -181,7 +183,7 @@ sudo unzip -o $PROTOC_ZIP -d /usr/local 'include/*'
 rm -f $PROTOC_ZIP
 ```
 
 On MacOS, using Homebrew:
 
 ```shell
 brew install protobuf
@@ -241,6 +243,11 @@ make router-dev
 ## Testing
 
 ```shell
+# python
+make python-server-tests
+make python-client-tests
+# or both server and client tests
 make python-tests
+# rust cargo tests
 make integration-tests
 ```