Mirror of https://github.com/huggingface/text-generation-inference.git, synced 2025-09-10 20:04:52 +00:00

Update README.md

This commit is contained in: parent 4a21912edf, commit 433cc0f4d9

README.md: 44 lines changed
@@ -16,7 +16,21 @@ to power Hugging Chat, the Inference API and Inference Endpoint.
 
 </div>
 
+## Table of contents
+
+- [Get Started](#get-started)
+- [API Documentation](#api-documentation)
+- [Using a private or gated model](#using-a-private-or-gated-model)
+- [A note on Shared Memory](#a-note-on-shared-memory-shm)
+- [Distributed Tracing](#distributed-tracing)
+- [Local Install](#local-install)
+- [CUDA Kernels](#cuda-kernels)
+- [Optimized architectures](#optimized-architectures)
+- [Run Falcon](#run-falcon)
+- [Run](#run)
+- [Quantization](#quantization)
+- [Develop](#develop)
+- [Testing](#testing)
+
 Text Generation Inference (TGI) is a toolkit for deploying and serving Large Language Models (LLMs). TGI enables high-performance text generation for the most popular open-source LLMs, including Llama, Falcon, StarCoder, BLOOM, GPT-NeoX, and [more](https://huggingface.co/docs/text-generation-inference/supported_models). TGI implements many features, such as:
 
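The sections this new table of contents points to begin with serving a model. As orientation, a minimal sketch of the Docker-based quickstart; the image tag and model id below are illustrative assumptions, not part of this diff:

```shell
# Minimal sketch: serve a model with the official TGI Docker image.
# The image tag and model id are illustrative assumptions.
model=tiiuae/falcon-7b-instruct
volume=$PWD/data  # cache model weights outside the container

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id $model
```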
@@ -63,19 +77,7 @@ To see all options to serve your models (in the [code](https://github.com/huggin
 text-generation-launcher --help
 ```
 
-## Optimized architectures
-
-TGI works out of the box to serve optimized models in [this list](https://huggingface.co/docs/text-generation-inference/supported_models).
-
-Other architectures are supported on a best-effort basis using:
-
-`AutoModelForCausalLM.from_pretrained(<model>, device_map="auto")`
-
-or
-
-`AutoModelForSeq2SeqLM.from_pretrained(<model>, device_map="auto")`
-
-## API documentation
+### API documentation
 
 You can consult the OpenAPI documentation of the `text-generation-inference` REST API using the `/docs` route. The
 Swagger UI is also available at: [https://huggingface.github.io/text-generation-inference](https://huggingface.github.io/text-generation-inference).
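The hunk above keeps `text-generation-launcher --help` as the way to list all serving options. For orientation, a typical invocation might look like the following sketch; the flag set and model id are assumptions to verify against `--help`:

```shell
# Sketch of a typical launcher invocation; confirm flags against
# `text-generation-launcher --help`. The model id is an assumption.
text-generation-launcher --model-id tiiuae/falcon-7b-instruct \
    --num-shard 1 \
    --port 8080
```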
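Once a server is up, the REST API described by that OpenAPI document can be exercised directly. A minimal sketch against the `/generate` route; the host, port, and parameter values are assumptions, not taken from this diff:

```shell
# Minimal sketch of a /generate request; host, port, and parameter
# values are assumptions.
curl 127.0.0.1:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 20}}'
```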
@@ -164,6 +166,20 @@ the kernels by using the `DISABLE_CUSTOM_KERNELS=True` environment variable.
 
 Be aware that the official Docker image has them enabled by default.
 
+
+## Optimized architectures
+
+TGI works out of the box to serve optimized models in [this list](https://huggingface.co/docs/text-generation-inference/supported_models).
+
+Other architectures are supported on a best-effort basis using:
+
+`AutoModelForCausalLM.from_pretrained(<model>, device_map="auto")`
+
+or
+
+`AutoModelForSeq2SeqLM.from_pretrained(<model>, device_map="auto")`
+
 ## Run Falcon
 
 ### Run
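To make that best-effort path concrete, here is a minimal sketch of what those `from_pretrained` calls look like in a complete transformers snippet; the model id and prompt are illustrative assumptions:

```python
# Minimal sketch of the best-effort transformers fallback.
# The model id and prompt are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-neo-125m"  # any causal LM on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("What is deep learning?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For sequence-to-sequence models, the same pattern applies with `AutoModelForSeq2SeqLM` in place of `AutoModelForCausalLM`.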