mirror of
https://github.com/huggingface/text-generation-inference.git
synced 2025-09-10 20:04:52 +00:00
Initial commit
Parent: c4422e5678
Commit: 7dcd953969
@@ -17,6 +17,8 @@
     title: Serving Private & Gated Models
   - local: basic_tutorials/using_cli
     title: Using TGI CLI
+  - local: basic_tutorials/custom_models
+    title: Custom Model Serving
   title: Tutorials
 - sections:
   - local: conceptual/streaming
docs/source/basic_tutorials/custom_models.md (Normal file, +21 lines)
@ -0,0 +1,21 @@
# Custom Model Serving

TGI supports various LLM architectures (see the full list [here](https://github.com/huggingface/text-generation-inference#optimized-architectures)). If you wish to serve a model that is not one of the supported models, TGI will fall back to the `transformers` implementation of that model. Such models can be loaded with:

```python
from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM

# Replace "<model>" with a Hugging Face Hub model ID.
# device_map="auto" requires the `accelerate` package to be installed.
model = AutoModelForCausalLM.from_pretrained("<model>", device_map="auto")

# or, for encoder-decoder (sequence-to-sequence) models:
model = AutoModelForSeq2SeqLM.from_pretrained("<model>", device_map="auto")
```
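Which of these paths a model takes depends on its architecture. The sketch below illustrates that decision under stated assumptions: it reads a parsed `config.json` as a dict, treats a hypothetical, illustrative subset of `model_type` values (`OPTIMIZED_MODEL_TYPES`) as having optimized TGI implementations, and uses the `is_encoder_decoder` field as a simple heuristic for choosing the Seq2Seq class. The function names are hypothetical and this is not TGI's actual dispatch code; the linked list of optimized architectures is the authoritative source.

```python
import json

# Hypothetical, illustrative subset of architectures with optimized TGI code.
# The authoritative list is the "optimized architectures" section of the TGI README.
OPTIMIZED_MODEL_TYPES = {"llama", "falcon", "gpt_neox"}


def choose_backend(config: dict) -> str:
    """Guess how a model would be served, given its parsed config.json (a sketch)."""
    if config.get("model_type") in OPTIMIZED_MODEL_TYPES:
        return "optimized"
    # Heuristic (an assumption): encoder-decoder configs set is_encoder_decoder=True.
    if config.get("is_encoder_decoder", False):
        return "transformers:AutoModelForSeq2SeqLM"
    return "transformers:AutoModelForCausalLM"


def choose_backend_from_file(path: str) -> str:
    """Same as choose_backend, but reads a local config.json file."""
    with open(path, encoding="utf-8") as f:
        return choose_backend(json.load(f))
```

For example, `choose_backend({"model_type": "gpt2"})` returns `"transformers:AutoModelForCausalLM"`, matching the `gpt2` model used in the Docker command on this page.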

This means you will be unable to use some of the features introduced by TGI, such as tensor-parallel sharding or flash attention. However, you can still benefit from many of TGI's other features, such as continuous batching and streaming outputs.
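To make the continuous-batching benefit concrete, here is a toy simulation (not TGI's router or scheduler) that compares it with static batching. Each request needs a given number of decoding steps; with continuous batching, a finished request frees its slot immediately and a waiting request joins the running batch, whereas static batching waits for the whole batch to drain. The function names and the one-token-per-step model are simplifying assumptions.

```python
from collections import deque


def continuous_batching(lengths: list[int], max_batch_size: int) -> list[int]:
    """Toy simulation: return the decoding step at which each request finishes.

    lengths[i] is the number of tokens request i must generate (at least 1).
    Finished requests leave the batch after every step, and waiting requests
    are admitted into the freed slots right away.
    """
    if max_batch_size < 1 or any(n < 1 for n in lengths):
        raise ValueError("max_batch_size and every length must be >= 1")
    queue = deque(enumerate(lengths))
    running: dict[int, int] = {}  # request index -> tokens still to generate
    finished_at = [0] * len(lengths)
    step = 0
    while queue or running:
        # Fill any free slots with waiting requests.
        while queue and len(running) < max_batch_size:
            idx, remaining = queue.popleft()
            running[idx] = remaining
        step += 1
        # One decoding step: every running request emits one token.
        for idx in list(running):
            running[idx] -= 1
            if running[idx] == 0:
                finished_at[idx] = step
                del running[idx]
    return finished_at


def static_batching(lengths: list[int], max_batch_size: int) -> list[int]:
    """Baseline: the next batch starts only once every request in the current one is done."""
    if max_batch_size < 1 or any(n < 1 for n in lengths):
        raise ValueError("max_batch_size and every length must be >= 1")
    finished_at = [0] * len(lengths)
    step = 0
    for start in range(0, len(lengths), max_batch_size):
        batch = range(start, min(start + max_batch_size, len(lengths)))
        for idx in batch:
            finished_at[idx] = step + lengths[idx]
        step += max(lengths[idx] for idx in batch)
    return finished_at
```

With `lengths=[2, 8, 3, 1]` and `max_batch_size=2`, `continuous_batching` returns `[2, 8, 5, 6]` while `static_batching` returns `[2, 8, 11, 9]`: the short requests 2 and 3 no longer wait behind the long request 1.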

You can serve these models using Docker, as shown below 👇

```bash
volume=$PWD/data  # share a volume with the container to avoid re-downloading weights on every run
docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id gpt2
```
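Streaming outputs work the same way for these fallback models. The sketch below is a minimal standard-library client for TGI's `/generate_stream` endpoint, assuming the container started above is reachable at `127.0.0.1:8080` and that each server-sent event is a `data:` line whose JSON payload holds a `token` object with `text` and `special` fields. The helper names `parse_sse_tokens` and `stream_generate` are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def parse_sse_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text of each non-special generated token from SSE lines.

    Assumes events look like `data:{"token": {"text": ..., "special": ...}, ...}`;
    blank lines and other SSE fields are skipped.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = json.loads(line[len("data:"):])
        token = payload.get("token") or {}
        if not token.get("special", False):
            yield token.get("text", "")


def stream_generate(
    prompt: str,
    url: str = "http://127.0.0.1:8080/generate_stream",
    max_new_tokens: int = 20,
) -> Iterator[str]:
    """POST a prompt to a running TGI server and yield tokens as they arrive."""
    body = json.dumps({"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}})
    request = urllib.request.Request(
        url, data=body.encode("utf-8"), headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        # The response is iterated line by line as bytes; decode before parsing.
        yield from parse_sse_tokens(line.decode("utf-8") for line in response)
```

With the container running, `for token in stream_generate("What is deep learning?"): print(token, end="", flush=True)` prints the completion as it is generated. Parsing is kept separate from networking so it can be checked without a server.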