Mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-09-10 20:04:52 +00:00)
Update docs/source/basic_tutorials/non_core_models.md
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
parent: 061b6a9c21
commit: 4d12840986
@@ -2,7 +2,7 @@
 
 TGI supports various LLM architectures (see full list [here](../supported_models)). If you wish to serve a model that is not one of the supported models, TGI will fallback to the `transformers` implementation of that model. This means you will be unable to use some of the features introduced by TGI, such as tensor-parallel sharding or flash attention. However, you can still get many benefits of TGI, such as continuous batching or streaming outputs.
 
-You can serve these models using Docker like below 👇
+You can serve these models using the same Docker command-line invocation as with fully supported models 👇
 
 ```bash
 docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id gpt2
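Once the container from the command above is running, you can query it over HTTP. A minimal sketch using TGI's `/generate` endpoint, assuming the server is reachable on `localhost:8080` as mapped by the `-p 8080:80` flag (the prompt text and `max_new_tokens` value are illustrative):

```shell
# Send a generation request to the running TGI server
# (assumes the container above is up on localhost:8080).
curl http://localhost:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is deep learning?", "parameters": {"max_new_tokens": 20}}'
```

The response is a JSON object whose `generated_text` field contains the model's completion; streaming output is available via the `/generate_stream` endpoint instead.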