From e16ecaf0c9f30837b729876858340e768ae93731 Mon Sep 17 00:00:00 2001
From: Merve Noyan
Date: Wed, 23 Aug 2023 16:27:55 +0300
Subject: [PATCH] Added trust-remote-code

---
 docs/source/basic_tutorials/non_core_models.md | 28 ++++++++++++++++------------
 1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/docs/source/basic_tutorials/non_core_models.md b/docs/source/basic_tutorials/non_core_models.md
index 5ceb79bc..cc01a8d3 100644
--- a/docs/source/basic_tutorials/non_core_models.md
+++ b/docs/source/basic_tutorials/non_core_models.md
@@ -2,20 +2,24 @@
 
 TGI supports various LLM architectures (see full list [here](./supported_models)). If you wish to serve a model that is not one of the supported models, TGI will fallback to transformers implementation of that model. This means you will be unable to use some of the features introduced by TGI, such as tensor-parallel sharding or flash attention. However, you can still get many benefits of TGI, such as continuous batching or streaming outputs.
 
-They can be loaded by:
-
-```python
-from transformers import AutoModelForCausalLM, AutoModelForSeq2SeqLM
-
-AutoModelForCausalLM.from_pretrained(, device_map="auto")``
-
-#or
-
-AutoModelForSeq2SeqLM.from_pretrained(, device_map="auto")
-```
-
 You can serve these models using Docker like below 👇
 
 ```bash
 docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id gpt2
 ```
+
+If the model you wish to serve is not a native transformers model, but its weights and implementation are included in its repository, you can still serve it by passing the `--trust-remote-code` flag to the `docker run` command like below 👇
+
+```bash
+docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id <CUSTOM_MODEL_ID> --trust-remote-code
+```
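+
+Once the container is running, you can query it over TGI's standard HTTP API. Below is a minimal sketch of a request, assuming the port mapping above and TGI's default `/generate` endpoint; the prompt and generation parameters are illustrative only:
+
+```bash
+# Send a test prompt to the running server and read back the generated text
+curl 127.0.0.1:8080/generate \
+    -X POST \
+    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
+    -H 'Content-Type: application/json'
+```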