From c8a01d759173483efc2135c4e7506b23e14e7fc4 Mon Sep 17 00:00:00 2001
From: Merve Noyan
Date: Tue, 12 Sep 2023 15:55:14 +0200
Subject: [PATCH] Unsupported model serving docs (#906)

Co-authored-by: Omar Sanseviero
Co-authored-by: Mishig
Co-authored-by: Pedro Cuenca
Co-authored-by: OlivierDehaene
---
 docs/source/_toctree.yml                      |  2 ++
 .../source/basic_tutorials/non_core_models.md | 24 +++++++++++++++++++
 2 files changed, 26 insertions(+)
 create mode 100644 docs/source/basic_tutorials/non_core_models.md

diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
index 25f3815e..313b6d32 100644
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -17,6 +17,8 @@
     title: Serving Private & Gated Models
   - local: basic_tutorials/using_cli
     title: Using TGI CLI
+  - local: basic_tutorials/non_core_models
+    title: Non-core Model Serving
   title: Tutorials
 - sections:
   - local: conceptual/streaming
diff --git a/docs/source/basic_tutorials/non_core_models.md b/docs/source/basic_tutorials/non_core_models.md
new file mode 100644
index 00000000..6f2e6cfa
--- /dev/null
+++ b/docs/source/basic_tutorials/non_core_models.md
@@ -0,0 +1,24 @@
+# Non-core Model Serving
+
+TGI supports various LLM architectures (see the full list [here](../supported_models)). If you wish to serve a model that is not one of the supported models, TGI will fall back to the `transformers` implementation of that model. This means you will be unable to use some of the features introduced by TGI, such as tensor-parallel sharding or flash attention. However, you can still get many benefits of TGI, such as continuous batching or streaming outputs.
+
+You can serve these models using the same Docker command-line invocation as with fully supported models 👇
+
+```bash
+docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id gpt2
+```
+
+If the model you wish to serve is a custom transformers model whose weights and implementation are available on the Hub, you can still serve it by passing the `--trust-remote-code` flag to the `docker run` command like below 👇
+
+```bash
+docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id <CUSTOM_MODEL_ID> --trust-remote-code
+```
+
+Finally, if the model is not on the Hugging Face Hub but available locally, you can pass the path to the folder that contains your model like below 👇
+
+```bash
+# Make sure your model is in the $volume directory
+docker run --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/text-generation-inference:latest --model-id /data/<PATH-TO-FOLDER>
+```
+
+You can refer to the [transformers docs on custom models](https://huggingface.co/docs/transformers/main/en/custom_models) for more information.
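
The `docker run` examples in the new page reference a `$volume` shell variable without defining it. A minimal sketch of that setup, assuming you want model weights cached in a local `data` directory that is shared with the container via `-v $volume:/data`:

```bash
# Local directory shared with the container so downloaded weights persist between runs
volume=$PWD/data
```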
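
Once a container started with any of the commands above is running, you can check that the fallback model serves requests through TGI's `/generate` endpoint (or `/generate_stream` for streaming). A minimal sketch, assuming the server is reachable on the mapped port 8080 and using a placeholder prompt and parameters:

```bash
# Send a test generation request to the running server
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
```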