# Serving Private & Gated Models

If the model you want to serve is gated, or its repository on the Hugging Face Hub is private, you can still serve it by providing a Hugging Face Hub access token for an account that has access to the model. You can generate and copy a read token from the [Hugging Face Hub tokens page](https://huggingface.co/settings/tokens).

If you're using the CLI, set the `HUGGING_FACE_HUB_TOKEN` environment variable. For example:

```bash
export HUGGING_FACE_HUB_TOKEN=<YOUR READ TOKEN>
```
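
A missing or empty token usually only shows up later, as a failed model download. As a minimal sketch, a small shell guard can catch that before anything is launched. The helper name `require_hf_token` is hypothetical and not part of TGI; it only checks that the variable is non-empty, not that the token is valid or has access to the model.

```bash
# Hypothetical helper (not part of TGI): fail fast when the token is missing.
# It only checks that the variable is non-empty, not that the Hub accepts it.
require_hf_token() {
    if [ -z "${HUGGING_FACE_HUB_TOKEN:-}" ]; then
        echo "HUGGING_FACE_HUB_TOKEN is not set" >&2
        return 1
    fi
    # Report the length only, so the token itself never reaches the logs.
    echo "HUGGING_FACE_HUB_TOKEN is set (${#HUGGING_FACE_HUB_TOKEN} characters)"
}
```

Calling `require_hf_token` before starting the server returns a non-zero status when the variable is unset, so it can gate the launch command with `&&`.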

If you're using Docker, pass the token to the container by setting `HUGGING_FACE_HUB_TOKEN` with the `-e` flag, as shown below.

```bash
model=meta-llama/Llama-2-7b-chat-hf
volume=$PWD/data
token=<your READ token>

docker run --gpus all \
    --shm-size 1g \
    -e HUGGING_FACE_HUB_TOKEN=$token \
    -p 8080:80 \
    -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.0.0 \
    --model-id $model
```