From 04f7c2d86b2b572a7bf45bc81a130b6826b87379 Mon Sep 17 00:00:00 2001
From: Omar Sanseviero
Date: Thu, 10 Aug 2023 14:32:07 +0200
Subject: [PATCH 1/2] Fix gated docs (#805)

---
 .../basic_tutorials/gated_model_access.md | 21 ++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/docs/source/basic_tutorials/gated_model_access.md b/docs/source/basic_tutorials/gated_model_access.md
index f5858dc4..5d8e98f4 100644
--- a/docs/source/basic_tutorials/gated_model_access.md
+++ b/docs/source/basic_tutorials/gated_model_access.md
@@ -2,4 +2,23 @@
 
 If the model you wish to serve is behind gated access or the model repository on Hugging Face Hub is private, and you have access to the model, you can provide your Hugging Face Hub access token. You can generate and copy a read token from [Hugging Face Hub tokens page](https://huggingface.co/settings/tokens)
 
-If you're using the CLI, set the `HUGGING_FACE_HUB_TOKEN` environment variable.
+If you're using the CLI, set the `HUGGING_FACE_HUB_TOKEN` environment variable. For example:
+
+```
+export HUGGING_FACE_HUB_TOKEN=
+```
+
+If you would like to do it through Docker, you can provide your token by specifying `HUGGING_FACE_HUB_TOKEN` as shown below.
+
+```bash
+model=meta-llama/Llama-2-7b-chat-hf
+volume=$PWD/data
+token=
+
+docker run --gpus all \
+    --shm-size 1g \
+    -e HUGGING_FACE_HUB_TOKEN=$token \
+    -p 8080:80 \
+    -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.0.0 \
+    --model-id $model
+```
\ No newline at end of file
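The snippets added by this patch leave the token value for the reader to fill in. Before launching the container, it can be useful to confirm that the token actually has read access to the gated repository; the following is a minimal sketch using `huggingface_hub`, where the model id and token value are placeholders and the check itself is not part of the patch:

```python
# Sketch: verify that a read token can see a gated repo before starting TGI.
# Assumes huggingface-hub is installed and access to the repo has been granted;
# the model id and token below are illustrative placeholders.
from huggingface_hub import model_info

token = "hf_..."  # read token from https://huggingface.co/settings/tokens

info = model_info("meta-llama/Llama-2-7b-chat-hf", token=token)
print(info.sha)  # succeeds only if the token can read the gated repository
```

If the call fails with a gated-repo or authorization error, request access to the model on the Hub before starting the container.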
From 7dbaef3f5b854babb656141596e548227f6ec41f Mon Sep 17 00:00:00 2001
From: Omar Sanseviero
Date: Thu, 10 Aug 2023 14:32:51 +0200
Subject: [PATCH 2/2] Minor docs style fixes (#806)

---
 docs/source/basic_tutorials/consuming_tgi.md |  5 ++---
 docs/source/installation.md                  | 15 ++++++++-------
 docs/source/quicktour.md                     |  4 ++--
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/docs/source/basic_tutorials/consuming_tgi.md b/docs/source/basic_tutorials/consuming_tgi.md
index 619e0a31..7fb74719 100644
--- a/docs/source/basic_tutorials/consuming_tgi.md
+++ b/docs/source/basic_tutorials/consuming_tgi.md
@@ -6,7 +6,7 @@ There are many ways you can consume Text Generation Inference server in your app
 
 After the launch, you can query the model using either the `/generate` or `/generate_stream` routes:
 
-```shell
+```bash
 curl 127.0.0.1:8080/generate \
     -X POST \
     -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
@@ -20,14 +20,13 @@ curl 127.0.0.1:8080/generate \
 
 You can simply install `huggingface-hub` package with pip.
 
-```python
+```bash
 pip install huggingface-hub
 ```
 
 Once you start the TGI server, instantiate `InferenceClient()` with the URL to the endpoint serving the model. You can then call `text_generation()` to hit the endpoint through Python.
 
 ```python
-
 from huggingface_hub import InferenceClient
 
 client = InferenceClient(model=URL_TO_ENDPOINT_SERVING_TGI)
diff --git a/docs/source/installation.md b/docs/source/installation.md
index 4105acf4..a8e2e751 100644
--- a/docs/source/installation.md
+++ b/docs/source/installation.md
@@ -16,7 +16,7 @@ Text Generation Inference is available on pypi, conda and GitHub.
 
 To install and launch locally, first [install Rust](https://rustup.rs/) and create a Python virtual environment with at least Python 3.9, e.g. using conda:
 
-```shell
+```bash
 curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
 
 conda create -n text-generation-inference python=3.9
@@ -27,7 +27,7 @@ You may also need to install Protoc.
 
 On Linux:
 
-```shell
+```bash
 PROTOC_ZIP=protoc-21.12-linux-x86_64.zip
 curl -OL https://github.com/protocolbuffers/protobuf/releases/download/v21.12/$PROTOC_ZIP
 sudo unzip -o $PROTOC_ZIP -d /usr/local bin/protoc
@@ -37,13 +37,13 @@ rm -f $PROTOC_ZIP
 
 On MacOS, using Homebrew:
 
-```shell
+```bash
 brew install protobuf
 ```
 
 Then run to install Text Generation Inference:
 
-```shell
+```bash
 BUILD_EXTENSIONS=True make install # Install repository and HF/transformer fork with CUDA kernels
 ```
@@ -51,7 +51,7 @@ BUILD_EXTENSIONS=True make install # Install repository and HF/transformer fork
 
 On some machines, you may also need the OpenSSL libraries and gcc. On Linux machines, run:
 
-```shell
+```bash
 sudo apt-get install libssl-dev gcc -y
 ```
@@ -59,13 +59,14 @@ sudo apt-get install libssl-dev gcc -y
 
 Once installation is done, simply run:
 
-```shell
+```bash
 make run-falcon-7b-instruct
 ```
 
 This will serve Falcon 7B Instruct model from the port 8080, which we can query. To see all options to serve your models, check in the [codebase](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or the CLI:
-```
+
+```bash
 text-generation-launcher --help
 ```
diff --git a/docs/source/quicktour.md b/docs/source/quicktour.md
index 31185a2d..77f0a9c5 100644
--- a/docs/source/quicktour.md
+++ b/docs/source/quicktour.md
@@ -19,7 +19,7 @@ To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvi
 
 Once TGI is running, you can use the `generate` endpoint by doing requests. To learn more about how to query the endpoints, check the [Consuming TGI](./basic_tutorials/consuming_tgi) section.
 
-```shell
+```bash
 curl 127.0.0.1:8080/generate -X POST -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' -H 'Content-Type: application/json'
 ```
@@ -27,7 +27,7 @@ curl 127.0.0.1:8080/generate -X POST -d '{"inputs":"What is Deep Learning?","par
 
 To see all possible flags and options, you can use the `--help` flag. It's possible to configure the number of shards, quantization, generation parameters, and more.
 
-```shell
+```bash
 docker run ghcr.io/huggingface/text-generation-inference:1.0.0 --help
 ```
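The `InferenceClient` hunk above stops at the client construction. A complete round trip against a locally running server might look like the following sketch, where the localhost URL, prompt, and `max_new_tokens` value are illustrative assumptions rather than values taken from the patches:

```python
# Sketch: query a local TGI server with huggingface_hub's InferenceClient.
# Assumes TGI is already serving on port 8080 (e.g. via the docker command in
# the gated-access docs) and that huggingface-hub is installed.
from huggingface_hub import InferenceClient

client = InferenceClient(model="http://127.0.0.1:8080")

# Single-shot generation, mirroring the curl example in the docs.
print(client.text_generation("What is Deep Learning?", max_new_tokens=20))

# Streaming variant: tokens are yielded incrementally as they are generated.
for token in client.text_generation("What is Deep Learning?", max_new_tokens=20, stream=True):
    print(token, end="")
```

With `stream=True` the client yields tokens as they are produced, matching the streaming behaviour described for the `/generate_stream` route in the curl examples.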