From 470dcdfe7b82d0a1f99aa9111c4019a95bb9366b Mon Sep 17 00:00:00 2001
From: Merve Noyan
Date: Tue, 1 Aug 2023 14:10:45 +0300
Subject: [PATCH] Separated querying section and emphasized self generating docs

---
 docs/source/_toctree.yml                     |  2 +
 docs/source/basic_tutorials/docker_launch.md | 38 +-----------------
 docs/source/basic_tutorials/local_launch.md  | 39 +------------------
 docs/source/basic_tutorials/querying.md      | 41 ++++++++++++++++++++
 4 files changed, 45 insertions(+), 75 deletions(-)
 create mode 100644 docs/source/basic_tutorials/querying.md

diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
index 9bebe8af..534aea2b 100644
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -11,6 +11,8 @@
     title: Installing and Launching Locally
   - local: basic_tutorials/docker_launch
     title: Launching with Docker
+  - local: basic_tutorials/querying
+    title: Querying the Models
   - local: basic_tutorials/consuming_TGI
     title: Consuming TGI as a backend
   - local: basic_tutorials/consuming_TGI
diff --git a/docs/source/basic_tutorials/docker_launch.md b/docs/source/basic_tutorials/docker_launch.md
index 1a649370..899c01a2 100644
--- a/docs/source/basic_tutorials/docker_launch.md
+++ b/docs/source/basic_tutorials/docker_launch.md
@@ -10,43 +10,7 @@ docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingf
 ```
 
 **Note:** To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.
-
-You can then query the model using either the `/generate` or `/generate_stream` routes:
-
-```shell
-curl 127.0.0.1:8080/generate \
-    -X POST \
-    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-    -H 'Content-Type: application/json'
-```
-
-```shell
-curl 127.0.0.1:8080/generate_stream \
-    -X POST \
-    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-    -H 'Content-Type: application/json'
-```
-
-or from Python:
-
-```shell
-pip install text-generation
-```
-
-```python
-from text_generation import Client
-
-client = Client("http://127.0.0.1:8080")
-print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)
-
-text = ""
-for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
-    if not response.token.special:
-        text += response.token.text
-print(text)
-```
-
-To see all options to serve your models (in the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs)) or in the cli:
+**Note**: To see all the options for serving your models, consult the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or run the following command in the CLI:
 ```
 text-generation-launcher --help
 ```
\ No newline at end of file
diff --git a/docs/source/basic_tutorials/local_launch.md b/docs/source/basic_tutorials/local_launch.md
index 077e7b5c..d442a586 100644
--- a/docs/source/basic_tutorials/local_launch.md
+++ b/docs/source/basic_tutorials/local_launch.md
@@ -54,44 +54,7 @@ make run-falcon-7b-instruct
 
 This will serve Falcon 7B Instruct model from the port 8080, which we can query.
 
-You can then query the model using either the `/generate` or `/generate_stream` routes:
-
-```shell
-curl 127.0.0.1:8080/generate \
-    -X POST \
-    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-    -H 'Content-Type: application/json'
-```
-
-```shell
-curl 127.0.0.1:8080/generate_stream \
-    -X POST \
-    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-    -H 'Content-Type: application/json'
-```
-
-or through Python:
-
-```shell
-pip install text-generation
-```
-
-Then run:
-
-```python
-from text_generation import Client
-
-client = Client("http://127.0.0.1:8080")
-print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)
-
-text = ""
-for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
-    if not response.token.special:
-        text += response.token.text
-print(text)
-```
-
-To see all options to serve your models (in the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs)) or in the cli:
+**Note**: To see all the options for serving your models, consult the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or run the following command in the CLI:
 ```
 text-generation-launcher --help
 ```
diff --git a/docs/source/basic_tutorials/querying.md b/docs/source/basic_tutorials/querying.md
new file mode 100644
index 00000000..007d3b88
--- /dev/null
+++ b/docs/source/basic_tutorials/querying.md
@@ -0,0 +1,41 @@
+# Querying the Models
+
+After launching the server, query the model using either the `/generate` or `/generate_stream` routes:
+
+```shell
+curl 127.0.0.1:8080/generate \
+    -X POST \
+    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
+    -H 'Content-Type: application/json'
+```
+
+```shell
+curl 127.0.0.1:8080/generate_stream \
+    -X POST \
+    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
+    -H 'Content-Type: application/json'
+```
+
+or through Python:
+
+```shell
+pip install text-generation
+```
+
+Then run:
+
+```python
+from text_generation import Client
+
+client = Client("http://127.0.0.1:8080")
+print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)
+
+text = ""
+for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
+    if not response.token.special:
+        text += response.token.text
+print(text)
+```
+
+## API documentation
+You can consult the OpenAPI documentation of the `text-generation-inference` REST API using the `/docs` route. The Swagger UI is also available [here](https://huggingface.github.io/text-generation-inference).
\ No newline at end of file
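
The new `querying.md` page reaches the server through the `text-generation` client, but the documented `/generate` route can be exercised with any HTTP library. As an illustrative sketch only (it assumes the `requests` package is installed and that a TGI server is listening on `127.0.0.1:8080`, as in the examples above), the equivalent request in plain Python looks like this:

```python
# Illustrative sketch: call the documented /generate route with plain `requests`
# instead of the `text-generation` client. Assumes `pip install requests` and a
# TGI server running locally on port 8080, as in the curl examples above.
import requests

payload = {
    "inputs": "What is Deep Learning?",
    "parameters": {"max_new_tokens": 20},
}

response = requests.post(
    "http://127.0.0.1:8080/generate",
    json=payload,  # serializes the payload and sets the JSON content type
    timeout=60,
)
response.raise_for_status()

# Print the raw JSON body; it carries the generated text for this request.
print(response.json())
```

Printing the raw JSON avoids assuming a particular response schema; with the payload above it should contain the same generated text that the curl example returns.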