From 470dcdfe7b82d0a1f99aa9111c4019a95bb9366b Mon Sep 17 00:00:00 2001
From: Merve Noyan
Date: Tue, 1 Aug 2023 14:10:45 +0300
Subject: [PATCH] Separated querying section and emphasized self generating docs

---
 docs/source/_toctree.yml                     |  2 +
 docs/source/basic_tutorials/docker_launch.md | 38 +-----------------
 docs/source/basic_tutorials/local_launch.md  | 39 +------------------
 docs/source/basic_tutorials/querying.md      | 41 ++++++++++++++++++++
 4 files changed, 45 insertions(+), 75 deletions(-)
 create mode 100644 docs/source/basic_tutorials/querying.md

diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml
index 9bebe8af..534aea2b 100644
--- a/docs/source/_toctree.yml
+++ b/docs/source/_toctree.yml
@@ -11,6 +11,8 @@
     title: Installing and Launching Locally
   - local: basic_tutorials/docker_launch
     title: Launching with Docker
+  - local: basic_tutorials/querying
+    title: Querying the Models
   - local: basic_tutorials/consuming_TGI
     title: Consuming TGI as a backend
   - local: basic_tutorials/consuming_TGI
diff --git a/docs/source/basic_tutorials/docker_launch.md b/docs/source/basic_tutorials/docker_launch.md
index 1a649370..899c01a2 100644
--- a/docs/source/basic_tutorials/docker_launch.md
+++ b/docs/source/basic_tutorials/docker_launch.md
@@ -10,43 +10,7 @@ docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingf
 ```
 
 **Note:** To use GPUs, you need to install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html). We also recommend using NVIDIA drivers with CUDA version 11.8 or higher.
-
-You can then query the model using either the `/generate` or `/generate_stream` routes:
-
-```shell
-curl 127.0.0.1:8080/generate \
-    -X POST \
-    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-    -H 'Content-Type: application/json'
-```
-
-```shell
-curl 127.0.0.1:8080/generate_stream \
-    -X POST \
-    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-    -H 'Content-Type: application/json'
-```
-
-or from Python:
-
-```shell
-pip install text-generation
-```
-
-```python
-from text_generation import Client
-
-client = Client("http://127.0.0.1:8080")
-print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)
-
-text = ""
-for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
-    if not response.token.special:
-        text += response.token.text
-print(text)
-```
-
-To see all options to serve your models (in the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs)) or in the cli:
+**Note**: To see all the options for serving your models, consult the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or run the following command in the CLI:
 ```
 text-generation-launcher --help
 ```
\ No newline at end of file
diff --git a/docs/source/basic_tutorials/local_launch.md b/docs/source/basic_tutorials/local_launch.md
index 077e7b5c..d442a586 100644
--- a/docs/source/basic_tutorials/local_launch.md
+++ b/docs/source/basic_tutorials/local_launch.md
@@ -54,44 +54,7 @@ make run-falcon-7b-instruct
 
 This will serve Falcon 7B Instruct model from the port 8080, which we can query.
 
-You can then query the model using either the `/generate` or `/generate_stream` routes:
-
-```shell
-curl 127.0.0.1:8080/generate \
-    -X POST \
-    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-    -H 'Content-Type: application/json'
-```
-
-```shell
-curl 127.0.0.1:8080/generate_stream \
-    -X POST \
-    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
-    -H 'Content-Type: application/json'
-```
-
-or through Python:
-
-```shell
-pip install text-generation
-```
-
-Then run:
-
-```python
-from text_generation import Client
-
-client = Client("http://127.0.0.1:8080")
-print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)
-
-text = ""
-for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
-    if not response.token.special:
-        text += response.token.text
-print(text)
-```
-
-To see all options to serve your models (in the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs)) or in the cli:
+**Note**: To see all the options for serving your models, consult the [code](https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs) or run the following command in the CLI:
 ```
 text-generation-launcher --help
 ```
diff --git a/docs/source/basic_tutorials/querying.md b/docs/source/basic_tutorials/querying.md
new file mode 100644
index 00000000..007d3b88
--- /dev/null
+++ b/docs/source/basic_tutorials/querying.md
@@ -0,0 +1,41 @@
+# Querying the Models
+
+After launching the server, query the model using either the `/generate` or `/generate_stream` routes:
+
+```shell
+curl 127.0.0.1:8080/generate \
+    -X POST \
+    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
+    -H 'Content-Type: application/json'
+```
+
+```shell
+curl 127.0.0.1:8080/generate_stream \
+    -X POST \
+    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
+    -H 'Content-Type: application/json'
+```
+
+or through Python:
+
+```shell
+pip install text-generation
+```
+
+Then run:
+
+```python
+from text_generation import Client
+
+client = Client("http://127.0.0.1:8080")
+print(client.generate("What is Deep Learning?", max_new_tokens=20).generated_text)
+
+text = ""
+for response in client.generate_stream("What is Deep Learning?", max_new_tokens=20):
+    if not response.token.special:
+        text += response.token.text
+print(text)
+```
+
+## API documentation
+You can consult the OpenAPI documentation of the `text-generation-inference` REST API using the `/docs` route. The Swagger UI is also available [here](https://huggingface.github.io/text-generation-inference).
\ No newline at end of file
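
The new `querying.md` page reaches the server through the `text-generation` client, but the documented `/generate` route can be exercised with any HTTP library. As an illustrative sketch only (it assumes the `requests` package is installed and that a TGI server is listening on `127.0.0.1:8080`, as in the examples above), the equivalent request in plain Python looks like this:

```python
# Illustrative sketch: call the documented /generate route with plain `requests`
# instead of the `text-generation` client. Assumes `pip install requests` and a
# TGI server running locally on port 8080, as in the curl examples above.
import requests

payload = {
    "inputs": "What is Deep Learning?",
    "parameters": {"max_new_tokens": 20},
}

response = requests.post(
    "http://127.0.0.1:8080/generate",
    json=payload,  # serializes the payload and sets the JSON content type
    timeout=60,
)
response.raise_for_status()

# Print the raw JSON body; it carries the generated text for this request.
print(response.json())
```

Printing the raw JSON avoids assuming a particular response schema; with the payload above it should contain the same generated text that the curl example returns.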