Mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-09-11 12:24:53 +00:00)
feat: fix typo and add more diagrams
commit a2e48ec3a2 (parent d48846351d)
docs/source/_toctree.yml
@@ -26,7 +26,7 @@
   - local: basic_tutorials/safety
     title: Safety
   - local: basic_tutorials/using_guidance
-    title: Using Guidance, JSON, tools (via outlines)
+    title: Using Guidance, JSON, tools
   - local: basic_tutorials/visual_language_models
     title: Visual Language Models
   title: Tutorials
@@ -46,6 +46,6 @@
   - local: conceptual/speculation
     title: Speculation (Medusa, ngram)
   - local: conceptual/guidance
-    title: How Guidance Works
+    title: How Guidance Works (via outlines)

   title: Conceptual Guides
docs/source/basic_tutorials/using_guidance.md
@@ -122,7 +122,7 @@ print(response.json())

 ### JSON Schema Integration

-If Pydantic's not your style, go raw with direct JSON Schema integration. This is simliar to the first example but with programmatic control.
+If Pydantic's not your style, go raw with direct JSON Schema integration. This is similar to the first example but with programmatic control.

 ```python
 import requests
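(For context, the tutorial passage edited above goes on to post a raw JSON Schema to TGI. A minimal sketch of such a request, assuming a TGI server listening on localhost:8080 and the `grammar` parameter shape used by TGI's guidance feature; the schema and prompt below are invented for illustration:)

```python
import requests

# Hypothetical local TGI endpoint; adjust host/port to your deployment.
url = "http://localhost:8080/generate"

# A raw JSON Schema, passed programmatically instead of via Pydantic.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

payload = {
    "inputs": "Extract the person: David is 42 years old.",
    "parameters": {
        # TGI's guidance feature takes a grammar with a JSON Schema value.
        "grammar": {"type": "json", "value": schema},
        "max_new_tokens": 64,
    },
}

response = requests.post(url, json=payload)
print(response.json())
```

Constrained this way, the response's `generated_text` should conform to the schema rather than free-form prose.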
docs/source/conceptual/guidance.md
@@ -23,7 +23,6 @@ However these use cases can span a wide range of applications, such as:
 - provide reliable and consistent output for downstream tasks
 - extract data from multimodal inputs

-
 ## How it works?

 Diving into the details, guidance is enabled by including a grammar with a generation request that is compiled, and used to modify the chosen tokens.
@@ -31,6 +30,18 @@ Diving into the details, guidance is enabled by including a grammar with a generation request that is compiled, and used to modify the chosen tokens.
 This process can be broken down into the following steps:

 1. A request is sent to the backend, it is processed and placed in batch. Processing includes compiling the grammar into a finite state machine and a grammar state.
+
+<div class="flex justify-center">
+    <img
+        class="block dark:hidden"
+        src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/request-to-batch.gif"
+    />
+    <img
+        class="hidden dark:block"
+        src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/request-to-batch-dark.gif"
+    />
+</div>
+
 2. The model does a forward pass over the batch. This returns probabilities for each token in the vocabulary for each request in the batch.

 3. The process of choosing one of those tokens is called `sampling`. The model samples from the distribution of probabilities to choose the next token. In TGI all of the steps before sampling are called `processor`. Grammars are applied as a processor that masks out tokens that are not allowed by the grammar.
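(Taken together, steps 1-3 describe grammar-constrained decoding: the grammar is compiled to a finite state machine, the model produces token probabilities, and a processor masks tokens the current grammar state disallows before sampling. Below is a minimal, self-contained sketch of that mechanism with a toy vocabulary and a hand-rolled FSM; it is illustrative only, not TGI's actual Outlines-based implementation:)

```python
import math
import random

# Toy vocabulary; a real model has tens of thousands of tokens.
vocab = ["0", "1", "2", "a", "b", "<eos>"]

# Step 1 (compile): a toy grammar -- "one or more digits, then <eos>" --
# expressed as a finite state machine over vocab tokens.
# State 0: a digit is required; state 1: digit or <eos>; -1: done.
fsm = {
    0: {"0": 1, "1": 1, "2": 1},
    1: {"0": 1, "1": 1, "2": 1, "<eos>": -1},
}

def grammar_processor(logits, state):
    # Step 3 (processor): mask out tokens the grammar state disallows.
    allowed = fsm[state]
    return [l if tok in allowed else -math.inf for tok, l in zip(vocab, logits)]

def sample(logits):
    # Softmax over the (masked) logits; -inf logits get probability zero.
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    return random.choices(vocab, weights=[e / total for e in exps])[0]

state, output = 0, []
while state != -1:
    # Step 2 (forward pass): random logits stand in for the model's output.
    logits = [random.gauss(0.0, 1.0) for _ in vocab]
    token = sample(grammar_processor(logits, state))
    output.append(token)
    state = fsm[state][token]  # advance the grammar state

print("".join(output))  # e.g. "201<eos>" -- always matches the grammar
```

Because disallowed tokens receive probability zero before sampling, every generated sequence is valid under the grammar by construction.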