feat: fix typo and add more diagrams

This commit is contained in:
drbh 2024-04-30 14:54:11 -04:00
parent d48846351d
commit a2e48ec3a2
3 changed files with 15 additions and 4 deletions

View File

@@ -26,7 +26,7 @@
- local: basic_tutorials/safety
title: Safety
- local: basic_tutorials/using_guidance
-title: Using Guidance, JSON, tools (via outlines)
+title: Using Guidance, JSON, tools
- local: basic_tutorials/visual_language_models
title: Visual Language Models
title: Tutorials
@@ -46,6 +46,6 @@
- local: conceptual/speculation
title: Speculation (Medusa, ngram)
- local: conceptual/guidance
-title: How Guidance Works
+title: How Guidance Works (via outlines)
title: Conceptual Guides

View File

@@ -122,7 +122,7 @@ print(response.json())
### JSON Schema Integration
-If Pydantic's not your style, go raw with direct JSON Schema integration. This is simliar to the first example but with programmatic control.
+If Pydantic's not your style, go raw with direct JSON Schema integration. This is similar to the first example but with programmatic control.
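For context on what such a direct JSON Schema request looks like, here is a hedged sketch of a `/generate` payload carrying a raw schema. The `grammar` parameter shape and the endpoint are assumptions based on TGI's guidance documentation; verify them against your TGI version.

```python
import json

# Hypothetical sketch of a TGI /generate request body carrying a raw
# JSON Schema grammar. The "grammar" parameter shape is an assumption
# drawn from TGI's guidance docs, not a verified contract.
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}},
    "required": ["name"],
}

payload = {
    "inputs": "Give me a character named after a city.",
    "parameters": {
        "grammar": {"type": "json", "value": schema},
        "max_new_tokens": 64,
    },
}

print(json.dumps(payload, indent=2))
# To send it: requests.post("http://localhost:3000/generate", json=payload)
```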
```python
import requests

View File

@@ -23,7 +23,6 @@ However these use cases can span a wide range of applications, such as:
- provide reliable and consistent output for downstream tasks
- extract data from multimodal inputs
## How does it work?
Diving into the details, guidance is enabled by including a grammar with a generation request; the grammar is compiled and then used to constrain the tokens the model can choose.
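The idea of compiling a grammar into a finite state machine can be sketched with a toy example. This is entirely hypothetical and character-level; TGI's real compilation (via outlines) operates over model tokens and handles full JSON Schemas and regexes.

```python
# Toy sketch: a grammar accepting only the literals `true` or `false`,
# "compiled" into a finite state machine. Each state maps an allowed
# next symbol to the state it leads to. State 8 is the accepting state.
fsm = {
    0: {"t": 1, "f": 4},
    1: {"r": 2},
    2: {"u": 3},
    3: {"e": 8},
    4: {"a": 5},
    5: {"l": 6},
    6: {"s": 7},
    7: {"e": 8},
    8: {},  # accepting state: nothing more may follow
}

def allowed_next(state):
    """Symbols the grammar permits from this state."""
    return set(fsm[state].keys())

def advance(state, symbol):
    """Consume one symbol and move to the next state."""
    return fsm[state][symbol]

# Walk the FSM over the string "true": every step must be allowed.
state = 0
for symbol in "true":
    assert symbol in allowed_next(state)
    state = advance(state, symbol)
assert state == 8  # the whole string was accepted
```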
@@ -31,6 +30,18 @@ Diving into the details, guidance is enabled by including a gener
This process can be broken down into the following steps:
1. A request is sent to the backend, where it is processed and placed in a batch. Processing includes compiling the grammar into a finite state machine and a grammar state.
+<div class="flex justify-center">
+    <img
+        class="block dark:hidden"
+        src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/request-to-batch.gif"
+    />
+    <img
+        class="hidden dark:block"
+        src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/request-to-batch-dark.gif"
+    />
+</div>
2. The model does a forward pass over the batch. This returns probabilities for each token in the vocabulary for each request in the batch.
3. The process of choosing one of those tokens is called `sampling`. The model samples from the probability distribution to choose the next token. In TGI, the steps applied before sampling are called `processors`; grammars are applied as a processor that masks out tokens the grammar does not allow.
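The masking in step 3 can be sketched as follows. This is a toy illustration of the technique, not TGI's actual implementation: the "grammar state" is reduced to a set of allowed token ids, and disallowed tokens get their logits set to negative infinity so they receive zero probability.

```python
import math
import random

def apply_grammar_mask(logits, allowed_token_ids):
    """Set the logit of every token the grammar disallows to -inf."""
    return [
        logit if i in allowed_token_ids else float("-inf")
        for i, logit in enumerate(logits)
    ]

def sample(logits):
    """Softmax the (masked) logits and sample one token id."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]   # exp(-inf) == 0.0
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [1.2, 0.3, -0.5, 2.0]   # one logit per vocabulary token
allowed = {0, 2}                 # token ids the grammar permits next
masked = apply_grammar_mask(logits, allowed)
next_token = sample(masked)
assert next_token in allowed     # masked tokens can never be sampled
```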