Mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-09-11 12:24:53 +00:00)
feat: fix typo and add more diagrams
commit a2e48ec3a2 (parent d48846351d)
docs/source/_toctree.yml
@@ -26,7 +26,7 @@
   - local: basic_tutorials/safety
     title: Safety
   - local: basic_tutorials/using_guidance
-    title: Using Guidance, JSON, tools (via outlines)
+    title: Using Guidance, JSON, tools
   - local: basic_tutorials/visual_language_models
     title: Visual Language Models
   title: Tutorials
@@ -46,6 +46,6 @@
   - local: conceptual/speculation
     title: Speculation (Medusa, ngram)
   - local: conceptual/guidance
-    title: How Guidance Works
+    title: How Guidance Works (via outlines)

   title: Conceptual Guides
docs/source/basic_tutorials/using_guidance.md
@@ -122,7 +122,7 @@ print(response.json())

 ### JSON Schema Integration

-If Pydantic's not your style, go raw with direct JSON Schema integration. This is simliar to the first example but with programmatic control.
+If Pydantic's not your style, go raw with direct JSON Schema integration. This is similar to the first example but with programmatic control.

 ```python
 import requests
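(For context, the tutorial passage edited above goes on to post a raw JSON Schema to TGI. A minimal sketch of such a request, assuming a TGI server listening on localhost:8080 and the `grammar` parameter shape used by TGI's guidance feature; the schema and prompt below are invented for illustration:)

```python
import requests

# Hypothetical local TGI endpoint; adjust host/port to your deployment.
url = "http://localhost:8080/generate"

# A raw JSON Schema, passed programmatically instead of via Pydantic.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

payload = {
    "inputs": "Extract the person: David is 42 years old.",
    "parameters": {
        # TGI's guidance feature takes a grammar with a JSON Schema value.
        "grammar": {"type": "json", "value": schema},
        "max_new_tokens": 64,
    },
}

response = requests.post(url, json=payload)
print(response.json())
```

Constrained this way, the response's `generated_text` should conform to the schema rather than free-form prose.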
docs/source/conceptual/guidance.md
@@ -23,7 +23,6 @@ However these use cases can span a wide range of applications, such as:
 - provide reliable and consistent output for downstream tasks
 - extract data from multimodal inputs

-
 ## How it works?

 Diving into the details, guidance is enabled by including a grammar with a generation request that is compiled, and used to modify the chosen tokens.
@@ -31,6 +30,18 @@ Diving into the details, guidance is enabled by including a grammar with a generation request that is compiled, and used to modify the chosen tokens.
 This process can be broken down into the following steps:

 1. A request is sent to the backend, it is processed and placed in batch. Processing includes compiling the grammar into a finite state machine and a grammar state.
+
+<div class="flex justify-center">
+    <img
+        class="block dark:hidden"
+        src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/request-to-batch.gif"
+    />
+    <img
+        class="hidden dark:block"
+        src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/request-to-batch-dark.gif"
+    />
+</div>
+
 2. The model does a forward pass over the batch. This returns probabilities for each token in the vocabulary for each request in the batch.

 3. The process of choosing one of those tokens is called `sampling`. The model samples from the distribution of probabilities to choose the next token. In TGI all of the steps before sampling are called `processor`. Grammars are applied as a processor that masks out tokens that are not allowed by the grammar.
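(Taken together, steps 1-3 describe grammar-constrained decoding: the grammar is compiled to a finite state machine, the model produces token probabilities, and a processor masks tokens the current grammar state disallows before sampling. Below is a minimal, self-contained sketch of that mechanism with a toy vocabulary and a hand-rolled FSM; it is illustrative only, not TGI's actual Outlines-based implementation:)

```python
import math
import random

# Toy vocabulary; a real model has tens of thousands of tokens.
vocab = ["0", "1", "2", "a", "b", "<eos>"]

# Step 1 (compile): a toy grammar -- "one or more digits, then <eos>" --
# expressed as a finite state machine over vocab tokens.
# State 0: a digit is required; state 1: digit or <eos>; -1: done.
fsm = {
    0: {"0": 1, "1": 1, "2": 1},
    1: {"0": 1, "1": 1, "2": 1, "<eos>": -1},
}

def grammar_processor(logits, state):
    # Step 3 (processor): mask out tokens the grammar state disallows.
    allowed = fsm[state]
    return [l if tok in allowed else -math.inf for tok, l in zip(vocab, logits)]

def sample(logits):
    # Softmax over the (masked) logits; -inf logits get probability zero.
    exps = [math.exp(l) for l in logits]
    total = sum(exps)
    return random.choices(vocab, weights=[e / total for e in exps])[0]

state, output = 0, []
while state != -1:
    # Step 2 (forward pass): random logits stand in for the model's output.
    logits = [random.gauss(0.0, 1.0) for _ in vocab]
    token = sample(grammar_processor(logits, state))
    output.append(token)
    state = fsm[state][token]  # advance the grammar state

print("".join(output))  # e.g. "201<eos>" -- always matches the grammar
```

Because disallowed tokens receive probability zero before sampling, every generated sequence is valid under the grammar by construction.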