Mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-09-11 20:34:54 +00:00)
Commit a2e48ec3a2 (parent d48846351d): feat: fix typo and add more diagrams
@@ -26,7 +26,7 @@
   - local: basic_tutorials/safety
     title: Safety
   - local: basic_tutorials/using_guidance
-    title: Using Guidance, JSON, tools (via outlines)
+    title: Using Guidance, JSON, tools
   - local: basic_tutorials/visual_language_models
     title: Visual Language Models
   title: Tutorials
@@ -46,6 +46,6 @@
   - local: conceptual/speculation
     title: Speculation (Medusa, ngram)
   - local: conceptual/guidance
-    title: How Guidance Works
+    title: How Guidance Works (via outlines)

   title: Conceptual Guides
|
@@ -122,7 +122,7 @@ print(response.json())

### JSON Schema Integration

-If Pydantic's not your style, go raw with direct JSON Schema integration. This is simliar to the first example but with programmatic control.
+If Pydantic's not your style, go raw with direct JSON Schema integration. This is similar to the first example but with programmatic control.

```python
import requests
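The hunk above edits the docs' direct JSON Schema example, which continues beyond this diff. As a rough sketch of what such a request looks like, the payload below follows the `grammar` parameter shape TGI accepts on its `/generate` endpoint (type `"json"` with the schema under `"value"`); the schema fields, prompt, and server URL are illustrative assumptions, not taken from this commit.

```python
import json

# Illustrative JSON Schema: constrain generation to an object with these fields.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# Request body for TGI's /generate endpoint: the grammar rides along
# in `parameters`, next to ordinary sampling options.
payload = {
    "inputs": "Extract the person: John is 30 years old.",
    "parameters": {
        "max_new_tokens": 64,
        "grammar": {"type": "json", "value": schema},
    },
}

# With a server running, this would be sent as (URL is an assumption):
# requests.post("http://localhost:8080/generate", json=payload).json()
print(json.dumps(payload, indent=2))
```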
|
@@ -23,7 +23,6 @@ However these use cases can span a wide range of applications, such as:
 - provide reliable and consistent output for downstream tasks
 - extract data from multimodal inputs

-
 ## How it works?

 Diving into the details, guidance is enabled by including a grammar with a generation request that is compiled, and used to modify the chosen tokens.
@@ -31,6 +30,18 @@ Diving into the details, guidance is enabled by including a grammar with a generation request that is compiled, and used to modify the chosen tokens.
 This process can be broken down into the following steps:

 1. A request is sent to the backend, it is processed and placed in batch. Processing includes compiling the grammar into a finite state machine and a grammar state.

+<div class="flex justify-center">
+    <img
+        class="block dark:hidden"
+        src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/request-to-batch.gif"
+    />
+    <img
+        class="hidden dark:block"
+        src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/request-to-batch-dark.gif"
+    />
+</div>
+
 2. The model does a forward pass over the batch. This returns probabilities for each token in the vocabulary for each request in the batch.

 3. The process of choosing one of those tokens is called `sampling`. The model samples from the distribution of probabilities to choose the next token. In TGI all of the steps before sampling are called `processor`. Grammars are applied as a processor that masks out tokens that are not allowed by the grammar.
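Step 3 in the text above describes grammars applied as a processor that masks out disallowed tokens before sampling. A minimal toy sketch of that idea (not TGI's implementation; the five-token vocabulary and the set of grammar-allowed token ids are made-up assumptions):

```python
import math
import random

def apply_grammar_mask(logits, allowed_ids):
    """Set the logit of every token the grammar disallows to -inf,
    the way a grammar logits processor would before sampling."""
    return [l if i in allowed_ids else float("-inf")
            for i, l in enumerate(logits)]

def sample(logits, rng=random.random):
    """Softmax over the (masked) logits, then sample one token id."""
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]  # exp(-inf) == 0.0
    total = sum(exps)
    probs = [e / total for e in exps]
    r, acc, last = rng(), 0.0, 0
    for i, p in enumerate(probs):
        if p > 0:
            last = i
            acc += p
            if r <= acc:
                return i
    return last  # guard against float rounding at the tail

# Toy vocabulary of 5 tokens; the grammar state currently allows only ids {1, 3}.
logits = [2.0, 1.0, 0.5, 1.5, 0.1]
masked = apply_grammar_mask(logits, {1, 3})
token = sample(masked)
print(token)  # always 1 or 3: masked tokens have probability zero
```

In TGI the allowed-id set would come from the compiled finite state machine's current grammar state, and it advances after each emitted token; here it is a fixed set purely for illustration.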
|