mirror of
https://github.com/huggingface/text-generation-inference.git
synced 2025-04-27 04:52:07 +00:00
PR for conceptual guide on flash attention. I will add more info unless I'm told otherwise.

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
27 lines
737 B
YAML
- sections:
  - local: index
    title: Text Generation Inference
  - local: quicktour
    title: Quick Tour
  - local: installation
    title: Installation
  - local: supported_models
    title: Supported Models and Hardware
  title: Getting started
- sections:
  - local: basic_tutorials/consuming_tgi
    title: Consuming TGI
  - local: basic_tutorials/preparing_model
    title: Preparing Model for Serving
  - local: basic_tutorials/gated_model_access
    title: Serving Private & Gated Models
  - local: basic_tutorials/using_cli
    title: Using TGI CLI
  title: Tutorials
- sections:
  - local: conceptual/streaming
    title: Streaming
  - local: conceptual/flash_attention
    title: Flash Attention
  title: Conceptual Guides