mirror of
https://github.com/huggingface/text-generation-inference.git
synced 2025-04-19 22:02:06 +00:00
This work-in-progress PR begins to add support for tools. Tools rely on grammar support and still have some unsolved challenges. Opening the PR for visibility and feedback.
43 lines
1.2 KiB
YAML
- sections:
  - local: index
    title: Text Generation Inference
  - local: quicktour
    title: Quick Tour
  - local: installation
    title: Installation
  - local: supported_models
    title: Supported Models and Hardware
  - local: messages_api
    title: Messages API
  - local: guidance
    title: Guidance
  title: Getting started
- sections:
  - local: basic_tutorials/consuming_tgi
    title: Consuming TGI
  - local: basic_tutorials/preparing_model
    title: Preparing Model for Serving
  - local: basic_tutorials/gated_model_access
    title: Serving Private & Gated Models
  - local: basic_tutorials/using_cli
    title: Using TGI CLI
  - local: basic_tutorials/launcher
    title: All TGI CLI options
  - local: basic_tutorials/non_core_models
    title: Non-core Model Serving
  title: Tutorials
- sections:
  - local: conceptual/streaming
    title: Streaming
  - local: conceptual/quantization
    title: Quantization
  - local: conceptual/tensor_parallelism
    title: Tensor Parallelism
  - local: conceptual/paged_attention
    title: PagedAttention
  - local: conceptual/safetensors
    title: Safetensors
  - local: conceptual/flash_attention
    title: Flash Attention
  title: Conceptual Guides