Mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-04-24 00:12:08 +00:00)
This PR adds a new page to the docs that describes the Messages API and how to use it. The page will also contain cloud-provider-specific information for enabling and using this feature; this PR includes SageMaker-specific information and an example.
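For context on what the new page documents: TGI's Messages API is an OpenAI-compatible chat-completions endpoint. Below is a minimal sketch of calling it with the `openai` Python client, assuming a TGI server is already running on localhost:8080; the `base_url`, `api_key`, and `model` values are illustrative placeholders, not values taken from this PR.

```python
# Minimal sketch: calling TGI's OpenAI-compatible Messages API.
# Assumes a TGI server is already running at http://localhost:8080 (placeholder).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # TGI serves chat completions under /v1
    api_key="-",  # placeholder; TGI does not require a real key by default
)

chat_completion = client.chat.completions.create(
    model="tgi",  # placeholder name; TGI serves whichever model it was launched with
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is deep learning?"},
    ],
    stream=False,
)

print(chat_completion.choices[0].message.content)
```

Because the endpoint follows the OpenAI schema, existing OpenAI-client code can be pointed at a TGI deployment by changing the base URL; the provider-specific sections of the new page cover how to enable and use this on platforms such as SageMaker.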
41 lines · 1.2 KiB · YAML
- sections:
  - local: index
    title: Text Generation Inference
  - local: quicktour
    title: Quick Tour
  - local: installation
    title: Installation
  - local: supported_models
    title: Supported Models and Hardware
  - local: messages_api
    title: Messages API
  title: Getting started
- sections:
  - local: basic_tutorials/consuming_tgi
    title: Consuming TGI
  - local: basic_tutorials/preparing_model
    title: Preparing Model for Serving
  - local: basic_tutorials/gated_model_access
    title: Serving Private & Gated Models
  - local: basic_tutorials/using_cli
    title: Using TGI CLI
  - local: basic_tutorials/launcher
    title: All TGI CLI options
  - local: basic_tutorials/non_core_models
    title: Non-core Model Serving
  title: Tutorials
- sections:
  - local: conceptual/streaming
    title: Streaming
  - local: conceptual/quantization
    title: Quantization
  - local: conceptual/tensor_parallelism
    title: Tensor Parallelism
  - local: conceptual/paged_attention
    title: PagedAttention
  - local: conceptual/safetensors
    title: Safetensors
  - local: conceptual/flash_attention
    title: Flash Attention
  title: Conceptual Guides