mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-06-14 13:22:07 +00:00)
Added index.md and other initial files
This commit is contained in:
parent dc631b5be5
commit 41bd0e4af1
0 docs/source/basic_tutorials/installation.md Normal file
14 docs/source/index.md Normal file
@ -0,0 +1,14 @@
# Text Generation Inference
Text Generation Inference (TGI) is an open-source, purpose-built solution for deploying and serving Large Language Models (LLMs). TGI enables high-performance text generation using Tensor Parallelism and dynamic batching for the most popular open-source LLMs, including StarCoder, BLOOM, GPT-NeoX, Llama, and T5. TGI implements optimizations for all supported model architectures, including:
- Tensor Parallelism and custom CUDA kernels
- Optimized transformers code for inference using Flash Attention and Paged Attention on the most popular architectures
- Quantization with bitsandbytes or GPTQ
- Continuous batching of incoming requests for increased total throughput
- Accelerated weight loading (start-up time) with safetensors
- Logits warpers (temperature scaling, top-k, repetition penalty, ...)
- Watermarking with *A Watermark for Large Language Models*
- Stop sequences, log probabilities
- Token streaming using Server-Sent Events (SSE)
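The logits warpers listed above can be sketched in plain Python. The function below is an illustrative toy (its name and signature are not TGI's actual API): it applies a repetition penalty to already-generated tokens, scales by temperature, filters to the top-k logits, and returns a normalized distribution.

```python
import math

def warp_logits(logits, generated_ids, temperature=0.7, top_k=2,
                repetition_penalty=1.2):
    """Toy logits-warping pipeline: repetition penalty -> temperature
    scaling -> top-k filtering -> softmax. Not TGI's real implementation."""
    logits = list(logits)
    # Repetition penalty: dampen tokens that were already generated.
    for tok in set(generated_ids):
        if logits[tok] > 0:
            logits[tok] /= repetition_penalty
        else:
            logits[tok] *= repetition_penalty
    # Temperature scaling: <1 sharpens the distribution, >1 flattens it.
    logits = [l / temperature for l in logits]
    # Top-k filtering: keep only the k highest logits.
    threshold = sorted(logits, reverse=True)[top_k - 1]
    logits = [l if l >= threshold else float("-inf") for l in logits]
    # Softmax over the surviving logits (filtered entries get probability 0).
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

Real servers apply these warpers per request, per decoding step, before sampling the next token id.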
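Token streaming over Server-Sent Events (SSE) means the client reads `data:` lines off a long-lived HTTP response, one JSON payload per generated token. A minimal stdlib-only parser is sketched below; the `"token"` field name in the example is illustrative, not TGI's exact response schema.

```python
import json

def parse_sse_stream(lines):
    """Yield the JSON payload of each `data:` line in an SSE stream,
    skipping blank keep-alive lines and other SSE fields."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())
```

A client would feed this generator the decoded lines of the streaming HTTP response and append each token to the displayed text as it arrives.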
0 docs/source/quicktour.md Normal file
0 docs/source/supported_models.md Normal file