Update docs/source/conceptual/paged_attention.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Merve Noyan 2023-09-07 14:46:03 +02:00 committed by GitHub
parent 2ec5436f9c
commit 5ec7b1a2af

@@ -1,4 +1,4 @@
-# Paged Attention
+# PagedAttention
LLMs struggle with memory limitations during generation. In the decoding phase of generation, the attention keys and values computed for all previous tokens are stored in GPU memory for reuse, also referred to as the _KV cache_. The KV cache consumes a large amount of memory, which causes inefficiencies in LLM serving.
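
To get a feel for why the KV cache is so memory-hungry, here is a rough back-of-the-envelope estimate for a single sequence; the model dimensions below are illustrative assumptions (roughly a 7B-parameter decoder running in fp16), not figures taken from this document.

```python
# Back-of-the-envelope KV cache size for a single sequence.
# All model dimensions are illustrative (roughly a 7B-parameter decoder in fp16).
num_layers = 32       # decoder layers
num_heads = 32        # attention heads per layer
head_dim = 128        # dimension of each head
seq_len = 2048        # tokens whose keys and values are kept in the cache
bytes_per_value = 2   # fp16

# Factor of 2: both a key and a value are cached per layer, head, and token.
kv_cache_bytes = 2 * num_layers * num_heads * head_dim * seq_len * bytes_per_value
print(f"KV cache for one sequence: {kv_cache_bytes / 1024**3:.2f} GiB")  # ~1.00 GiB
```

This cost grows linearly with both the sequence length and the batch size, which is why the way the cache is laid out in GPU memory matters so much for serving throughput.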