Mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-09-11 04:14:52 +00:00)
Update docs/source/conceptual/paged_attention.md

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

parent 2ec5436f9c · commit 5ec7b1a2af
@@ -1,4 +1,4 @@
-# Paged Attention
+# PagedAttention
 
 LLMs struggle with memory limitations during generation. In the decoding phase of generation, the keys and values generated for all input tokens are kept in GPU memory; this store is referred to as the _KV cache_. The KV cache consumes a large amount of memory, which causes inefficiencies in LLM serving.
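To make the memory pressure concrete, the back-of-envelope sketch below estimates the KV cache size for a single sequence. The model dimensions (32 layers, 32 KV heads, head dimension 128, fp16) are illustrative assumptions roughly in the range of a 7B-parameter model; they are not taken from the commit above.

```python
# Back-of-envelope KV cache size for one sequence (illustrative numbers only).

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, dtype_bytes: int = 2) -> int:
    # Each layer stores one key tensor and one value tensor per token,
    # hence the factor of 2. dtype_bytes=2 corresponds to fp16/bf16.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * dtype_bytes

size = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=4096)
print(f"{size / 1024**3:.2f} GiB per sequence")  # 2.00 GiB at fp16
```

At this scale, a handful of long concurrent sequences can exhaust GPU memory on top of the model weights, which is the inefficiency PagedAttention targets by allocating the cache in fixed-size blocks rather than one contiguous buffer per sequence.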