From 9973f4041cd92681e70dea7cb2468ba2b18fcb8b Mon Sep 17 00:00:00 2001
From: Merve Noyan
Date: Thu, 7 Sep 2023 14:46:39 +0200
Subject: [PATCH] Update docs/source/conceptual/paged_attention.md

Co-authored-by: Pedro Cuenca
---
 docs/source/conceptual/paged_attention.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/conceptual/paged_attention.md b/docs/source/conceptual/paged_attention.md
index f5062767..2c9885b9 100644
--- a/docs/source/conceptual/paged_attention.md
+++ b/docs/source/conceptual/paged_attention.md
@@ -6,4 +6,4 @@ PagedAttention addresses the memory waste by partitioning the KV cache into bloc
 
 The use of a lookup table to access the memory blocks can also help with KV sharing across multiple generations. This is helpful for techniques such as _parallel sampling_, where multiple outputs are generated simultaneously for the same prompt. In this case, the cached KV blocks can be shared among the generations.
 
-You can learn more about PagedAttention by reading the documentation [here](https://vllm.ai/).
+TGI's PagedAttention implementation leverages the custom CUDA kernels developed by the [vLLM Project](https://github.com/vllm-project/vllm). You can learn more about this technique on the [project's page](https://vllm.ai/).
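
The lookup-table idea described in the patched paragraph can be illustrated with a minimal, self-contained sketch. The names below (`KVCachePool`, `Sequence`, `BLOCK_SIZE`, `fork`) are hypothetical and are not TGI or vLLM APIs; the sketch only shows how a per-sequence block table can point several parallel samples at the same physical KV-cache blocks.

```python
# Minimal sketch, assuming invented names (not TGI/vLLM code):
# a per-sequence lookup table mapping logical KV-cache blocks to
# physical blocks, with sharing for parallel sampling.

from collections import defaultdict

BLOCK_SIZE = 16  # tokens stored per KV-cache block (illustrative value)


class KVCachePool:
    """Fixed pool of physical KV-cache blocks with simple reference counting."""

    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))
        self.ref_count = defaultdict(int)

    def allocate(self):
        block = self.free.pop()
        self.ref_count[block] = 1
        return block

    def share(self, block):
        # Another sequence reuses the same physical block (no copy).
        self.ref_count[block] += 1


class Sequence:
    """Each sequence keeps a lookup table: logical block i -> physical block id."""

    def __init__(self, pool):
        self.pool = pool
        self.block_table = []
        self.num_tokens = 0

    def append_token(self):
        # A new physical block is allocated only when the last one is full.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.pool.allocate())
        self.num_tokens += 1

    def fork(self):
        # Parallel sampling: the child shares all existing (prompt) blocks.
        # A real implementation would copy-on-write a shared, partially
        # filled block when either sequence appends to it.
        child = Sequence(self.pool)
        child.block_table = list(self.block_table)
        child.num_tokens = self.num_tokens
        for block in self.block_table:
            self.pool.share(block)
        return child


# Usage: one prompt, two sampled continuations sharing the prompt's KV blocks.
pool = KVCachePool(num_blocks=64)
seq_a = Sequence(pool)
for _ in range(40):      # 40 prompt tokens occupy 3 blocks of 16 tokens
    seq_a.append_token()
seq_b = seq_a.fork()     # both block tables reference the same physical blocks
assert seq_a.block_table == seq_b.block_table
```

In this sketch, only blocks written after the fork differ between the two generations, which is why parallel sampling does not duplicate the prompt's KV cache.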