From d96a77705dd314e5354af80547a190181ed6a385 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Adrien=20Gallou=C3=ABt?= <angt@huggingface.co>
Date: Fri, 7 Feb 2025 16:48:28 +0000
Subject: [PATCH] Document --use-mmap and --flash-attention in llamacpp doc
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
---
 docs/source/backends/llamacpp.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/source/backends/llamacpp.md b/docs/source/backends/llamacpp.md
index f5aeb52c..dd4ef7b7 100644
--- a/docs/source/backends/llamacpp.md
+++ b/docs/source/backends/llamacpp.md
@@ -101,8 +101,10 @@ The table below summarizes key options:
 | `--split-mode`                      | Split the model across multiple GPUs                                   |
 | `--defrag-threshold`                | Defragment the KV cache if holes/size > threshold                      |
 | `--numa`                            | Enable NUMA optimizations                                              |
+| `--use-mmap`                        | Use memory mapping for the model                                       |
 | `--use-mlock`                       | Use memory locking to prevent swapping                                 |
 | `--offload-kqv`                     | Enable offloading of KQV operations to the GPU                         |
+| `--flash-attention`                 | Enable flash attention for faster inference                            |
 | `--type-k`                          | Data type used for K cache                                             |
 | `--type-v`                          | Data type used for V cache                                             |
 | `--validation-workers`              | Number of tokenizer workers used for payload validation and truncation |