Update doc

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-09-11 04:14:52 +00:00 · 2025-02-07 16:48:28 +00:00
commit d96a77705d
parent b77d05d3af
1 changed file with 2 additions and 0 deletions
--- a/docs/source/backends/llamacpp.md
+++ b/docs/source/backends/llamacpp.md
@@ -101,8 +101,10 @@ The table below summarizes key options:
 | `--split-mode`                      | Split the model across multiple GPUs                                   |
 | `--defrag-threshold`                | Defragment the KV cache if holes/size > threshold                      |
 | `--numa`                            | Enable NUMA optimizations                                              |
 | `--use-mmap`                        | Use memory mapping for the model                                       |
 | `--use-mlock`                       | Use memory locking to prevent swapping                                 |
 | `--offload-kqv`                     | Enable offloading of KQV operations to the GPU                         |
 | `--flash-attention`                 | Enable flash attention for faster inference                            |
 | `--type-k`                          | Data type used for K cache                                             |
 | `--type-v`                          | Data type used for V cache                                             |
 | `--validation-workers`              | Number of tokenizer workers used for payload validation and truncation |