commit 094975c3a8
Author: Adrien Gallouët <angt@huggingface.co>
Date:   2025-03-11 09:19:01 +01:00

Update the llamacpp backend (#3022)
* Build faster
* Make --model-gguf optional
* Bump llama.cpp
* Enable mmap, offload_kqv & flash_attention by default (see the first sketch below)
* Update doc
* Better error message
* Update doc
* Update installed packages
* Save gguf in models/MODEL_ID/model.gguf
* Fix build with Mach-O
* Quantize without llama-quantize (see the second sketch below)
* Bump llama.cpp and switch to ggml-org
* Remove make-gguf.sh
* Update Cargo.lock
* Support HF_HUB_USER_AGENT_ORIGIN
* Add --build-arg llamacpp_native & llamacpp_cpu_arm_arch
---------
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
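
For context on the new defaults: in llama.cpp's C API, `use_mmap` is a field of `llama_model_params`, while `offload_kqv` and `flash_attn` live in `llama_context_params`. The following is a minimal C sketch of enabling all three, assuming a recent llama.cpp; it is illustrative only, not the backend's actual code, which wires the same flags through its own bindings:

```c
#include <stdbool.h>
#include <stdio.h>
#include "llama.h"

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s MODEL.gguf\n", argv[0]);
        return 1;
    }

    llama_backend_init();

    // mmap the GGUF instead of streaming the weights into memory
    struct llama_model_params mparams = llama_model_default_params();
    mparams.use_mmap = true;

    // keep the KV cache on the accelerator and use flash-attention kernels
    struct llama_context_params cparams = llama_context_default_params();
    cparams.offload_kqv = true;
    cparams.flash_attn  = true;

    struct llama_model *model = llama_load_model_from_file(argv[1], mparams);
    if (model == NULL) {
        fprintf(stderr, "failed to load %s\n", argv[1]);
        return 1;
    }

    struct llama_context *ctx = llama_new_context_with_model(model, cparams);
    if (ctx == NULL) {
        fprintf(stderr, "failed to create a context\n");
        llama_free_model(model);
        return 1;
    }

    // ... run inference ...

    llama_free(ctx);
    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```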
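On dropping llama-quantize: llama.cpp exposes quantization as a library call, `llama_model_quantize()`, so no separate binary is required. A minimal C sketch follows, with illustrative paths and an assumed Q4_K_M target (the type the backend actually picks may differ):

```c
#include <stdio.h>
#include "llama.h"

int main(void) {
    // start from llama.cpp's default quantization parameters
    llama_model_quantize_params qparams = llama_model_quantize_default_params();
    qparams.ftype   = LLAMA_FTYPE_MOSTLY_Q4_K_M; // illustrative target type
    qparams.nthread = 4;                         // <= 0 uses all hardware threads

    // returns 0 on success
    if (llama_model_quantize("models/MODEL_ID/model.gguf",
                             "models/MODEL_ID/model-q4_k_m.gguf",
                             &qparams) != 0) {
        fprintf(stderr, "quantization failed\n");
        return 1;
    }
    return 0;
}
```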