text-generation-inference/docs/source
Adrien Gallouët 094975c3a8
Update the llamacpp backend ()
* Build faster
* Make --model-gguf optional
* Bump llama.cpp
* Enable mmap, offload_kqv & flash_attention by default
* Update doc
* Better error message
* Update doc
* Update installed packages
* Save gguf in models/MODEL_ID/model.gguf
* Fix build with Mach-O
* Quantize without llama-quantize
* Bump llama.cpp and switch to ggml-org
* Remove make-gguf.sh
* Update Cargo.lock
* Support HF_HUB_USER_AGENT_ORIGIN
* Bump llama.cpp
* Add --build-arg llamacpp_native & llamacpp_cpu_arm_arch (see the build sketch below)

---------

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-03-11 09:19:01 +01:00
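
The new `llamacpp_native` and `llamacpp_cpu_arm_arch` build arguments are passed at image build time. A minimal sketch of how that might look, assuming the backend image is built from a `Dockerfile_llamacpp` and that the arguments take CMake-style values (the Dockerfile name and the example values are assumptions for illustration; only the argument names come from the commit above — see the backends docs for exact usage):

```sh
# Hypothetical build of the llama.cpp backend image.
# Only the --build-arg names (llamacpp_native, llamacpp_cpu_arm_arch)
# are taken from the commit; the Dockerfile name and values are assumed.
docker build \
    -t tgi-llamacpp \
    -f Dockerfile_llamacpp \
    --build-arg llamacpp_native=OFF \
    --build-arg llamacpp_cpu_arm_arch=armv8.2-a \
    .
```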
| Name | Last commit | Date |
| --- | --- | --- |
| backends | Update the llamacpp backend () | 2025-03-11 09:19:01 +01:00 |
| basic_tutorials | Fix a tiny typo in monitoring.md tutorial () | 2025-03-04 17:06:26 +01:00 |
| conceptual | Preparing for release. () | 2025-03-04 16:47:10 +01:00 |
| reference | Update --max-batch-total-tokens description () | 2025-03-07 14:24:26 +01:00 |
| _toctree.yml | Avoid running neuron integration tests twice () | 2025-02-26 12:15:01 +01:00 |
| architecture.md | Avoid running neuron integration tests twice () | 2025-02-26 12:15:01 +01:00 |
| index.md | Removing ../ that broke the link () | 2024-12-02 05:48:55 +01:00 |
| installation_amd.md | Preparing for release. () | 2025-03-04 16:47:10 +01:00 |
| installation_gaudi.md | MI300 compatibility () | 2024-05-17 15:30:47 +02:00 |
| installation_inferentia.md | Avoid running neuron integration tests twice () | 2025-02-26 12:15:01 +01:00 |
| installation_intel.md | Preparing for release. () | 2025-03-04 16:47:10 +01:00 |
| installation_nvidia.md | Preparing for release. () | 2025-03-04 16:47:10 +01:00 |
| installation_tpu.md | Fix typo in TPU docs () | 2025-01-15 18:32:07 +01:00 |
| installation.md | MI300 compatibility () | 2024-05-17 15:30:47 +02:00 |
| multi_backend_support.md | Avoid running neuron integration tests twice () | 2025-02-26 12:15:01 +01:00 |
| quicktour.md | Preparing for release. () | 2025-03-04 16:47:10 +01:00 |
| supported_models.md | feat: add initial qwen2.5-vl model and test () | 2025-02-19 12:38:20 +01:00 |
| usage_statistics.md | fix: Telemetry () | 2025-01-28 10:29:18 +01:00 |