14 Commits

Author           SHA1        Message                            Date
Adrien Gallouët  e88a527fcf  Add --offload-kqv                  2025-02-04 13:32:58 +00:00
                             Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Adrien Gallouët  ae5bb789c2  Enable flash attention by default  2025-02-04 13:32:58 +00:00
                             Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Adrien Gallouët  3f199134f0  Fix args                           2025-02-04 13:32:58 +00:00
                             Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Adrien Gallouët  7a3ed4171e  Add --numa                         2025-02-04 13:32:58 +00:00
                             Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Adrien Gallouët  390f0ec061  Cleanup                            2025-02-04 13:32:58 +00:00
                             Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Adrien Gallouët  d6ded897a8  Add a stupid batch mechanism       2025-02-04 13:32:58 +00:00
                             Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Adrien Gallouët  e07835c5b5  Add --defrag-threshold             2025-02-04 13:32:58 +00:00
                             Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Adrien Gallouët  f388747985  Add GPU args                       2025-02-04 13:32:58 +00:00
                             Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Adrien Gallouët  8d2dfdf668  Handle ctx args & fix sampling     2025-02-04 13:32:58 +00:00
                             Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Adrien Gallouët  a7b4b04cb5  Add some input validation checks   2025-02-04 13:32:58 +00:00
                             Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Adrien Gallouët  e7facf692f  Handle max_batch_size              2025-02-04 13:32:58 +00:00
                             Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Adrien Gallouët  3eb4823f3e  Use max_batch_total_tokens         2025-02-04 13:32:58 +00:00
                             Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Adrien Gallouët  bd0cc9905c  Get rid of llama_batch_get_one()   2025-02-04 13:32:58 +00:00
                             Signed-off-by: Adrien Gallouët <angt@huggingface.co>
Adrien Gallouët  95e221eece  Add llamacpp backend               2025-02-04 13:32:56 +00:00
                             Signed-off-by: Adrien Gallouët <angt@huggingface.co>