Adrien Gallouët
e88a527fcf
Add --offload-kqv
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00

Adrien Gallouët
ae5bb789c2
Enable flash attention by default
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00

Adrien Gallouët
3f199134f0
Fix args
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00

Adrien Gallouët
7a3ed4171e
Add --numa
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00

Adrien Gallouët
390f0ec061
Cleanup
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00

Adrien Gallouët
d6ded897a8
Add a stupid batch mechanism
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00

Adrien Gallouët
e07835c5b5
Add --defrag-threshold
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00

Adrien Gallouët
f388747985
Add GPU args
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00

Adrien Gallouët
8d2dfdf668
Handle ctx args & fix sampling
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00

Adrien Gallouët
a7b4b04cb5
Add some input validation checks
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00

Adrien Gallouët
e7facf692f
Handle max_batch_size
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00

Adrien Gallouët
3eb4823f3e
Use max_batch_total_tokens
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00

Adrien Gallouët
bd0cc9905c
Get rid of llama_batch_get_one()
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00

Adrien Gallouët
95e221eece
Add llamacpp backend
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:56 +00:00