Author | Commit | Message | Date
Adrien Gallouët | dbee804129 | Simplify batching logic; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-05 10:12:39 +00:00
Adrien Gallouët | d3a772a8dd | Update args; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-05 10:10:38 +00:00
Adrien Gallouët | df2a4fbb8a | Update Dockerfile_llamacpp; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:59 +00:00
Adrien Gallouët | d883109df6 | Disable graceful shutdown in debug mode; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:59 +00:00
Adrien Gallouët | 38b33e9698 | Add --type-v & --type-k; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:59 +00:00
Adrien Gallouët | bfb8e03e9f | Add specific args for batch; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:59 +00:00
Adrien Gallouët | ea28332bb3 | Cleanup; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:59 +00:00
Adrien Gallouët | 104a968d01 | Remove warmup; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:59 +00:00
Adrien Gallouët | 8ed362d03a | Clear request cache after completion; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:59 +00:00
Adrien Gallouët | c8505fb300 | Auto-detect n_threads when not provided; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:59 +00:00
Adrien Gallouët | 27534d8ee4 | Fix seq iterations; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:59 +00:00
Adrien Gallouët | 96434a1e7e | Fix batching; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:59 +00:00
Adrien Gallouët | 2a51e415ff | Output real logprobs; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:58 +00:00
Adrien Gallouët | 161280f313 | Only export the latest logits; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:58 +00:00
Morgan Funtowicz | 960c12bd6e | backend(llama): add CUDA Dockerfile_llamacpp for now | 2025-02-04 13:32:58 +00:00
Adrien Gallouët | f38c34aeb7 | Fix batch_pos; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:58 +00:00
Adrien Gallouët | e88a527fcf | Add --offload-kqv; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:58 +00:00
Adrien Gallouët | ae5bb789c2 | Enable flash attention by default; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:58 +00:00
Adrien Gallouët | 3f199134f0 | Fix args; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:58 +00:00
Adrien Gallouët | 7a3ed4171e | Add --numa; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:58 +00:00
Adrien Gallouët | 390f0ec061 | Cleanup; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:58 +00:00
Adrien Gallouët | d6ded897a8 | Add a stupid batch mechanism; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:58 +00:00
Adrien Gallouët | e07835c5b5 | Add --defrag-threshold; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:58 +00:00
Adrien Gallouët | f388747985 | Add GPU args; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:58 +00:00
Adrien Gallouët | 8d2dfdf668 | Handle ctx args & fix sampling; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:58 +00:00
Adrien Gallouët | a7b4b04cb5 | Add some input validation checks; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:58 +00:00
Adrien Gallouët | e7facf692f | Handle max_batch_size; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:58 +00:00
Adrien Gallouët | 3eb4823f3e | Use max_batch_total_tokens; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:58 +00:00
Adrien Gallouët | bd0cc9905c | Get rid of llama_batch_get_one(); Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:58 +00:00
Adrien Gallouët | 95e221eece | Add llamacpp backend; Signed-off-by: Adrien Gallouët <angt@huggingface.co> | 2025-02-04 13:32:56 +00:00