Commit Graph

1210 Commits

Author SHA1 Message Date
Morgan Funtowicz
c9f6c3a8f7 feat(backend): better map exception throw on C++ side 2024-11-29 23:34:16 +01:00
Morgan Funtowicz
db41776a0e feat(backend): add mimalloc memory allocator to the container 2024-11-29 16:26:00 +01:00
Morgan Funtowicz
f5c4cee364 feat(backend): correctly link to all libraries 2024-11-29 16:25:12 +01:00
Hugo Larcher
59b0ef3018 feat: Fix Cmakelist to allow building on Darwin platform (#2785) 2024-11-29 00:31:36 +01:00
* feat: Fix Cmakelist to allow building on Darwin platform
* fix: Fix tokenizer in llama.cpp Dockerfile
Morgan Funtowicz
b10eaab9f3 feat(backend): use new batch API to generate tokens 2024-11-28 23:57:24 +01:00
Morgan Funtowicz
dc6435e3a5 feat(backend): create llama_context_params with default factory 2024-11-28 23:57:13 +01:00
Morgan Funtowicz
b1ebc8f73b feat(backend): update llama.cpp to 4215 2024-11-28 23:56:57 +01:00
Morgan Funtowicz
6c5a75b593 misc(offline): update model creation as std::shared_ptr 2024-11-28 17:45:22 +01:00
Morgan Funtowicz
9d659f1e23 feat(backend): add missing temperature parameter 2024-11-28 16:55:17 +01:00
Morgan Funtowicz
df72c56b5b feat(backend): add guard in case top_k = 0 2024-11-28 16:30:20 +01:00
Morgan Funtowicz
929a2fc718 feat(backend): add some test to the backend for core allocation 2024-11-28 14:53:46 +01:00
Morgan Funtowicz
298367cdfd feat(backend): fix when num_cores_per_instance is equals to zero with the size of the generated core allocation 2024-11-28 14:53:35 +01:00
Morgan Funtowicz
8e89793514 feat(backend): use the new batch api from llama 2024-11-28 14:52:48 +01:00
Morgan Funtowicz
274cfce435 feat(backend): remove core overriding in the Rust backend 2024-11-28 11:40:52 +01:00
Funtowicz Morgan
d918e6a159 Update Dockerfile.llamacpp as per review 2024-11-28 09:53:59 +01:00
Co-authored-by: Hugo Larcher <hugo.larcher@huggingface.co>
Funtowicz Morgan
bbe95ca9e9 Update Dockerfile.llamacpp as per review 2024-11-28 09:53:15 +01:00
Co-authored-by: Hugo Larcher <hugo.larcher@huggingface.co>
Morgan Funtowicz
9025a26cea chore: remove unrelated change to trtllm 2024-11-22 15:42:09 +01:00
Morgan Funtowicz
862a519fdd misc(doc): rust documentation 2024-11-22 15:35:55 +01:00
Morgan Funtowicz
b9c04b9c07 misc(doc): c++ documentation 2024-11-22 15:13:54 +01:00
Morgan Funtowicz
4ee2ee58c9 misc(license): update LICENSE 2024-11-22 14:48:39 +01:00
Morgan Funtowicz
2d9465d181 misc(backend): allow rebinding numa core affinity 2024-11-22 14:02:58 +01:00
Morgan Funtowicz
30ae99631c misc(docker): add numa lib as dependency 2024-11-22 13:34:52 +01:00
Morgan Funtowicz
5a85661661 feat(backend): rely on multi consumer queue to scheduler workers 2024-11-22 13:32:56 +01:00
Morgan Funtowicz
84eead219a feat(backend): correctly setup llama_context providing n_threads and n_ubatch 2024-11-21 21:43:50 +01:00
Morgan Funtowicz
50c376612c feat(backend): bind thread and memory affinity for thread 2024-11-21 13:52:38 +01:00
Morgan Funtowicz
5335bf973b feat(backend): multistream inference on CPU 2024-11-21 00:03:05 +01:00
Morgan Funtowicz
23d2bcf28d misc(build): improve build process 2024-11-14 09:38:13 +01:00
Morgan Funtowicz
70c90ad933 feat(backend): update llamacpp to 4077 2024-11-14 09:04:06 +01:00
Morgan Funtowicz
6f059c4b5d feat(backend): wrap Arc tokenizer to avoid duplicating 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
57b215467b feat(backend): simplify Rust callback 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
daf1631e09 dockerfile(backend): initial working version of llama.cpp container 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
02cd6fe427 chore(backend): minor improvements 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
363d5e45de feat(backend): use std::ranges to map uint32_t to llama_token 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
488ba93898 feat(backend): fix invalid reference to context in release mode 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
7e2890fe2c feat(backend): remove unused function 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
6915fa3441 feat(backend): remove reinterpret_cast converting from uint32_t to llama_token(int32_t) 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
86d30aea43 feat(backend): simplify overall cpp structure 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
4f5397c414 misc(cmake): use URL base llama.cpp repo 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
cf17928f83 misc(cmake): remove dependency on fmt 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
26d0266cec feat(backend): handle all the tokenization failure and send back to the client 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
20652824d9 feat(dockerfile): build process 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
a7afde41a9 feat(backend): dockerfile 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
7eec0f704f chore(backend): minor fixes mostly format 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
a1154b17ec feat(backend): avoid copy constructor 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
588421833c misc(backend): missing header <variant> 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
62dba1a878 misc(cmake): use url deps and not git repo 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
52208f5b78 misc(backend): decrease log verbosity in callback 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
1149186794 feat(backend): expose tokenizer to the GenerationContext to decode token 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
1473259f84 feat(backend): add early stopping criteria from TGI stream callback 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
958c72a44a misc(ffi): remove unused ffi mapping 2024-11-14 08:42:01 +01:00