Commit Graph

36 Commits

Author SHA1 Message Date
Morgan Funtowicz
9d659f1e23 feat(backend): add missing temperature parameter 2024-11-28 16:55:17 +01:00
Morgan Funtowicz
929a2fc718 feat(backend): add some test to the backend for core allocation 2024-11-28 14:53:46 +01:00
Morgan Funtowicz
298367cdfd feat(backend): fix when num_cores_per_instance is equals to zero with the size of the generated core allocation 2024-11-28 14:53:35 +01:00
Morgan Funtowicz
274cfce435 feat(backend): remove core overriding in the Rust backend 2024-11-28 11:40:52 +01:00
Morgan Funtowicz
862a519fdd misc(doc): rust documentation 2024-11-22 15:35:55 +01:00
Morgan Funtowicz
2d9465d181 misc(backend): allow rebinding numa core affinity 2024-11-22 14:02:58 +01:00
Morgan Funtowicz
5a85661661 feat(backend): rely on multi consumer queue to scheduler workers 2024-11-22 13:32:56 +01:00
Morgan Funtowicz
84eead219a feat(backend): correctly setup llama_context providing n_threads and n_ubatch 2024-11-21 21:43:50 +01:00
Morgan Funtowicz
50c376612c feat(backend): bind thread and memory affinity for thread 2024-11-21 13:52:38 +01:00
Morgan Funtowicz
5335bf973b feat(backend): multistream inference on CPU 2024-11-21 00:03:05 +01:00
Morgan Funtowicz
6f059c4b5d feat(backend): wrap Arc tokenizer to avoid duplicating 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
57b215467b feat(backend): simplify Rust callback 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
86d30aea43 feat(backend): simplify overall cpp structure 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
26d0266cec feat(backend): handle all the tokenization failure and send back to the client 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
7eec0f704f chore(backend): minor fixes mostly format 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
52208f5b78 misc(backend): decrease log verbosity in callback 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
1149186794 feat(backend): expose tokenizer to the GenerationContext to decode token 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
1473259f84 feat(backend): add early stopping criteria from TGI stream callback 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
958c72a44a misc(ffi): remove unused ffi mapping 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
5b7a951389 feat(backend): refactor the callback to handle intermediate and end inference message 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
05ff551950 feat(backend): add number of generated tokens in the callback 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
188442f67d misc(lint): make clippy happier 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
86a2ae6ba2 chore: unsued variables 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
3e82f14f57 feat(backend): somewhat generates the final infer response 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
b50dcddbb8 feat(backend): avoid dropping the boxed stream at the end of the callback 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
612f2f939f feat(backend): bind incoming request to the server 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
d4aee42fd8 feat(backend): add logit parameter in the callback fn 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
f39edc72ff feat(backend): add mapping for ignore_eos_token stopping criteria 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
d52b4c4978 feat(backend): full rework of the backend internal to safer c++ 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
611590440d misc(offline): expose more parameters for generate 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
a316c53255 feat(llamacpp): expose number of threads for the backend when constructing the model 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
e4d803c94e feat(backend): build and link through build.rs 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
355d8a55b4 feat(backend): wip Rust binding 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
fa89d1e613 misc(cmake): wut 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
52d57dca79 feat(llamacpp): initial end2end build 2024-11-14 08:42:01 +01:00
Morgan Funtowicz
aa1fcba59f feat(llamacpp): initial commit (# Conflicts: Cargo.lock) 2024-11-14 08:42:01 +01:00