Commit Graph

21 Commits

Author | SHA1 | Message | Date
Morgan Funtowicz | 2d9465d181 | misc(backend): allow rebinding numa core affinity | 2024-11-22 14:02:58 +01:00
Morgan Funtowicz | 84eead219a | feat(backend): correctly setup llama_context providing n_threads and n_ubatch | 2024-11-21 21:43:50 +01:00
Morgan Funtowicz | 50c376612c | feat(backend): bind thread and memory affinity for thread | 2024-11-21 13:52:38 +01:00
Morgan Funtowicz | 5335bf973b | feat(backend): multistream inference on CPU | 2024-11-21 00:03:05 +01:00
Morgan Funtowicz | 02cd6fe427 | chore(backend): minor improvements | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 363d5e45de | feat(backend): use std::ranges to map uint32_t to llama_token | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 6915fa3441 | feat(backend): remove reinterpret_cast converting from uint32_t to llama_token(int32_t) | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 86d30aea43 | feat(backend): simplify overall cpp structure | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | a1154b17ec | feat(backend): avoid copy constructor | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 588421833c | misc(backend): missing header <variant> | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 1473259f84 | feat(backend): add early stopping criteria from TGI stream callback | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 5b7a951389 | feat(backend): refactor the callback to handle intermediate and end inference message | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 05ff551950 | feat(backend): add number of generated tokens in the callback | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 31d9254776 | feat(backend): remove static from inner_fw visitor as it leads to invalid memory locations | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | b50dcddbb8 | feat(backend): avoid dropping the boxed stream at the end of the callback | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | d4aee42fd8 | feat(backend): add logit parameter in the callback fn | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | d52b4c4978 | feat(backend): full rework of the backend internal to safer c++ | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 0c1dd0ed2b | feat(llamacpp): wip explosion | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | a316c53255 | feat(llamacpp): expose number of threads for the backend when constructing the model | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | e4d803c94e | feat(backend): build and link through build.rs | 2024-11-14 08:42:01 +01:00
Morgan Funtowicz | 355d8a55b4 | feat(backend): wip Rust binding | 2024-11-14 08:42:01 +01:00