Commit Graph

33 Commits

Author SHA1 Message Date
Morgan Funtowicz  9d659f1e23  feat(backend): add missing temperature parameter  2024-11-28 16:55:17 +01:00
Morgan Funtowicz  df72c56b5b  feat(backend): add guard in case top_k = 0  2024-11-28 16:30:20 +01:00
Morgan Funtowicz  8e89793514  feat(backend): use the new batch api from llama  2024-11-28 14:52:48 +01:00
Morgan Funtowicz  2d9465d181  misc(backend): allow rebinding numa core affinity  2024-11-22 14:02:58 +01:00
Morgan Funtowicz  84eead219a  feat(backend): correctly setup llama_context providing n_threads and n_ubatch  2024-11-21 21:43:50 +01:00
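Commit 84eead219a wires n_threads and n_ubatch into the llama_context setup. A hedged fragment of what such a setup typically looks like against llama.cpp's C API as of late 2024 (`num_threads`, `physical_batch`, and `model` are placeholder variables, not names from this repo; the fragment assumes `llama.h` and is not self-contained):

```cpp
// Sketch only; requires llama.cpp's llama.h and a loaded llama_model *model.
llama_context_params params = llama_context_default_params();
params.n_threads       = num_threads;     // threads for single-token decode
params.n_threads_batch = num_threads;     // threads for prompt/batch processing
params.n_ubatch        = physical_batch;  // physical micro-batch size
llama_context *ctx = llama_new_context_with_model(model, params);
```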
Morgan Funtowicz  5335bf973b  feat(backend): multistream inference on CPU  2024-11-21 00:03:05 +01:00
Morgan Funtowicz  488ba93898  feat(backend): fix invalid reference to context in release mode  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  7e2890fe2c  feat(backend): remove unused function  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  86d30aea43  feat(backend): simplify overall cpp structure  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  cf17928f83  misc(cmake): remove dependency on fmt  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  7eec0f704f  chore(backend): minor fixes mostly format  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  1473259f84  feat(backend): add early stopping criteria from TGI stream callback  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  5b7a951389  feat(backend): refactor the callback to handle intermediate and end inference message  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  11c593dc69  feat(backend): make eog clearer on c++ side  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  06424aa9ff  feat(backend): correctly handle the max_new_tokens case for is_eos  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  05ff551950  feat(backend): add number of generated tokens in the callback  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  7b0a56f40f  feat(backend): fix memory leaking on llama_sampler when the decode ends  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  bd8f0f15e1  feat(backend): fix invalid reference to ctx instead of context in release build  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  d4aee42fd8  feat(backend): add logit parameter in the callback fn  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  f39edc72ff  feat(backend): add mapping for ignore_eos_token stopping criteria  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  d52b4c4978  feat(backend): full rework of the backend internal to safer c++  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  b98c635781  feat(backend): entirely rewrite backend  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  0c1dd0ed2b  feat(llamacpp): wip explosion  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  a316c53255  feat(llamacpp): expose number of threads for the backend when constructing the model  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  e4d803c94e  feat(backend): build and link through build.rs  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  f9c248657d  chore(backend): minor formatting  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  37faeb34b2  feat(backend): expose frequency and repetition penalties  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  d4b5be10f9  feat(backend): minor refactor  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  45d5a6a8c5  feat(backend): add some initial decoding steps  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  0911076320  feat(backend): correctly load llama.cpp model from llama api and not gpt2  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  05ad684676  feat(llamacpp): enable cuda  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  52d57dca79  feat(llamacpp): initial end2end build  2024-11-14 08:42:01 +01:00
Morgan Funtowicz  aa1fcba59f  feat(llamacpp): initial commit  2024-11-14 08:42:01 +01:00
    # Conflicts:
    #	Cargo.lock