Morgan Funtowicz
7eec0f704f
chore(backend): minor fixes mostly format
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
a1154b17ec
feat(backend): avoid copy constructor
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
588421833c
misc(backend): missing header <variant>
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
62dba1a878
misc(cmake): use url deps and not git repo
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
52208f5b78
misc(backend): decrease log verbosity in callback
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
1149186794
feat(backend): expose tokenizer to the GenerationContext to decode token
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
1473259f84
feat(backend): add early stopping criteria from TGI stream callback
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
958c72a44a
misc(ffi): remove unused ffi mapping
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
5b7a951389
feat(backend): refactor the callback to handle intermediate and end inference message
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
11c593dc69
feat(backend): make eog clearer on c++ side
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
06424aa9ff
feat(backend): correctly handle the max_new_tokens case for is_eos
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
05ff551950
feat(backend): add number of generated tokens in the callback
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
188442f67d
misc(lint): make clippy happier
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
31d9254776
feat(backend): remove static from inner_fw visitor as it leads to invalid memory locations
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
7b0a56f40f
feat(backend): fix memory leaking on llama_sampler when the decode ends
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
86a2ae6ba2
chore: unused variables
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
2cdfed94d9
feat(backend): correctly link to shared fmt and spdlog instead of static
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
bd8f0f15e1
feat(backend): fix invalid reference to ctx instead of context in release build
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
3e82f14f57
feat(backend): somewhat generates the final infer response
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
b50dcddbb8
feat(backend): avoid dropping the boxed stream at the end of the callback
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
612f2f939f
feat(backend): bind incoming request to the server
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
d4aee42fd8
feat(backend): add logit parameter in the callback fn
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
f39edc72ff
feat(backend): add mapping for ignore_eos_token stopping criteria
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
3af2c6837c
misc(offline): match rework
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
d52b4c4978
feat(backend): full rework of the backend internal to safer c++
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
6a5f6b0755
misc(offline): update offline tester
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
b98c635781
feat(backend): entirely rewrite backend
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
611590440d
misc(offline): expose more parameters for generate
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
dbc5b7a0f7
misc(offline): link correctly
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
0c1dd0ed2b
feat(llamacpp): wip explosion
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
a316c53255
feat(llamacpp): expose number of threads for the backend when constructing the model
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
179309b364
misc(build): refactor build type detection in cmake
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
f0859c247f
misc(build): handle different lib destination folder lib/lib64
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
e4d803c94e
feat(backend): build and link through build.rs
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
355d8a55b4
feat(backend): wip Rust binding
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
f9c248657d
chore(backend): minor formatting
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
37faeb34b2
feat(backend): expose frequency and repetition penalties
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
d4b5be10f9
feat(backend): minor refactor
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
92bb113653
feat(backend): use llama_token as TokenId type
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
45d5a6a8c5
feat(backend): add some initial decoding steps
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
098c66920d
feat(backend): tell cmake to build llama-common and link to it
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
0911076320
feat(backend): correctly load llama.cpp model from llama api and not gpt2
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
05ad684676
feat(llamacpp): enable cuda
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
fa89d1e613
misc(cmake): wut
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
e4432d36b1
misc(cmake): add parameter to build specific cuda arch
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
52d57dca79
feat(llamacpp): initial end2end build
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
7d1f8a2bd6
feat(llamacpp): correctly handle CMAKE_BUILD_TYPE for spdlog macros
2024-11-14 08:42:01 +01:00
Morgan Funtowicz
aa1fcba59f
feat(llamacpp): initial commit
# Conflicts:
# Cargo.lock
2024-11-14 08:42:01 +01:00
Nicolas Patry
0c9b6cdd76
Choosing input/total tokens automatically based on available VRAM? (#2673)
* Choosing input/total tokens automatically based on available VRAM?
* Update doc.
* Remove generated files.
* Trying to fix non chunking targets.
* Attempt #2
* fix.
* QuantLinear is rocm compatible.
* Much simpler logic after the overhead.
* Updating logic + non flash.
* Revert doc text.
* Simple updates.
* Fix integration mt0 (transformers update).
2024-10-28 04:59:49 +01:00
Funtowicz Morgan
ba5fc7d922
Add support for stop words in TRTLLM (#2678)
* feat(trtllm): rewrite health to not account for current state
* chore(looper): cleanup a bit more
* feat(post_processing): max_new_tokens is const evaluated now
* chore(ffi): formatting
* feat(trtllm): add stop words handling
# Conflicts:
# backends/trtllm/lib/backend.cpp
* chore(trtllm): create specific parallelconfig factory and logging init methods
* chore(trtllm): define a macro for SizeType cast
* chore(trtllm): use GetParallelConfig
* chore(trtllm): minor refactoring
* chore(trtllm): validate there are enough GPUs on the system for the desired model
* chore(trtllm): ensure max throughput scheduling policy is selected
* chore(trtllm): minor fix
* chore(router): minor refactorings
* feat(docker): build with-slurm ompi
* feat(docker): add python3.10 dev to runtime deps
* chore(docker): add mpi to ld_library_path
* chore(docker): install transformers
* feat(trtllm): detect stop_words from generation_config.json
2024-10-25 10:58:34 +02:00