Commit Graph

1152 Commits

Author SHA1 Message Date
Morgan Funtowicz
a6ac2741a3 chore(trtllm): validate there are enough GPUs on the system for the desired model 2024-10-22 09:52:05 +02:00
Morgan Funtowicz
848b8ad554 chore(trtllm): minor refactoring 2024-10-22 09:52:05 +02:00
Morgan Funtowicz
60a08a283d chore(trtllm): use GetParallelConfig 2024-10-22 09:52:05 +02:00
Morgan Funtowicz
d5c8bdc53b chore(trtllm): define a macro for SizeType cast 2024-10-22 09:52:05 +02:00
Morgan Funtowicz
7217cafadb chore(trtllm): create specific parallelconfig factory and logging init methods 2024-10-22 09:52:05 +02:00
Morgan Funtowicz
421a17544e feat(trtllm): add stop words handling
# Conflicts:
#	backends/trtllm/lib/backend.cpp
2024-10-22 09:52:05 +02:00
Morgan Funtowicz
c1a43a6c3e chore(ffi): formatting 2024-10-22 09:52:05 +02:00
Morgan Funtowicz
9ac26ed717 feat(post_processing): max_new_tokens is const evaluated now 2024-10-22 09:52:05 +02:00
Morgan Funtowicz
cdac4b0058 chore(looper): cleanup a bit more 2024-10-22 09:52:05 +02:00
Morgan Funtowicz
04c6f51258 feat(trtllm): rewrite health to not account for current state 2024-10-22 09:52:05 +02:00
Morgan Funtowicz
18b473b019 chore(router): add python dependency 2024-10-22 09:51:50 +02:00
Morgan Funtowicz
d73401ac73 chore(rebase): fix invalid references 2024-10-21 21:44:28 +02:00
Morgan Funtowicz
f5b9ee368a Revert "chore(trtllm): remove unused method"
This reverts commit 31747163
2024-10-21 17:03:35 +02:00
Morgan Funtowicz
8d1c3c8ad4 feat(trtllm): do not tokenize twice 2024-10-21 15:06:54 +02:00
Morgan Funtowicz
1a3da05f34 misc(router): remove SchedulingError 2024-10-21 14:57:19 +02:00
Morgan Funtowicz
e6da212431 feat(trtllm): cache maxNumTokens to avoid calling JSON every time 2024-10-21 14:51:58 +02:00
Morgan Funtowicz
31747163e7 chore(trtllm): remove unused method 2024-10-21 14:10:23 +02:00
Morgan Funtowicz
fb00f985ae chore(trtllm): post-rebase commit 2024-10-21 12:31:24 +02:00
Morgan Funtowicz
85c03e33a9 chore(trtllm): fmt 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
e3bce407be chore(trtllm): disable tokenizer parallelism by default 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
62f33d7ecd chore(trtllm): move dockerfile to right place 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
6687c06a21 feat(looper): minor optimizations to avoid growing too much the containers 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
027756c52d chore(cmake): download timestamp should be before URL 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
629153b44b feat(looper): check engine and executorWorker paths exist before creating the backend 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
f20ec28891 chore(cmake): use correct policy for download_timestamp 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
819c953771 misc(cuda): require 12.6 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
dd94ccc989 (fix): more fixes for Dockerfile 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
f9f10a6636 (misc): improve trtllm download script robustness 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
0c3ba932cc (misc): disable logging in release mode 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
437c2aa142 (misc): update dependency in trtllm dockerfile 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
cb69c9a967 (misc): update dependency in trtllm dockerfile 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
c8a99af6c9 (fix): do not recreate the stateful hashmap at every it 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
eb13d8d1f3 (misc): increase verbosity of spdlog 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
ce0cd1fce8 (misc): build with trtllm 0.13.0 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
188e4dc64f (misc): build for sm_{75,80,86,89,90} by default 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
544c9d9dba (fix): HOPPER_SM_MAJOR is 9 not 8 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
213acc6e34 (misc) move to latest trtllm 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
507ff66692 (misc) rerun-if-changed all the cmake modules 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
b242f45c04 (misc) delete backend.rs 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
984ae9798f (post) impl postprocessing 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
fa63db0d07 (scheduler) rework submit/pull logic 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
42ccf4e77c (misc) no need to move for uint32_t items 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
b41875c139 (misc) simplify [make_]move_iterator by using c++20 type inference 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
0f50539b77 (Dockerfile.trtllm) delete for now 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
b1846fb4e6 (backend) refactor & cleanup 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
483f172938 (ffi) do not use reference capture in lambda as we are not capturing anything 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
3d0e90b631 (ffi) missing namespace for tle::Response 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
8e648ce425 (ffi) fix usage of wrong vector constructor making a capacity fill call 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
dddc9a44bd (build) fetchcontent use archives instead of git 2024-10-21 10:00:27 +02:00
Morgan Funtowicz
089c5fe668 (server) forward auth_token to server::run 2024-10-21 10:00:27 +02:00