Morgan Funtowicz
|
e983ee5bb8
|
make sure the context is not dropped in the middle of the async decoding.
|
2024-07-17 21:56:50 +00:00 |
|
Morgan Funtowicz
|
9220340ff7
|
compute the maximum number of new tokens for each request independently
|
2024-07-17 13:55:29 +00:00 |
|
Morgan Funtowicz
|
7784a21d48
|
impl RwLock scenario for TensorRtLlmBackend
|
2024-07-16 20:08:10 +00:00 |
|
Morgan Funtowicz
|
b291be64a0
|
impl the Rust backend, which currently cannot move the actual computation to a background thread
|
2024-07-12 19:26:32 +00:00 |
|
Morgan Funtowicz
|
50e9fc89c8
|
working setup of the ffi layer
|
2024-07-11 21:24:32 +00:00 |
|
Morgan Funtowicz
|
f8a1463915
|
Enable end to end CMake build
|
2024-07-03 10:27:53 +02:00 |
|
Morgan Funtowicz
|
47ac5c654d
|
Working FFI call for TGI and TRTLLM backend
|
2024-07-01 15:53:23 +02:00 |
|
Morgan Funtowicz
|
dc402dc9ac
|
Initial setup for CXX binding to TRTLLM
|
2024-06-30 23:37:20 +02:00 |
|