text-generation-inference/backends/trtllm
| Name | Last commit message | Last commit date |
|------|---------------------|------------------|
| cmake | update TensorRT-LLM to latest version | 2024-07-23 22:13:02 +00:00 |
| include | refactor the compute capabilities detection along with num gpus | 2024-07-23 22:12:42 +00:00 |
| lib | refactor the compute capabilities detection along with num gpus | 2024-07-23 22:12:42 +00:00 |
| scripts | update TensorRT install script to latest | 2024-07-23 22:23:30 +00:00 |
| src | make sure executor_worker is provided | 2024-07-19 11:57:10 +00:00 |
| tests | First version loading engines and making it ready for inference | 2024-07-03 21:12:24 +00:00 |
| build.rs | fix envvar CARGO_CFG_TARGET_ARCH set at runtime vs compile time | 2024-07-25 10:36:55 +00:00 |
| Cargo.toml | align all the linker search dependency | 2024-07-22 14:14:57 +00:00 |
| CMakeLists.txt | install to decoder_attention target | 2024-07-25 10:21:54 +00:00 |
| Dockerfile | update TensorRT-LLM to latest version | 2024-07-23 22:13:02 +00:00 |
| README.md | adding missing ld_library_path for cuda stubs in Dockerfile | 2024-07-22 15:16:39 +00:00 |

```mermaid
sequenceDiagram
    TensorRtLlmBackend -->> TensorRtLlmBackendImpl: New thread which instantiates the actual backend impl
    TensorRtLlmBackendImpl -->> TensorRtLlmBackendImpl.Receiver: Awaits incoming requests sent through the queue
```
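
The diagram describes a hand-off pattern: the backend spawns a dedicated worker thread that instantiates the actual implementation and then drains a queue of incoming requests. Below is a minimal, self-contained sketch of that pattern in plain Rust (`std::thread` + `std::sync::mpsc`). The `Backend`, `BackendImpl`, and `GenerationRequest` names and the `generate` method are illustrative placeholders, not the real `TensorRtLlmBackend` API; the real worker thread drives a TensorRT-LLM executor rather than echoing the prompt back.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Illustrative request type: a prompt plus a one-shot channel for the result.
struct GenerationRequest {
    prompt: String,
    response: Sender<String>,
}

/// Stand-in for TensorRtLlmBackendImpl: created and used only on the worker thread.
struct BackendImpl;

impl BackendImpl {
    fn new() -> Self {
        // The real backend would set up the TensorRT-LLM executor here.
        BackendImpl
    }

    fn generate(&mut self, prompt: &str) -> String {
        // Placeholder for the actual inference call.
        format!("echo: {}", prompt)
    }
}

/// Stand-in for TensorRtLlmBackend: only holds the sending side of the queue.
struct Backend {
    queue: Sender<GenerationRequest>,
}

impl Backend {
    fn new() -> Self {
        let (tx, rx) = channel::<GenerationRequest>();

        // New thread which instantiates the actual backend impl, then awaits
        // incoming requests sent through the queue (the Receiver).
        let _worker = thread::spawn(move || {
            let mut backend = BackendImpl::new();
            while let Ok(request) = rx.recv() {
                let output = backend.generate(&request.prompt);
                // The caller may have gone away; ignore a failed send.
                let _ = request.response.send(output);
            }
            // recv() errors once every Sender is dropped, which ends the thread.
        });

        Backend { queue: tx }
    }

    fn generate(&self, prompt: &str) -> String {
        let (tx, rx) = channel();
        self.queue
            .send(GenerationRequest { prompt: prompt.to_owned(), response: tx })
            .expect("worker thread has terminated");
        rx.recv().expect("worker thread dropped the request")
    }
}

fn main() {
    let backend = Backend::new();
    println!("{}", backend.generate("Hello, TensorRT-LLM!"));
}
```

Confining the implementation to a single dedicated thread keeps any non-`Send` native state (such as an FFI handle) on one thread, so the rest of the server only ever interacts with the channel.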