text-generation-inference/backends/trtllm

```mermaid
sequenceDiagram
    TensorRtLlmBackend -->> TensorRtLlmBackendImpl: New thread which instantiates the actual backend impl
    TensorRtLlmBackendImpl -->> TensorRtLlmBackendImpl.Receiver: Awaits incoming requests sent through the queue
```
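The pattern in the diagram — a dedicated worker thread that owns the backend implementation and blocks on a queue of incoming requests — can be sketched in plain Rust with `std::sync::mpsc`. This is a minimal illustration, not the backend's actual API: `GenerationRequest` and `run_worker` are hypothetical names standing in for the real types.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical request type standing in for the real backend request.
struct GenerationRequest {
    prompt: String,
}

/// Spawns the worker thread from the diagram and returns the number of
/// requests it handled before the queue was closed.
fn run_worker() -> usize {
    // The channel plays the role of the queue: the backend side holds the
    // sender, the dedicated worker thread owns the receiver.
    let (tx, rx) = mpsc::channel::<GenerationRequest>();

    // New thread which would instantiate the actual backend impl, then
    // awaits incoming requests sent through the queue.
    let worker = thread::spawn(move || {
        let mut handled = 0;
        while let Ok(req) = rx.recv() {
            // Placeholder for submitting `req` to the underlying executor.
            println!("received prompt: {}", req.prompt);
            handled += 1;
        }
        handled // recv() errors once all senders are dropped
    });

    tx.send(GenerationRequest { prompt: "Hello".into() }).unwrap();
    drop(tx); // close the queue so the worker exits its loop
    worker.join().unwrap()
}

fn main() {
    println!("handled {} request(s)", run_worker());
}
```

Keeping the receiver on its own thread means the backend impl never has to be `Sync`: only one thread ever touches it, and callers communicate exclusively through the channel.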