text-generation-inference/backends/trtllm

```mermaid
sequenceDiagram
    TensorRtLlmBackend -->> TensorRtLlmBackendImpl: New thread which instantiates the actual backend impl
    TensorRtLlmBackendImpl -->> TensorRtLlmBackendImpl.Receiver: Awaits incoming requests sent through the queue
```
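The pattern in the diagram — a dedicated worker thread that owns the backend implementation and blocks on a queue of incoming requests — can be sketched in plain Rust with `std::sync::mpsc`. This is a minimal illustration, not the backend's actual API: `GenerationRequest` and `run_worker` are hypothetical names standing in for the real types.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical request type standing in for the real backend request.
struct GenerationRequest {
    prompt: String,
}

/// Spawns the worker thread from the diagram and returns the number of
/// requests it handled before the queue was closed.
fn run_worker() -> usize {
    // The channel plays the role of the queue: the backend side holds the
    // sender, the dedicated worker thread owns the receiver.
    let (tx, rx) = mpsc::channel::<GenerationRequest>();

    // New thread which would instantiate the actual backend impl, then
    // awaits incoming requests sent through the queue.
    let worker = thread::spawn(move || {
        let mut handled = 0;
        while let Ok(req) = rx.recv() {
            // Placeholder for submitting `req` to the underlying executor.
            println!("received prompt: {}", req.prompt);
            handled += 1;
        }
        handled // recv() errors once all senders are dropped
    });

    tx.send(GenerationRequest { prompt: "Hello".into() }).unwrap();
    drop(tx); // close the queue so the worker exits its loop
    worker.join().unwrap()
}

fn main() {
    println!("handled {} request(s)", run_worker());
}
```

Keeping the receiver on its own thread means the backend impl never has to be `Sync`: only one thread ever touches it, and callers communicate exclusively through the channel.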