text-generation-inference/backends/trtllm
| Name | Last commit message | Last commit date |
|------|---------------------|------------------|
| cmake | update TensorRT-LLM to latest version | 2024-07-23 22:13:02 +00:00 |
| include | refactor the compute capabilities detection along with num gpus | 2024-07-23 22:12:42 +00:00 |
| lib | refactor the compute capabilities detection along with num gpus | 2024-07-23 22:12:42 +00:00 |
| scripts | update TensorRT install script to latest | 2024-07-23 22:23:30 +00:00 |
| src | make sure executor_worker is provided | 2024-07-19 11:57:10 +00:00 |
| tests | First version loading engines and making it ready for inference | 2024-07-03 21:12:24 +00:00 |
| build.rs | fix envvar CARGO_CFG_TARGET_ARCH set at runtime vs compile time | 2024-07-25 10:36:55 +00:00 |
| Cargo.toml | align all the linker search dependency | 2024-07-22 14:14:57 +00:00 |
| CMakeLists.txt | install to decoder_attention target | 2024-07-25 10:21:54 +00:00 |
| Dockerfile | update TensorRT-LLM to latest version | 2024-07-23 22:13:02 +00:00 |
| README.md | adding missing ld_library_path for cuda stubs in Dockerfile | 2024-07-22 15:16:39 +00:00 |

```mermaid
sequenceDiagram
    TensorRtLlmBackend -->> TensorRtLlmBackendImpl: New thread which instantiates the actual backend impl
    TensorRtLlmBackendImpl -->> TensorRtLlmBackendImpl.Receiver: Awaits incoming requests sent through the queue
```
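
The diagram describes a hand-off pattern: the backend spawns a dedicated worker thread that instantiates the actual implementation and then drains a queue of incoming requests. Below is a minimal, self-contained sketch of that pattern in plain Rust (`std::thread` + `std::sync::mpsc`). The `Backend`, `BackendImpl`, and `GenerationRequest` names and the `generate` method are illustrative placeholders, not the real `TensorRtLlmBackend` API; the real worker thread drives a TensorRT-LLM executor rather than echoing the prompt back.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// Illustrative request type: a prompt plus a one-shot channel for the result.
struct GenerationRequest {
    prompt: String,
    response: Sender<String>,
}

/// Stand-in for TensorRtLlmBackendImpl: created and used only on the worker thread.
struct BackendImpl;

impl BackendImpl {
    fn new() -> Self {
        // The real backend would set up the TensorRT-LLM executor here.
        BackendImpl
    }

    fn generate(&mut self, prompt: &str) -> String {
        // Placeholder for the actual inference call.
        format!("echo: {}", prompt)
    }
}

/// Stand-in for TensorRtLlmBackend: only holds the sending side of the queue.
struct Backend {
    queue: Sender<GenerationRequest>,
}

impl Backend {
    fn new() -> Self {
        let (tx, rx) = channel::<GenerationRequest>();

        // New thread which instantiates the actual backend impl, then awaits
        // incoming requests sent through the queue (the Receiver).
        let _worker = thread::spawn(move || {
            let mut backend = BackendImpl::new();
            while let Ok(request) = rx.recv() {
                let output = backend.generate(&request.prompt);
                // The caller may have gone away; ignore a failed send.
                let _ = request.response.send(output);
            }
            // recv() errors once every Sender is dropped, which ends the thread.
        });

        Backend { queue: tx }
    }

    fn generate(&self, prompt: &str) -> String {
        let (tx, rx) = channel();
        self.queue
            .send(GenerationRequest { prompt: prompt.to_owned(), response: tx })
            .expect("worker thread has terminated");
        rx.recv().expect("worker thread dropped the request")
    }
}

fn main() {
    let backend = Backend::new();
    println!("{}", backend.generate("Hello, TensorRT-LLM!"));
}
```

Confining the implementation to a single dedicated thread keeps any non-`Send` native state (such as an FFI handle) on one thread, so the rest of the server only ever interacts with the channel.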