Mirror of https://github.com/huggingface/text-generation-inference.git, synced 2025-04-19 13:52:07 +00:00
Latest commit:

* backend(trtllm): bump TRTLLM to v0.17.0
* backend(trtllm): forgot to bump Dockerfile
* backend(trtllm): use ARG instead of ENV
* backend(trtllm): use correct library reference decoder_attention_src
* backend(trtllm): link against decoder_attention_{0|1}
* backend(trtllm): build against gcc-14 with CUDA 12.8
* backend(trtllm): use the return value optimization warning as an error if available
* backend(trtllm): make sure we escalate all warnings to errors on the backend impl in debug mode
* backend(trtllm): link against CUDA 12.8

The compiler-flag and linking changes are sketched below.
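A minimal CMake sketch of what those build changes could look like, assuming a hypothetical `tgi_trtllm_backend_impl` target and source path; the "return value optimization" flag is presumably GCC 14's `-Wnrvo`, and the library names follow the `decoder_attention_{0|1}` reference in the commit list. This is an illustration, not the backend's actual CMakeLists.txt.

```cmake
# Hypothetical excerpt, for illustration only: target name, source path,
# and library names are assumptions, not taken from the real build files.
cmake_minimum_required(VERSION 3.20)
project(trtllm_backend LANGUAGES CXX)

add_library(tgi_trtllm_backend_impl STATIC csrc/backend.cpp)

# Escalate all warnings to errors on the backend implementation,
# but only in debug builds.
target_compile_options(tgi_trtllm_backend_impl PRIVATE
    $<$<CONFIG:Debug>:-Wall -Wextra -Werror>)

# GCC 14 adds -Wnrvo, which warns when the named return value optimization
# is not applied; enable it (escalated by -Werror above) only if the
# compiler actually supports the flag.
include(CheckCXXCompilerFlag)
check_cxx_compiler_flag("-Wnrvo" COMPILER_SUPPORTS_WNRVO)
if(COMPILER_SUPPORTS_WNRVO)
    target_compile_options(tgi_trtllm_backend_impl PRIVATE
        $<$<CONFIG:Debug>:-Wnrvo>)
endif()

# Link against the decoder attention libraries named in the commit list
# (decoder_attention_0 and decoder_attention_1) rather than a single library.
target_link_libraries(tgi_trtllm_backend_impl PRIVATE
    tensorrt_llm
    decoder_attention_0
    decoder_attention_1)
```

Gating `-Wnrvo` behind `check_cxx_compiler_flag` keeps the build working on compilers that predate the flag while still failing fast on missed RVO where it is available.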
Files in this directory:

* install_tensorrt.sh
* setup_sccache.py