Commit Graph

4 Commits

Author SHA1 Message Date
Funtowicz Morgan
17367438f3
Give TensorRT-LLMa proper CI/CD 😍 (#2886)
* test(ctest) enable address sanitizer

* feat(trtllm): expose finish reason to Rust

* feat(trtllm): fix logits retrieval

* misc(ci): enabe building tensorrt-llm

* misc(ci): update Rust action toolchain

* misc(ci): let's try to build the Dockerfile for trtllm

# Conflicts:
#	Dockerfile_trtllm

* misc(ci): provide mecanism to cache inside container

* misc(ci): export aws creds as output of step

* misc(ci): let's try this way

* misc(ci): again

* misc(ci): again

* misc(ci): add debug profile

* misc(ci): add debug profile

* misc(ci): lets actually use sccache ...

* misc(ci): do not build with ssl enabled

* misc(ci): WAT

* misc(ci): WAT

* misc(ci): WAT

* misc(ci): WAT

* misc(ci): WAT

* misc(backend): test with TGI S3 conf

* misc(backend): test with TGI S3 conf

* misc(backend): once more?

* misc(backend): let's try with GHA

* misc(backend): missing env directive

* misc(backend): make sure to correctly set IS_GHA_BUILD=true in wf

* misc(backend): ok let's debug smtg

* misc(backend): WWWWWWWWWWWWWAAAAAAAA

* misc(backend): kthxbye retry s3

* misc(backend): use session token

* misc(backend): add more info

* misc(backend): lets try 1h30

* misc(backend): lets try 1h30

* misc(backend): increase to 2h

* misc(backend): lets try...

* misc(backend): lets try...

* misc(backend): let's build for ci-runtime

* misc(backend): let's add some more tooling

* misc(backend): add some tags

* misc(backend): disable Werror for now

* misc(backend): added automatic gha detection

* misc(backend): remove leak sanitizer which is included in asan

* misc(backend): forward env

* misc(backend): forward env

* misc(backend): let's try

* misc(backend): let's try

* misc(backend): again

* misc(backend): again

* misc(backend): again

* misc(backend): again

* misc(backend): again

* misc(backend): fix sscache -> sccache

* misc(backend): fix sscache -> sccache

* misc(backend): fix sscache -> sccache

* misc(backend): let's actually cache things now

* misc(backend): let's actually cache things now

* misc(backend): attempt to run the testS?

* misc(backend): attempt to run the tests?

* misc(backend): attempt to run the tests?

* change runner size

* fix: Correctly tag docker images (#2878)

* fix: Correctly tag docker images

* fix: Correctly tag docker images

* misc(llamacpp): maybe?

* misc(llamacpp): maybe?

* misc(llamacpp): maybe?

* misc(ci): gogogo

* misc(ci): gogogo

* misc(ci): gogogo

* misc(ci): gogogo

* misc(ci): gogogo

* misc(ci): gogogo

* misc(ci): go

* misc(ci): go

* misc(ci): go

* misc(ci): use bin folder

* misc(ci): make the wf callable for reuse

* misc(ci): make the wf callable for reuse (bis)

* misc(ci): make the wf callable for reuse (bis)

* misc(ci): give the wf a name

* Create test-trtllm.yml

* Update test-trtllm.yml

* Create build-trtllm2

* Rename build-trtllm2 to 1-build-trtllm2

* Rename test-trtllm.yml to 1-test-trtllm2.yml

* misc(ci): fw secrets

* Update 1-test-trtllm2.yml

* Rename 1-build-trtllm2 to 1-build-trtllm2.yml

* Update 1-test-trtllm2.yml

* misc(ci): use ci-build.yaml as main dispatcher

* Delete .github/workflows/1-test-trtllm2.yml

* Delete .github/workflows/1-build-trtllm2.yml

* misc(ci): rights?

* misc(ci): rights?

* misc(ci): once more?

* misc(ci): once more?

* misc(ci): baby more time?

* misc(ci): baby more time?

* misc(ci): try the permission above again?

* misc(ci): try the permission above again?

* misc(ci): try the permission scoped again?

* misc(ci): install tensorrt_llm_executor_static

* misc(ci): attempt to rebuild with sccache?

* misc(ci):run the tests on GPU instance

* misc(ci): let's actually setup sccache in the build.rs

* misc(ci): reintroduce variables

* misc(ci): enforce sccache

* misc(ci): correct right job name dependency

* misc(ci): detect dev profile for debug

* misc(ci): detect gha build

* misc(ci): detect gha build

* misc(ci): ok debug

* misc(ci): wtf

* misc(ci): wtf2

* misc(ci): wtf3

* misc(ci): use commit HEAD instead of merge commit for image id

* misc(ci): wtfinfini

* misc(ci): wtfinfini

* misc(ci): KAMEHAMEHA

* Merge TRTLLM in standard CI

* misc(ci): remove input machine

* misc(ci): missing id-token for AWS auth

* misc(ci): missing id-token for AWS auth

* misc(ci): missing id-token for AWS auth

* misc(ci): again...

* misc(ci): again...

* misc(ci): again...

* misc(ci): again...

* misc(ci): missing benchmark

* misc(ci): missing backends

* misc(ci): missing launcher

* misc(ci): give everything aws needs

* misc(ci): give everything aws needs

* misc(ci): fix warnings

* misc(ci): attempt to fix sccache not building trtllm

* misc(ci): attempt to fix sccache not building trtllm again

---------

Co-authored-by: Guillaume LEGENDRE <glegendre01@gmail.com>
Co-authored-by: Hugo Larcher <hugo.larcher@huggingface.co>
Co-authored-by: Pauline Bailly-Masson <155966238+paulinebm@users.noreply.github.com>
2025-01-21 10:19:16 +01:00
drbh
a72f339c79
fix: lint backend and doc files (#2850) 2024-12-16 16:12:34 -05:00
Funtowicz Morgan
ea7f4082c4
TensorRT-LLM backend bump to latest version + misc fixes (#2791)
* misc(cmake) update dependencies

* feat(hardware) enable new hardware.hpp and unittests

* test(ctest) enable address sanitizer

* feat(backend): initial rewrite of the backend for simplicity

* feat(backend): remove all the logs from hardware.hpp

* feat(backend): added some logging

* feat(backend): enable compiler warning if support for RVO not applying

* feat(backend): missing return statement

* feat(backend): introduce backend_workspace_t to store precomputed information from the engine folder

* feat(backend): delete previous backend impl

* feat(backend): more impl

* feat(backend): use latest trtllm main version to have g++ >= 13 compatibility

* feat(backend): allow overriding which Python to use

* feat(backend): fix backend_exception_t -> backend_error_t naming

* feat(backend): impl missing generation_step_t as return value of pull_tokens

* feat(backend): make backend_workspace_t::engines_folder constexpr

* feat(backend): fix main.rs retrieving the tokenizer

* feat(backend): add guard to multiple header definitions

* test(backend): add more unittest

* feat(backend): remove constexpr from par

* feat(backend): remove constexpig

* test(backend): more test coverage

* chore(trtllm): update dependency towards 0.15.0

* effectively cancel the request on the executor

* feat(backend) fix moving backend when pulling

* feat(backend): make sure we can easily cancel request on the executor

* feat(backend): fix missing "0" field access

* misc(backend): fix reborrowing Pin<&mut T> as described in the doc https://doc.rust-lang.org/stable/std/pin/struct.Pin.html#method.as_mut

* chore: Add doc and CI for TRTLLM (#2799)

* chore: Add doc and CI for TRTLLM

* chore: Add doc and CI for TRTLLM

* chore: Add doc and CI for TRTLLM

* chore: Add doc and CI for TRTLLM

* doc: Formatting

* misc(backend): indent

---------

Co-authored-by: Hugo Larcher <hugo.larcher@huggingface.co>
2024-12-13 15:50:59 +01:00
Nicolas Patry
2b19d671b4
Rebase TRT-llm (#2331)
* wip

wip

refacto

refacto

Initial setup for CXX binding to TRTLLM

Working FFI call for TGI and TRTLLM backend

Remove unused parameters annd force tokenizer name to be set

Overall build TRTLLM and deps through CMake build system

Enable end to end CMake build

First version loading engines and making it ready for inference

Remembering to check how we can detect support for chunked context

Move to latest TensorRT-LLM version

Specify which default log level to use depending on CMake build type

make leader executor mode working

unconditionally call InitializeBackend on the FFI layer

bind to CUDA::nvml to retrieve compute capabilities at runtime

updated logic and comment to detect cuda compute capabilities

implement the Stream method to send new tokens through a callback

use spdlog release 1.14.1 moving forward

update trtllm to latest version a96cccafcf6365c128f004f779160951f8c0801c

correctly tell cmake to build dependent tensorrt-llm required libraries

create cmake install target to put everything relevant in installation folder

add auth_token CLI argument to provide hf hub authentification token

allow converting huggingface::tokenizers error to TensorRtLlmBackendError

use correct include for spdlog

include guard to build example in cmakelists

working setup of the ffi layer

remove fmt import

use external fmt lib

end to end ffi flow working

make sure to track include/ffi.h to trigger rebuild from cargo

impl the rust backend which currently cannot move the actual computation in background thread

expose shutdown function at ffi layer

impl RwLock scenario for TensorRtLllmBackend

oops missing c++ backend definitions

compute the number of maximum new tokens for each request independently

make sure the context is not dropped in the middle of the async decoding.

remove unnecessary log

add all the necessary plumbery to return the generated content

update invalid doc in cpp file

correctly forward back the log probabilities

remove unneeded scope variable for now

refactor Stream impl for Generation to factorise code

expose the internal missing start/queue timestamp

forward tgi parameters rep/freq penalty

add some more validation about grammar not supported

define a shared struct to hold the result of a decoding step

expose information about potential error happening while decoding

remove logging

add logging in case of decoding error

make sure executor_worker is provided

add initial Dockerfile for TRTLLM backend

add some more information in CMakeLists.txt to correctly install executorWorker

add some more information in CMakeLists.txt to correctly find and install nvrtc wrapper

simplify prebuilt trtllm libraries name definition

do the same name definition stuff for tensorrt_llm_executor_static

leverage pkg-config to probe libraries paths and reuse new install structure from cmake

fix bad copy/past missing nvinfer linkage direction

align all the linker search dependency

add missing pkgconfig folder for MPI in Dockerfile

correctly setup linking search path for runtime layer

fix missing / before tgi lib path

adding missing ld_library_path for cuda stubs in Dockerfile

update tgi entrypoint

commenting out Python part for TensorRT installation

refactored docker image

move to TensorRT-LLM v0.11.0

make docker linter happy with same capitalization rule

fix typo

refactor the compute capabilities detection along with num gpus

update TensorRT-LLM to latest version

update TensorRT install script to latest

update build.rs to link to cuda 12.5

add missing dependant libraries for linking

clean up a bit

install to decoder_attention target

add some custom stuff for nccl linkage

fix envvar CARGO_CFG_TARGET_ARCH set at runtime vs compile time

use std::env::const::ARCH

make sure variable live long enough...

look for cuda 12.5

add some more basic info in README.md

* Rebase.

* Fix autodocs.

* Let's try to enable trtllm backend.

* Ignore backends/v3 by default.

* Fixing client.

* Fix makefile + autodocs.

* Updating the schema thing + redocly.

* Fix trtllm lint.

* Adding pb files ?

* Remove cargo fmt temporarily.

* ?

* Tmp.

* Remove both check + clippy  ?

* Backporting telemetry.

* Backporting 457fb0a1

* Remove PB from git.

* Fixing PB with default member backends/client

* update TensorRT-LLM to latest version

* provided None for api_key

* link against libtensorrt_llm and not libtensorrt-llm

---------

Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
Co-authored-by: Morgan Funtowicz <morgan@huggingface.co>
2024-07-31 10:33:10 +02:00