Adrien Gallouët
fb81c0d1c4
Thanks clippy
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-06 10:57:56 +01:00
Adrien Gallouët
e4d5fa7eaf
Update docs
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-06 09:46:24 +00:00
Adrien Gallouët
1641c22af8
Add doc
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-05 21:14:30 +00:00
Adrien Gallouët
b3e40c4b66
Improve default settings
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-05 16:38:52 +00:00
Adrien Gallouët
f22e2fb550
Cleanup
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-05 16:12:34 +00:00
Adrien Gallouët
0f62401b8e
Initialize penalty_last_n with llamacpp default value
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-05 15:44:46 +00:00
Adrien Gallouët
695b1292e9
Ensure all samplers are freed on error
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-05 15:42:59 +00:00
Adrien Gallouët
5b777877b1
Make max_batch_total_tokens optional
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-05 11:40:20 +00:00
Adrien Gallouët
09a745f1b8
Remove n_ctx
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-05 11:31:58 +00:00
Adrien Gallouët
051ff2d5ce
Rename bindings
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-05 11:21:41 +00:00
Adrien Gallouët
dbee804129
Simplify batching logic
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-05 10:12:39 +00:00
Adrien Gallouët
d3a772a8dd
Update args
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-05 10:10:38 +00:00
Adrien Gallouët
df2a4fbb8a
Update Dockerfile_llamacpp
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:59 +00:00
Adrien Gallouët
d883109df6
Disable graceful shutdown in debug mode
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:59 +00:00
Adrien Gallouët
38b33e9698
Add --type-v & --type-k
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:59 +00:00
Adrien Gallouët
bfb8e03e9f
Add specific args for batch
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:59 +00:00
Adrien Gallouët
ea28332bb3
Cleanup
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:59 +00:00
Adrien Gallouët
104a968d01
Remove warmup
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:59 +00:00
Adrien Gallouët
8ed362d03a
Clear request cache after completion
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:59 +00:00
Adrien Gallouët
c8505fb300
Auto-detect n_threads when not provided
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:59 +00:00
Adrien Gallouët
27534d8ee4
Fix seq iterations
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:59 +00:00
Adrien Gallouët
96434a1e7e
Fix batching
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:59 +00:00
Adrien Gallouët
2a51e415ff
Output real logprobs
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00
Adrien Gallouët
161280f313
Only export the latest logits
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00
Morgan Funtowicz
960c12bd6e
backend(llama): add CUDA Dockerfile_llamacpp for now
2025-02-04 13:32:58 +00:00
Adrien Gallouët
f38c34aeb7
Fix batch_pos
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00
Adrien Gallouët
e88a527fcf
Add --offload-kqv
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00
Adrien Gallouët
ae5bb789c2
Enable flash attention by default
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00
Adrien Gallouët
3f199134f0
Fix args
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00
Adrien Gallouët
7a3ed4171e
Add --numa
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00
Adrien Gallouët
390f0ec061
Cleanup
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00
Adrien Gallouët
d6ded897a8
Add a stupid batch mechanism
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00
Adrien Gallouët
e07835c5b5
Add --defrag-threshold
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00
Adrien Gallouët
f388747985
Add GPU args
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00
Adrien Gallouët
8d2dfdf668
Handle ctx args & fix sampling
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00
Adrien Gallouët
a7b4b04cb5
Add some input validation checks
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00
Adrien Gallouët
e7facf692f
Handle max_batch_size
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00
Adrien Gallouët
3eb4823f3e
Use max_batch_total_tokens
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00
Adrien Gallouët
bd0cc9905c
Get rid of llama_batch_get_one()
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:58 +00:00
Adrien Gallouët
95e221eece
Add llamacpp backend
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-02-04 13:32:56 +00:00
Hugo Larcher
73b7cf83f6
Add backend name to telemetry (#2962)
* feat: Add backend name to telemetry
2025-01-28 16:53:16 +01:00
Funtowicz Morgan
40b00275b2
Attempt to remove AWS S3 flaky cache for sccache (#2953)
* backend(trtllm): attempt to remove AWS S3 flaky cache for sccache
* backend(trtllm): what if we expose ENV instead of inline?
* backend(trtllm): and with the right env var for gha sccache
* backend(trtllm): relax the way to detect sccache
* backend(trtllm): make sccache definition manually
* backend(trtllm): ok let's try to define the launchers in build.rs when rustc_wrapper is present
* backend(trtllm): export env variable in run mb?
* backend(trtllm): Cache mode max to cache intermediate layers
* backend(trtllm): inject ompi_version build arg in dependent step
2025-01-27 11:21:48 +01:00
Funtowicz Morgan
0a89902663
[TRTLLM] Expose finish reason (#2841)
* feat(trtllm): expose finish reason to Rust
* misc(llamacpp): fix typo
* misc(backend): update deps
2025-01-23 16:48:26 +01:00
Funtowicz Morgan
cc212154e0
Bump TensorRT-LLM backend dependency to v0.16.0 (#2931)
* backend(trtllm): update to 0.16.0
* backend(trtllm): do not use shallow clone
* backend(trtllm): use tag instead
* backend(trtllm): move to nvidia remote instead of hf
* backend(trtllm): reenable shallow clone
* backend(trtllm): attempt to use ADD instead of RUN for openmpi
* backend(trtllm): make sure we are using correct path for openmpi ADD in dockerfile
* backend(trtllm): add correctly untar it
2025-01-23 13:54:40 +01:00
Alvaro Bartolome
64a33c1f05
Run pre-commit run --all-files to fix CI (#2933)
2025-01-21 17:33:33 +01:00
Funtowicz Morgan
17367438f3
Give TensorRT-LLM a proper CI/CD 😍 (#2886)
* test(ctest) enable address sanitizer
* feat(trtllm): expose finish reason to Rust
* feat(trtllm): fix logits retrieval
* misc(ci): enable building tensorrt-llm
* misc(ci): update Rust action toolchain
* misc(ci): let's try to build the Dockerfile for trtllm
# Conflicts:
# Dockerfile_trtllm
* misc(ci): provide mechanism to cache inside container
* misc(ci): export aws creds as output of step
* misc(ci): let's try this way
* misc(ci): again
* misc(ci): again
* misc(ci): add debug profile
* misc(ci): add debug profile
* misc(ci): lets actually use sccache ...
* misc(ci): do not build with ssl enabled
* misc(ci): WAT
* misc(ci): WAT
* misc(ci): WAT
* misc(ci): WAT
* misc(ci): WAT
* misc(backend): test with TGI S3 conf
* misc(backend): test with TGI S3 conf
* misc(backend): once more?
* misc(backend): let's try with GHA
* misc(backend): missing env directive
* misc(backend): make sure to correctly set IS_GHA_BUILD=true in wf
* misc(backend): ok let's debug smtg
* misc(backend): WWWWWWWWWWWWWAAAAAAAA
* misc(backend): kthxbye retry s3
* misc(backend): use session token
* misc(backend): add more info
* misc(backend): lets try 1h30
* misc(backend): lets try 1h30
* misc(backend): increase to 2h
* misc(backend): lets try...
* misc(backend): lets try...
* misc(backend): let's build for ci-runtime
* misc(backend): let's add some more tooling
* misc(backend): add some tags
* misc(backend): disable Werror for now
* misc(backend): added automatic gha detection
* misc(backend): remove leak sanitizer which is included in asan
* misc(backend): forward env
* misc(backend): forward env
* misc(backend): let's try
* misc(backend): let's try
* misc(backend): again
* misc(backend): again
* misc(backend): again
* misc(backend): again
* misc(backend): again
* misc(backend): fix sscache -> sccache
* misc(backend): fix sscache -> sccache
* misc(backend): fix sscache -> sccache
* misc(backend): let's actually cache things now
* misc(backend): let's actually cache things now
* misc(backend): attempt to run the testS?
* misc(backend): attempt to run the tests?
* misc(backend): attempt to run the tests?
* change runner size
* fix: Correctly tag docker images (#2878)
* fix: Correctly tag docker images
* fix: Correctly tag docker images
* misc(llamacpp): maybe?
* misc(llamacpp): maybe?
* misc(llamacpp): maybe?
* misc(ci): gogogo
* misc(ci): gogogo
* misc(ci): gogogo
* misc(ci): gogogo
* misc(ci): gogogo
* misc(ci): gogogo
* misc(ci): go
* misc(ci): go
* misc(ci): go
* misc(ci): use bin folder
* misc(ci): make the wf callable for reuse
* misc(ci): make the wf callable for reuse (bis)
* misc(ci): make the wf callable for reuse (bis)
* misc(ci): give the wf a name
* Create test-trtllm.yml
* Update test-trtllm.yml
* Create build-trtllm2
* Rename build-trtllm2 to 1-build-trtllm2
* Rename test-trtllm.yml to 1-test-trtllm2.yml
* misc(ci): fw secrets
* Update 1-test-trtllm2.yml
* Rename 1-build-trtllm2 to 1-build-trtllm2.yml
* Update 1-test-trtllm2.yml
* misc(ci): use ci-build.yaml as main dispatcher
* Delete .github/workflows/1-test-trtllm2.yml
* Delete .github/workflows/1-build-trtllm2.yml
* misc(ci): rights?
* misc(ci): rights?
* misc(ci): once more?
* misc(ci): once more?
* misc(ci): baby more time?
* misc(ci): baby more time?
* misc(ci): try the permission above again?
* misc(ci): try the permission above again?
* misc(ci): try the permission scoped again?
* misc(ci): install tensorrt_llm_executor_static
* misc(ci): attempt to rebuild with sccache?
* misc(ci): run the tests on GPU instance
* misc(ci): let's actually setup sccache in the build.rs
* misc(ci): reintroduce variables
* misc(ci): enforce sccache
* misc(ci): correct right job name dependency
* misc(ci): detect dev profile for debug
* misc(ci): detect gha build
* misc(ci): detect gha build
* misc(ci): ok debug
* misc(ci): wtf
* misc(ci): wtf2
* misc(ci): wtf3
* misc(ci): use commit HEAD instead of merge commit for image id
* misc(ci): wtfinfini
* misc(ci): wtfinfini
* misc(ci): KAMEHAMEHA
* Merge TRTLLM in standard CI
* misc(ci): remove input machine
* misc(ci): missing id-token for AWS auth
* misc(ci): missing id-token for AWS auth
* misc(ci): missing id-token for AWS auth
* misc(ci): again...
* misc(ci): again...
* misc(ci): again...
* misc(ci): again...
* misc(ci): missing benchmark
* misc(ci): missing backends
* misc(ci): missing launcher
* misc(ci): give everything aws needs
* misc(ci): give everything aws needs
* misc(ci): fix warnings
* misc(ci): attempt to fix sccache not building trtllm
* misc(ci): attempt to fix sccache not building trtllm again
---------
Co-authored-by: Guillaume LEGENDRE <glegendre01@gmail.com>
Co-authored-by: Hugo Larcher <hugo.larcher@huggingface.co>
Co-authored-by: Pauline Bailly-Masson <155966238+paulinebm@users.noreply.github.com>
2025-01-21 10:19:16 +01:00
drbh
8f6146f11a
Revert "feat: improve qwen2-vl startup" (#2924)
Revert "feat: improve qwen2-vl startup (#2802)"
This reverts commit eecca27113.
2025-01-17 12:09:05 -05:00
drbh
eecca27113
feat: improve qwen2-vl startup (#2802)
* feat: tokenize each request individually and increase warmup image size
* feat: adjust rotary embed and avoid cuda graphs of size 2 and smaller
* fix: address image resize and rebase changes
* feat: update to run qwen2-vl tests
* fix: tweak param types
2025-01-17 11:50:41 -05:00
Nicolas Patry
203cade244
Upgrading our rustc version. (#2908)
* Upgrading our rustc version.
* Fixing the rust tests to proper version.
* Clippy everything.
2025-01-15 17:04:03 +01:00
Dmitry Dygalo
01067f8ba8
chore: Update jsonschema to 0.28.0 (#2870)
* chore: Update jsonschema to 0.28.0
Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
* chore: Enable blocking feature for reqwest
Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
---------
Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
2025-01-10 15:01:54 +01:00