Commit Graph

  • 45d3a3a253 fix: Allow back arguments in function definition and the corresponding test Nicolas Casademont 2025-02-04 11:07:42 +0100
  • 58f5f2ee27 fix: adjust signatures with types drbh 2025-02-04 00:30:47 +0000
  • 6cb0cb68b4 fix: improve and simplify get_cos_sin, refactors and cleanup get_position_ids drbh 2025-02-04 00:25:59 +0000
  • 88fd56f549 Add strftime_now callable function for minijinja chat templates (#2983) Alvaro Bartolome 2025-02-03 15:30:48 +0100
  • 8ae92e5d70 Merge branch 'huggingface:main' into fix/dockerfile-triton Yaser Jaradeh 2025-02-03 11:48:01 +0100
  • e3f2018cb5 hotfix: fix trtllm CI build on release (#2981) Hugo Larcher 2025-02-03 11:11:15 +0100
  • 5102f3f55a fix: test release. Works. Hugo Larcher 2025-02-02 13:03:40 +0100
  • ae9def4aec fix: test release. env not recognized https://github.com/actions/runner/issues/1661 Hugo Larcher 2025-02-02 11:10:09 +0100
  • 66af3c7801 fix: test release. Hugo Larcher 2025-02-01 21:47:45 +0100
  • b4f4817f1a fix: test release. Hugo Larcher 2025-02-01 13:35:43 +0100
  • 77940ac73f Fix test_chat_template_valid_with_strftime_now Alvaro Bartolome 2025-01-31 21:52:55 +0100
  • 1c17d8a768 Fix test_chat_template_valid_with_strftime_now Alvaro Bartolome 2025-01-31 21:41:15 +0100
  • 3d132ab627 Add chrono and strftime_now function callable Alvaro Bartolome 2025-01-31 21:25:41 +0100
  • 9eaa163239 fix: add more test and improve model generation drbh 2025-01-31 18:30:32 +0000
  • aeb1262ab3 hotfix: fix trtllm CI build on release Hugo Larcher 2025-01-31 18:15:29 +0100
  • 79550f8b47 fix: remove check for default rope type drbh 2025-01-29 16:10:17 +0000
  • cb7ec9cb60 fix: improve mrope check in cuda graph warmup drbh 2025-01-29 13:03:36 +0000
  • 585e270ac3 fix: check key before access drbh 2025-01-29 00:10:43 +0000
  • d0e2332d17 fix: check existance before accessing rope type in cuda warmup drbh 2025-01-28 22:54:34 +0000
  • 79a2c956de fix: improve position id init during cuda warmup for mrope and simplfy rotary forward drbh 2025-01-28 21:08:58 +0000
  • c75c01e9b9 fix: update position ids so first dim is batch, simplify rotary and bump vlm default token limit drbh 2025-01-28 19:25:23 +0000
  • 68e3ee8e79 fix: simplify get position ids and remove usused vision config drbh 2025-01-28 15:40:05 +0000
  • 6893eb3834 fix: adjust rotaty init path drbh 2025-01-27 16:02:51 +0000
  • 5f416f6e28 fix: enable all cuda graphs and bump snapshots drbh 2025-01-23 15:32:46 +0000
  • eef3c7bdf2 fix: prefer default dtype drbh 2025-01-23 15:07:19 +0000
  • 7ab99bc6b3 feat: refactor position ids in warmup and bump tests drbh 2025-01-22 20:51:20 +0000
  • cf5c66043e fix: include clippy lint drbh 2025-01-22 18:38:07 +0000
  • a0ab962b6d fix: limit vision flop calc to qwen2 vl models and update config typing drbh 2025-01-22 18:30:03 +0000
  • d12e075966 fix: improve multimodal rotary embed caching drbh 2025-01-22 16:43:53 +0000
  • 77ef543061 feat: refactor model, improve startup and re enable tests drbh 2025-01-21 22:31:22 +0000
  • 0d3cb2baa5 Merge 50c8ebdef0 into bb69c5b199 Nicolas Patry 2025-01-31 14:43:04 +0100
  • bb69c5b199 Back on nix main. (#2979) Nicolas Patry 2025-01-31 14:39:52 +0100
  • 3b62020c60 Back on nix main. Nicolas Patry 2025-01-31 14:29:12 +0100
  • 463228ebfc Update version number. v3.1.0 git_v3.1.0 Nicolas Patry 2025-01-31 14:24:45 +0100
  • c9d68945cc Prepare for release 3.1.0 (#2972) Nicolas Patry 2025-01-31 14:19:01 +0100
  • 50c8ebdef0 CI must be green. Nicolas Patry 2025-01-31 13:16:29 +0100
  • 6a7b92a7ea Deactivating the flaky test. Nicolas Patry 2025-01-31 12:33:08 +0100
  • c07a2cc82b Update moe-kernel to 0.8.2 for rocm (#2977) Mohit Sharma 2025-01-31 16:10:00 +0530
  • e973602289 update moe-kernel for amd Mohit Sharma 2025-01-31 10:31:13 +0000
  • 5452c1294c backend(vllm): disable metrics for now vllm/setup Morgan Funtowicz 2025-01-31 10:56:54 +0100
  • 77bc943341 Upgrade to moe-kernels 0.8.2 for Hip support. Nicolas Patry 2025-01-31 10:49:18 +0100
  • 4d4c833379 Fixing stuff. Nicolas Patry 2025-01-31 09:18:44 +0100
  • 57fa04adfd Cleaner version. Nicolas Patry 2025-01-31 09:07:31 +0100
  • 1932c5b9ed Adding kvrouter to the workspace. Nicolas Patry 2025-01-29 13:48:06 +0100
  • 914b163768 Remove kvrouter from default members. Nicolas Patry 2025-01-29 13:27:32 +0100
  • 0a495ad118 Updating the kvrouter to support roundrobin Nicolas Patry 2025-01-29 12:40:26 +0100
  • 6a88063cc2 Adding Dummy kvrouter. Nicolas Patry 2025-01-28 19:48:17 +0100
  • 7ef8b89ee7 More logs in the allocator. Nicolas Patry 2025-01-28 11:19:37 +0100
  • 34e1b986f7 Back on main flake. Nicolas Patry 2025-01-30 21:12:34 +0100
  • f01862c0c0 Prepare for release 3.1.0 Nicolas Patry 2025-01-30 20:49:08 +0100
  • b1a9dfff21 Add tests for all aliases Alex Weston 2025-01-30 14:11:05 -0500
  • 67a696fad9 Add json_schema alias for GrammarType Alex Weston 2025-01-30 14:03:54 -0500
  • 065aabb13d doc: Update TRTLLM deployment doc. (#2960) Hugo Larcher 2025-01-30 18:04:42 +0100
  • cb747b33da Add deepseekv3 (#2968) Nicolas Patry 2025-01-30 16:40:25 +0100
  • 2c90b0575b Scoring func softmax is the only one that works. Nicolas Patry 2025-01-30 16:28:55 +0100
  • d5b2c25d8f Fix other call locations. Nicolas Patry 2025-01-30 16:26:44 +0100
  • c174142fe5 Put link to ref. Nicolas Patry 2025-01-30 16:21:26 +0100
  • 003163a2b9 backend(vllm): map ResultOutput to InferStreamResponse to stream back to the client Morgan Funtowicz 2025-01-30 16:12:52 +0100
  • 51bc8a4e45 Fixing Mixtral + Nits. Nicolas Patry 2025-01-30 16:09:15 +0100
  • 32dffcff60 backend(vllm): expose FFI for CompletionOutput and RequestOutput on Rust side Morgan Funtowicz 2025-01-30 13:35:21 +0100
  • f56e24b346 Apply suggestions from code review Nicolas Patry 2025-01-30 11:59:12 +0100
  • 9376066b24 Black. Nicolas Patry 2025-01-30 11:21:17 +0100
  • 7539881054 Fixing moe import. Nicolas Patry 2025-01-30 11:05:21 +0100
  • 351f3c6ee5 Upgrade to 0.8.1 Nicolas Patry 2025-01-30 10:58:57 +0100
  • c8519aa7d8 Moe kernels 0.8.1 Nicolas Patry 2025-01-30 10:45:14 +0100
  • ee9178fb8b Small modifications. Nicolas Patry 2025-01-29 22:02:53 +0100
  • f190bc1d7a Add fp8 support moe models Mohit Sharma 2025-01-20 13:55:54 +0000
  • b52164d38a Complete padding of CausalLMBatch when there exists batch bucketing (#261) kaixuanliu 2025-01-30 17:19:13 +0800
  • 4e1c68e6f8 Increase session time gha_sccache_use_secrets Guillaume LEGENDRE 2025-01-30 09:53:28 +0100
  • 68c31ca939 fix: PR comments Hugo Larcher 2025-01-30 09:42:31 +0100
  • 80e7d98f88 Hotfixing intel-cpu (not sure how it was working before). (#2967) Nicolas Patry 2025-01-29 22:34:41 +0100
  • b2a13b92f9 Do not fail on missing moe-kernels (Intel-cpu). Nicolas Patry 2025-01-29 22:24:18 +0100
  • b73aec7fa3 Hotfixing intel-cpu (not sure how it was working before). Nicolas Patry 2025-01-29 18:21:48 +0100
  • ee0dffcd14 Update to moe-kernels 0.8.0 (#2966) Daniël de Kok 2025-01-29 18:19:55 +0100
  • 2633804fd6 Update to moe-kernels 0.8.0 Daniël de Kok 2025-01-29 16:04:23 +0000
  • 7028f5bce2 backend(vllm): make v1 the default Morgan Funtowicz 2025-01-29 17:01:20 +0100
  • 2446d240aa WIP: Add AWS session token Guillaume LEGENDRE 2025-01-29 15:17:38 +0100
  • 4ef2e045c9 Add fp8 support moe models (#2928) Mohit Sharma 2025-01-29 18:26:32 +0530
  • 62a3b78deb misc(gha): fix invalid syntax for secrets Morgan Funtowicz 2025-01-29 13:49:12 +0100
  • b0b855fecd update doc add_deepseekv3 Mohit Sharma 2025-01-29 12:46:03 +0000
  • 7be2a5f346 flatten condition Mohit Sharma 2025-01-29 11:13:22 +0000
  • 57c7ae2ef8 fix aws auth creds Guillaume LEGENDRE 2025-01-29 12:09:02 +0100
  • 16d5376f95 change bucket name Guillaume LEGENDRE 2025-01-29 12:03:06 +0100
  • 967f6bf4c2 Fix Typo Guillaume LEGENDRE 2025-01-29 11:40:30 +0100
  • 0863dd1533 update dockerfile Mohit Sharma 2025-01-29 10:40:19 +0000
  • 05d0aa678e format codfe' Mohit Sharma 2025-01-29 10:39:09 +0000
  • c94c544d7d (CI): Move S3 Auth to OIDC Guillaume LEGENDRE 2025-01-29 11:31:11 +0100
  • dc2dceb795 misc(gha): expose action cache url and runtime as secrets Morgan Funtowicz 2025-01-29 10:30:07 +0100
  • 57de05b0dd add deepseekv3 Mohit Sharma 2025-01-28 17:05:53 +0000
  • 73b7cf83f6 Add backend name to telemetry (#2962) Hugo Larcher 2025-01-28 16:53:16 +0100
  • 568b1c8585 feat: Add backend name to telemetry Hugo Larcher 2025-01-28 14:28:53 +0100
  • 433373b2ec feat: Add backend name to telemetry Hugo Larcher 2025-01-28 11:59:50 +0100
  • c871d74b46 More logs in the allocator. more_logs Nicolas Patry 2025-01-28 11:19:37 +0100
  • eb3df0f46f Fixing the oom maybe with 2.5.1 change. (#2958) Nicolas Patry 2025-01-28 10:30:28 +0100
  • c690da5973 fix: Telemetry (#2957) Hugo Larcher 2025-01-28 10:29:18 +0100
  • c9067176c3 doc: Update TRTLLM deployment doc. Update TRTLLM CI to allow release builds when tagging TGI. Hugo Larcher 2025-01-27 23:25:28 +0100
  • fb51fc9001 doc: Update TRTLLM deployment doc. Update TRTLLM CI to allow release builds when tagging TGI. Hugo Larcher 2025-01-27 23:19:51 +0100
  • dc5addae81 backend(vllm): remove python print stmt Morgan Funtowicz 2025-01-27 22:43:16 +0100
  • a7c2a470d6 backend(vllm): submit new request to vLLM engine Morgan Funtowicz 2025-01-27 22:39:35 +0100
  • 8c78ec671c doc: Rephrase properly. Hugo Larcher 2025-01-27 19:03:19 +0100