Commit Graph

  • b303013227 misc(backend): let's try with GHA Morgan Funtowicz 2024-12-17 10:40:51 +0100
  • 40c03cb2df change xpu lib download link Wang,Yi A 2024-12-17 08:24:49 +0000
  • 8e2e5d8e15 Fix benchmark build error yuanwu 2024-12-17 05:28:30 +0000
  • 7eeefa3b57 Qwen2-VL runtime error fix when prompted with multiple images (#2840) janne-alatalo 2024-12-17 05:55:11 +0200
  • eaeef6e7a4 Remove the useless modifications yuanwu 2024-12-17 02:08:12 +0000
  • 15de6c9195 Merge branch 'habana-main' into 2.3.0 yuanwu 2024-12-17 02:06:22 +0000
  • f89bdb72c8 Fix runtime error when Qwen2-VL was prompted with multiple images pr-2840-ci-branch Janne Alatalo 2024-12-13 12:29:39 +0200
  • fca2218fa9 Fix runtime error when Qwen2-VL was prompted with multiple images Janne Alatalo 2024-12-13 11:58:07 +0200
  • a72f339c79 fix: lint backend and doc files (#2850) drbh 2024-12-16 16:12:34 -0500
  • 562bf3c7ea fix: lint backend and doc files drbh 2024-12-16 21:09:48 +0000
  • 7dcee83a63 misc(backend): once more? Morgan Funtowicz 2024-12-16 16:03:07 +0100
  • 5ded9cbd22 misc(backend): test with TGI S3 conf Morgan Funtowicz 2024-12-16 15:51:03 +0100
  • 79f1b953dc misc(backend): test with TGI S3 conf Morgan Funtowicz 2024-12-16 15:47:51 +0100
  • 94c675c6d6 Update Dockerfile to use devel image for compatibility Yaser Jaradeh 2024-12-16 13:57:53 +0100
  • 11ab329883 Fixing CI. (#2846) Nicolas Patry 2024-12-16 10:58:15 +0100
  • 7ab931603f Fixing CI. Nicolas Patry 2024-12-16 10:35:34 +0100
  • 6f0b8c947d New arg. (#2845) Nicolas Patry 2024-12-16 10:34:50 +0100
  • 7b8393e782 New arg. Nicolas Patry 2024-12-16 10:18:11 +0100
  • 61309b2832 Remove the default max_tokens for /v1/chat/completions (#251) Sun Choi 2024-12-16 00:32:57 -0800
  • cc2ca4ac22 HF_TOKEN replaces HUGGING_FACE_HUB_TOKEN as it is deprecated (#253) Sun Choi 2024-12-15 00:59:58 -0800
  • a0c70ca099 update moe-kernels commit Mohit Sharma 2024-12-13 17:16:43 +0000
  • 1fa9ca2f16 add fix fix_fp8_llama3.2 Mohit Sharma 2024-12-13 16:10:00 +0000
  • 1708865fdc Feat/trtllm cancellation dev container (#2795) Hugo Larcher 2024-12-13 16:19:06 +0100
  • 5033bf386d chore: Fix rebase Hugo Larcher 2024-12-13 16:11:36 +0100
  • 3e75753888 chore: Fix rebase Hugo Larcher 2024-12-13 16:11:16 +0100
  • 1c54c3d88c chore: Update devcontainer Hugo Larcher 2024-12-13 16:09:40 +0100
  • eb724f90fd chore: fix dev containers Hugo Larcher 2024-12-02 22:18:28 +0100
  • 89e77c2514 chore: add dev-containers Hugo Larcher 2024-12-02 22:12:26 +0100
  • 51d064def4 chore: add dev-containers Hugo Larcher 2024-12-02 20:41:53 +0100
  • 5acb1a6390 chore: fix dev containers Hugo Larcher 2024-12-02 16:42:42 +0100
  • eb6698f1ed chore: fix dev containers Hugo Larcher 2024-12-02 16:23:02 +0100
  • 41ba86fa04 chore: fix dev containers Hugo Larcher 2024-12-02 16:07:23 +0100
  • 668476a241 chore: fix dev containers Hugo Larcher 2024-12-02 15:30:59 +0100
  • bad878bdd5 chore: add dev-containers Hugo Larcher 2024-12-02 15:17:31 +0100
  • 382d0d8bf7 chore: add dev-containers Hugo Larcher 2024-12-02 15:15:20 +0100
  • 2bc1505dac chore: add dev-containers Hugo Larcher 2024-12-02 14:56:00 +0100
  • 805797d50a chore: add dev-containers Hugo Larcher 2024-12-02 14:41:38 +0100
  • 480b4f9df2 test(ctest) enable address sanitizer Morgan Funtowicz 2024-11-19 00:17:10 +0100
  • 12cb6aa6b7 add flash decoding Mohit Sharma 2024-12-13 14:51:11 +0000
  • ea7f4082c4 TensorRT-LLM backend bump to latest version + misc fixes (#2791) Funtowicz Morgan 2024-12-13 15:50:59 +0100
  • 1640da7c08 misc(backend): indent Morgan Funtowicz 2024-12-13 15:38:04 +0100
  • 104885c71c misc(backend): indent Morgan Funtowicz 2024-12-13 15:37:19 +0100
  • f4c60ca522 default Cyril Vallez 2024-12-13 14:13:47 +0000
  • e93ab925f9 init Cyril Vallez 2024-12-13 14:02:45 +0000
  • 715b2d19ed fix high dim Cyril Vallez 2024-12-13 13:25:56 +0000
  • 985bd6ac41 Add another variant "nvidia-h100-nvl" lazariv 2024-12-13 11:59:58 +0100
  • 233cbb80e0 Fix runtime error when Qwen2-VL was prompted with multiple images Janne Alatalo 2024-12-13 12:29:39 +0200
  • 6d8cc83813 Fix runtime error when Qwen2-VL was prompted with multiple images Janne Alatalo 2024-12-13 11:58:07 +0200
  • 3da49f3fb9 Update main.rs with A100 and H100 variants lazariv 2024-12-13 09:53:06 +0100
  • 8d9580669d misc(ci): WAT Morgan Funtowicz 2024-12-13 09:46:13 +0100
  • d5052a9054 misc(ci): WAT Morgan Funtowicz 2024-12-13 09:43:22 +0100
  • 5159d030a9 bitsandbytes: upgrade and enable CUDA Graphs for 4bit by default Matthew Douglas 2024-12-12 17:12:08 -0500
  • f843b62a44 push change Cyril Vallez 2024-12-12 18:25:11 +0000
  • 3bb3fd19ae Fixup opt to reduce the amount of odd if statements. (#2833) Nicolas Patry 2024-12-12 18:20:13 +0100
  • 490ca0ef6a working System administrator 2024-12-12 15:48:56 +0000
  • 83e919f617 misc(ci): WAT Morgan Funtowicz 2024-12-12 16:42:38 +0100
  • e312a68469 misc(ci): WAT Morgan Funtowicz 2024-12-12 16:14:48 +0100
  • 4aa060f99a misc(ci): WAT Morgan Funtowicz 2024-12-12 16:08:01 +0100
  • e4d7a6788e Merge branch 'main' into feature/get-trace-id-from-req-headers Hyeongchan Kim 2024-12-13 00:05:22 +0900
  • 182ffaf064 misc: use return Ok(()) feat-backend-llamacpp Funtowicz Morgan 2024-12-12 16:04:05 +0100
  • 6c62ded864 misc(ci): do not build with ssl enabled Morgan Funtowicz 2024-12-12 15:57:11 +0100
  • b31477cf63 misc(ci): lets actually use sccache ... Morgan Funtowicz 2024-12-12 15:38:44 +0100
  • 649cb1f5f1 runnable version System administrator 2024-12-12 14:27:07 +0000
  • d0d62da64c Fixing cargo lock Nicolas Patry 2024-12-12 15:13:33 +0100
  • 8b81f72b0f Fixup opt to reduce the amount of odd if statements. Nicolas Patry 2024-12-12 15:03:40 +0100
  • 68f5466c86 misc(ci): add debug profile Morgan Funtowicz 2024-12-12 14:52:12 +0100
  • bf59118a93 fix facebook/opt-125m not working issue (#2824) Wang, Yi 2024-12-12 21:41:30 +0800
  • e37d131b8e misc(ci): add debug profile Morgan Funtowicz 2024-12-12 14:34:30 +0100
  • f99049aafe misc(ci): again Morgan Funtowicz 2024-12-12 14:24:33 +0100
  • c3bd7212c2 Fixing latest flavor by disabling it. (#2831) Nicolas Patry 2024-12-12 14:09:35 +0100
  • 0c80e7d784 Fixing latest flavor by disabling it. Nicolas Patry 2024-12-12 14:08:06 +0100
  • f01f2fb6e7 docs(README): supported hardware links TGI AMD GPUs (#2814) Guspan Tanadi 2024-12-12 19:49:33 +0700
  • 20f8cb9a95 misc(ci): again Morgan Funtowicz 2024-12-12 13:11:54 +0100
  • e6abfdcb1f misc(ci): let's try this way Morgan Funtowicz 2024-12-12 13:00:13 +0100
  • 5f1b16f300 misc(ci): export aws creds as output of step Morgan Funtowicz 2024-12-12 12:48:58 +0100
  • 5910dabb4e misc(ci): provide mecanism to cache inside container Morgan Funtowicz 2024-12-12 12:45:14 +0100
  • e703c84578 misc(ci): let's try to build the Dockerfile for trtllm Morgan Funtowicz 2024-12-12 11:50:39 +0100
  • 48a1a602e7 misc(ci): update Rust action toolchain Morgan Funtowicz 2024-12-12 09:40:07 +0100
  • de36c8e6dd misc(ci): enabe building tensorrt-llm Morgan Funtowicz 2024-12-12 09:29:14 +0100
  • 1ca37d3353 misc(ci): let's use the correct way to invoke sccache s3-cache Morgan Funtowicz 2024-12-11 22:17:37 +0100
  • 9c8519337b WIP Guillaume LEGENDRE 2024-12-11 22:08:40 +0100
  • f3d6e8476e misc(ci): let's try to build with sccache Morgan Funtowicz 2024-12-11 22:06:19 +0100
  • cda97a5cfb misc(ci): lets do differently Morgan Funtowicz 2024-12-11 21:55:48 +0100
  • f7eaf2bee7 fix Guillaume LEGENDRE 2024-12-11 21:45:53 +0100
  • 600faa6d5b allow id-token Guillaume LEGENDRE 2024-12-11 21:45:17 +0100
  • 1c3f82c576 misc(ci): enable the wf on the current branch Morgan Funtowicz 2024-12-11 21:35:06 +0100
  • d32411c4a5 misc(ci): install sccache Morgan Funtowicz 2024-12-11 21:31:43 +0100
  • 192da62100 misc(ci): we cannot specify version on a local wf Morgan Funtowicz 2024-12-11 21:23:04 +0100
  • 51af41218d misc(ci): again Morgan Funtowicz 2024-12-11 21:21:35 +0100
  • 8050420d4b misc(ci): ok let's simplify Morgan Funtowicz 2024-12-11 21:14:36 +0100
  • bb9095aae3 Updating lock. v3.0.1 git_v3.0.1 Nicolas Patry 2024-12-11 21:12:49 +0100
  • 8a65ebe52f Patch Release 3.0.1 Nicolas Patry 2024-12-11 21:04:40 +0100
  • 07b01293c5 Prepare patch release. (#2829) Nicolas Patry 2024-12-12 01:33:50 +0530
  • 0c3fc8b685 Prepare patch release. Nicolas Patry 2024-12-11 21:00:36 +0100
  • cc66dccbe8 Update README.md (#2827) RodriMora 2024-12-11 19:45:49 +0100
  • e545d04d8e Update README.md RodriMora 2024-12-11 18:27:04 +0100
  • c8623e4135 misc(ci): make runner-group input a string Morgan Funtowicz 2024-12-11 18:14:28 +0100
  • 5bef5a88da feat(trtllm): add trtllm build workflow and update s3-cache Morgan Funtowicz 2024-12-11 18:12:56 +0100
  • 951cc51ade add trigger Guillaume LEGENDRE 2024-12-11 15:45:05 +0100
  • 5e954be681 Create test-s3-cache.yaml Guillaume LEGENDRE 2024-12-11 15:42:02 +0100