Commit Graph

  • 999eba8096 (nit) remove log Mohit Sharma 2024-12-11 12:10:35 +0000
  • 918bea23cb fix facebook/opt-125m not working issue Wang, Yi A 2024-12-11 02:53:20 -0800
  • 3a636ed165 Update Makefile-flash-att-v2 Cyril Vallez 2024-12-11 11:25:12 +0100
  • e69a384dfb Update Makefile-flash-att-v2 Cyril Vallez 2024-12-11 10:58:23 +0100
  • a0035e6607 Update Makefile-flash-att-v2 Cyril Vallez 2024-12-11 10:05:23 +0100
  • c3b8899f10 Revert "Use optimum-habana v1.15-release branch" yuanwu 2024-12-11 08:17:17 +0000
  • ab6591e759 chore: Add doc and CI for TRTLLM (#2799) Hugo Larcher 2024-12-11 08:44:50 +0100
  • b653605e54 feat(trtllm): fix logits retrieval trtllm/executor_stats Morgan Funtowicz 2024-12-10 23:28:13 +0100
  • 372799a421 Update Makefile-flash-att-v2 Cyril Vallez 2024-12-10 20:24:25 +0100
  • 82c24f7420 Using both value from config as they might not be correct. (#2817) Nicolas Patry 2024-12-11 00:07:09 +0530
  • de73eff07c Default value for Backend start_health Nicolas Patry 2024-12-10 19:02:42 +0100
  • e47249a3ea Much simpler solution. Nicolas Patry 2024-12-10 18:55:21 +0100
  • a84ecf26aa Update Makefile-flash-att-v2 Cyril Vallez 2024-12-10 18:43:44 +0100
  • 738f0b0e35 Update Makefile-flash-att-v2 Cyril Vallez 2024-12-10 18:37:22 +0100
  • b3b0747432 switch version to make it work Cyril Vallez 2024-12-10 18:14:22 +0100
  • 45a86d5cf0 Simple attempt to fix the healthcheck block allocation. Nicolas Patry 2024-12-10 17:57:17 +0100
  • da222900a1 inits Cyril Vallez 2024-12-10 16:57:07 +0100
  • 60059b6968 feat(trtllm): expose finish reason to Rust Morgan Funtowicz 2024-12-10 16:51:22 +0100
  • ade0f44aca add transformers_flash Cyril Vallez 2024-12-10 16:46:55 +0100
  • 22dbc449cd Fixing max_position_embeddings for falcon. Nicolas Patry 2024-12-10 12:15:13 +0100
  • 988c1dc622 fix moe kernel comit Mohit Sharma 2024-12-10 11:10:44 +0000
  • 1194cdb1ba remove grouped_topk Mohit Sharma 2024-12-10 10:58:01 +0000
  • 07e9ec2b66 update partition size Mohit Sharma 2024-12-10 10:54:52 +0000
  • ca071bdd1d revert silu Mohit Sharma 2024-12-10 10:41:40 +0000
  • 2cca808c31 (vllm) updated vllm rocm kernels Mohit Sharma 2024-12-10 10:25:38 +0000
  • b91f0c02c6 Using both value from config as they might not be correct. Nicolas Patry 2024-12-10 10:53:33 +0100
  • a2d878fa0f Small update to docs (#2816) Nicolas Patry 2024-12-10 15:16:26 +0530
  • 4a121a30f6 Small update to docs Nicolas Patry 2024-12-10 15:14:28 +0530
  • fd4f861d2c enable flashdecoding, prefill chunking and prefix caching Wang, Yi A 2024-12-09 20:49:10 -0800
  • 0a2736e34a Merge branch 'main' into flash_decoding Wang, Yi A 2024-12-09 20:46:33 -0800
  • 61f3b4b3d6 docs(README): supported hardware links TGI AMD GPUs Guspan Tanadi 2024-12-10 10:18:20 +0700
  • 8f326c9791 Fixing lockfile. v3.0.0 git_v3.0.0 Nicolas Patry 2024-12-09 21:20:59 +0100
  • 7b631e21b0 Preparing for v3 release. Nicolas Patry 2024-12-10 01:40:09 +0530
  • b2fac5d947 Hotfix link2 (#2812) Nicolas Patry 2024-12-10 01:27:18 +0530
  • 465426e658 2nd hotfix ? Nicolas Patry 2024-12-10 01:23:35 +0530
  • a70dd2998b Hotfixing the link. (#2811) Nicolas Patry 2024-12-10 01:20:07 +0530
  • c9a38d1452 Hotfixing the link. Nicolas Patry 2024-12-10 01:19:32 +0530
  • 042791fbd5 Prep new version (#2810) Nicolas Patry 2024-12-10 01:12:42 +0530
  • 94f7bf54be FIxup. Nicolas Patry 2024-12-10 00:58:07 +0530
  • e86de495ab Update docs. Nicolas Patry 2024-12-10 00:48:51 +0530
  • 8066b868fe Link fixup. Nicolas Patry 2024-12-10 00:37:31 +0530
  • f7f19aa4aa New version. Nicolas Patry 2024-12-10 00:35:21 +0530
  • 27fa83ca5b V3 doc (#2809) Nicolas Patry 2024-12-10 00:28:07 +0530
  • c734e43446 Updating asset. Nicolas Patry 2024-12-10 00:24:32 +0530
  • 14189ae859 V3 document. Nicolas Patry 2024-12-10 00:23:19 +0530
  • a04356fb8c Attempt for cleverer auto batch_prefill values (some simplifications). (#2808) Nicolas Patry 2024-12-10 00:14:32 +0530
  • de35b202c4 add moe-kernels Mohit Sharma 2024-12-09 17:49:20 +0000
  • 2264702c01 (kernel) add marlin-kernels Mohit Sharma 2024-12-09 10:30:03 +0000
  • 14d19738f6 Adding L40s. Nicolas Patry 2024-12-09 11:05:17 +0100
  • 908dec63d4 Adding L40. Nicolas Patry 2024-12-09 10:54:14 +0100
  • d701f9e866 Adding small comment for source of calculation. Nicolas Patry 2024-12-09 10:48:20 +0100
  • 36ed43c920 Update launcher/src/main.rs Nicolas Patry 2024-12-09 10:41:34 +0100
  • c922ef9534 Fix the warmup issue of llama2-7B. yuanwu 2024-12-09 07:20:48 +0000
  • 5b04d6c49d Fixing typo insertion. Nicolas Patry 2024-12-08 18:42:13 +0100
  • a0003a62a5 Less flaky tests. Nicolas Patry 2024-12-08 17:07:09 +0100
  • c6f023a06b Use optimum-habana v1.15-release branch yuanwu 2024-12-08 13:02:31 +0000
  • 1b659788b5 Add the no-deps in pip install yuanwu 2024-12-08 12:14:38 +0000
  • 73e6e3b871 Remove the error log yuanwu 2024-12-08 11:55:13 +0000
  • 037ea55af3 Attempt for cleverer auto batch_prefill values (some simplifications). Nicolas Patry 2024-12-08 12:36:46 +0100
  • 9f356ce045 Refine the warmup process yuanwu 2024-12-07 09:56:16 +0000
  • 9f5c9a5e22 Enable paligemma2 (#2807) drbh 2024-12-06 14:41:49 -0500
  • 08f6fa0b59 Removing experimental to prefill chunking. Nicolas Patry 2024-12-06 19:09:40 +0100
  • 6c62c8db24 feat: add test for paligemma2 drbh 2024-12-06 17:33:42 +0000
  • d96dcb1797 Adding A100 compute. (#2806) Nicolas Patry 2024-12-06 22:49:15 +0530
  • 0096ba471d Adding A100 compute. Nicolas Patry 2024-12-06 18:18:00 +0100
  • 0f0fe9a998 feat: support loading gemma2 as vlm text model drbh 2024-12-06 10:46:49 -0500
  • e22cb47fe3 (fix) fp8 scaling for cuda Mohit Sharma 2024-12-06 14:16:39 +0000
  • 5df8059037 Auto max prefill (#2797) Nicolas Patry 2024-12-06 10:22:00 +0530
  • 8c3669b287 feat: auto max_new_tokens (#2803) OlivierDehaene 2024-12-06 05:50:35 +0100
  • 68334a5fbf Fixing the tests. Nicolas Patry 2024-12-06 05:38:05 +0100
  • 6685e8fcda use oneapi 2024 docker image directly for xpu (#2793) Wang, Yi 2024-12-06 12:06:23 +0800
  • f022ecfaf8 Attempting to reduces the issues (workarounds for now). Nicolas Patry 2024-12-05 20:26:17 +0100
  • 7f1c22a72b update default OlivierDehaene 2024-12-05 18:22:54 +0100
  • 124eea2d0e feat: auto max_new_tokens OlivierDehaene 2024-12-05 18:17:40 +0100
  • f0cd4742c2 misc(backend): fix reborrowing Pin<&mut T> as described in the doc https://doc.rust-lang.org/stable/std/pin/struct.Pin.html#method.as_mut Morgan Funtowicz 2024-12-05 16:31:19 +0100
  • 049f4acd5b feat(backend): fix missing "0" field access Morgan Funtowicz 2024-12-05 15:29:23 +0100
  • b3cd5ea076 feat(backend): make sure we can easily cancel request on the executor Morgan Funtowicz 2024-12-05 13:54:56 +0100
  • ca8a115adc Remove some scaffolding. Nicolas Patry 2024-12-04 21:54:25 +0100
  • a78b6fd1e8 Fixing a few tests. Nicolas Patry 2024-12-04 21:34:46 +0100
  • e0db633396 fix: avoid setting use_sgmv if no kernels present (#2796) drbh 2024-12-04 15:26:09 -0500
  • 3ed703c273 Repairing prompt token counting. Nicolas Patry 2024-12-04 19:18:22 +0100
  • 3a86afc713 Add a flag that enables users to get logprobs back. Nicolas Patry 2024-12-04 18:45:28 +0100
  • f6998f84e9 Dropping all the prefill logprobs. Nicolas Patry 2024-12-04 18:39:10 +0100
  • 300f6c6f94 feat(backend) fix moving backend when pulling Morgan Funtowicz 2024-12-04 17:32:14 +0100
  • 460f290d5b effectively cancel the request on the executor Morgan Funtowicz 2024-12-04 14:29:04 +0100
  • b6dbf605af chore(trtllm): update dependency towards 0.15.0 Morgan Funtowicz 2024-12-04 12:02:42 +0100
  • 7788a6b849 doc: Formatting Hugo Larcher 2024-12-04 10:49:41 +0100
  • cc6bc339e5 test(backend): more test coverage Morgan Funtowicz 2024-12-04 00:16:02 +0100
  • 13e6d522b7 More tests. Nicolas Patry 2024-12-03 19:05:36 +0100
  • 62530649b8 feat(backend): remove constexpig Morgan Funtowicz 2024-12-03 16:47:48 +0100
  • 881527a544 feat(backend): remove constexpr from par Morgan Funtowicz 2024-12-03 16:46:59 +0100
  • 491b5726a6 chore: Add doc and CI for TRTLLM Hugo Larcher 2024-12-03 16:44:10 +0100
  • bd3a19dd8f chore: Add doc and CI for TRTLLM Hugo Larcher 2024-12-03 16:42:03 +0100
  • 51123a603d chore: Add doc and CI for TRTLLM Hugo Larcher 2024-12-03 16:40:32 +0100
  • e6c15e5570 chore: Add doc and CI for TRTLLM Hugo Larcher 2024-12-03 16:04:57 +0100
  • e2454dba40 (feat) convert tscales to tensorwise Mohit Sharma 2024-12-03 15:12:18 +0000
  • 28ba5e9618 fix: avoid setting use_sgmv if no kernels present drbh 2024-12-02 19:40:49 -0500
  • ad3ed0d1a1 test(backend): add more unittest Morgan Funtowicz 2024-12-03 14:39:10 +0100
  • c94b9de445 feat(backend): add guard to multiple header definitions Morgan Funtowicz 2024-12-03 14:07:49 +0100
  • 16ba2f5a2b feat(backend): fix main.rs retrieving the tokenizer Morgan Funtowicz 2024-12-03 12:11:17 +0100