Commit Graph

  • 2e067fabd3 fix: clippy Hugo Larcher 2025-01-27 14:37:57 +0100
  • 57570bf598 Fixing the oom maybe with 2.5.1 change. Nicolas Patry 2025-01-27 14:35:00 +0100
  • a8ba2542d8 fix: update ping delay and update doc. Hugo Larcher 2025-01-27 14:12:03 +0100
  • 19bb3bf355 fix: simplify error handling Hugo Larcher 2025-01-27 14:04:33 +0100
  • db922eb77e Update to attention-kernels 0.2.0 (#2950) Daniël de Kok 2025-01-27 11:42:36 +0100
  • 40b00275b2 Attempt to remove AWS S3 flaky cache for sccache (#2953) Funtowicz Morgan 2025-01-27 11:21:48 +0100
  • 5a317ffad7 backend(trtllm): inject ompi_version build arg in dependent step Morgan Funtowicz 2025-01-26 15:21:38 +0100
  • c632f8a95a backend(trtllm): Cache mode max to cache intermediate layers Morgan Funtowicz 2025-01-26 11:38:00 +0100
  • cad4644537 backend(trtllm): export env variable in run mb? Morgan Funtowicz 2025-01-25 08:00:07 +0100
  • cb1dab12c1 backend(trtllm): ok let's try to define the launchers in build.rs when rustc_wrapper is present Morgan Funtowicz 2025-01-25 01:31:13 +0100
  • e7064c95da backend(trtllm): make sccache definition manually Morgan Funtowicz 2025-01-24 22:01:43 +0100
  • a434c2ffc9 backend(trtllm): relax the way to detect sccache Morgan Funtowicz 2025-01-24 19:36:28 +0100
  • 4c8bf7f5b8 fix: add telemetry regular pings and fix unhandled errors avoid not sending telemetry stop events. Hugo Larcher 2025-01-24 18:10:12 +0100
  • cb452ae7e8 backend(trtllm): and with the right env var for gha sccache Morgan Funtowicz 2025-01-24 17:50:06 +0100
  • a8a9168065 backend(trtllm): what if we expose ENV instead of inline? Morgan Funtowicz 2025-01-24 17:47:49 +0100
  • 556a61d143 backend(trtllm): attempt to remove AWS S3 flaky cache for sccache Morgan Funtowicz 2025-01-24 15:50:28 +0100
  • bafbd06744 Update transformers_flash_causal_lm.py fix-tp Cyril Vallez 2025-01-24 15:06:50 +0100
  • de83178bc3 tp monkey patch Cyril Vallez 2025-01-24 15:03:14 +0100
  • 2024f54d71 feat: Make streaming for tool calling behave the same as the open ai api Nicolas Casademont 2025-01-24 14:42:25 +0100
  • 6cb41a80a1 Revert "Remove AWS credentials?" Nicolas Patry 2025-01-24 14:34:17 +0100
  • d2ff68e98d Remove AWS credentials? Nicolas Patry 2025-01-24 12:18:28 +0100
  • b70f29d729 Bypasse perm issue. v3.0.2 git_v3.0.2 Nicolas Patry 2025-01-24 12:12:47 +0100
  • 9157833662 Update to attention-kernels 0.2.0 Daniël de Kok 2025-01-23 12:49:07 +0000
  • 4b8d09d63e fix: Adapt function call response to return a json string for arguments Nicolas Casademont 2025-01-24 11:47:01 +0100
  • e413b01eb1 Create patch release. Nicolas Patry 2025-01-24 10:50:15 +0100
  • 02e4b9ab32 backend(vllm): plug in the tokio server and CLI Morgan Funtowicz 2025-01-24 10:41:07 +0100
  • 2bf3ea8517 add local file read path for image which could work with dataset like Lin-Chen/ShareGPT4V Wang, Yi A 2025-01-22 23:01:39 -0800
  • f709466767 Make tool_call a list for streaming case datta0 2025-01-24 09:09:40 +0000
  • 3495248d87 Fix tool call response to adhere to OpenAI spec datta0 2025-01-24 07:22:11 +0000
  • d9dda11726 Trying to put back the archlist (to fix the oom). (#2947) Nicolas Patry 2025-01-24 09:32:17 +0100
  • 0dd8a96613 Trying to put back the archlist (to fix the oom). Nicolas Patry 2025-01-24 00:46:47 +0100
  • d937eb64da Fixing cargo lock. Nicolas Patry 2025-01-23 18:54:34 +0100
  • 18c4607d46 Transformers backend TP fix (#2945) Cyril Vallez 2025-01-23 18:09:57 +0100
  • bcd9d3a5cb cohere fix Cyril Vallez 2025-01-23 12:49:30 +0000
  • f4dc44b88c init dispatch Cyril Vallez 2025-01-23 12:40:56 +0000
  • 29a0893b67 Tmp tp transformers (#2942) Nicolas Patry 2025-01-23 18:07:30 +0100
  • fe7594e369 Fix the warmup issue of prefill batch_size (#268) Yuan Wu 2025-01-24 00:26:17 +0800
  • 0a89902663 [TRTLLM] Expose finish reason (#2841) Funtowicz Morgan 2025-01-23 16:48:26 +0100
  • 0f2845081b Remove the archlist, it's defined in the docker anyway. Nicolas Patry 2025-01-23 14:31:35 +0100
  • 4e172028aa Add NVIDIA A40 to known cards (#2941) Nikolai Kolodziej 2025-01-23 14:19:21 +0100
  • 6ab02931cf Set alias for max_completion_tokens in ChatRequest (#2932) Alvaro Bartolome 2025-01-23 14:18:47 +0100
  • 0c879ff318 misc(backend): update deps Morgan Funtowicz 2025-01-21 14:17:58 +0100
  • 4c6ee944d0 misc(llamacpp): fix typo Morgan Funtowicz 2024-12-13 17:13:29 +0100
  • 0da255ecbc feat(trtllm): expose finish reason to Rust Morgan Funtowicz 2024-12-10 16:51:22 +0100
  • cc212154e0 Bump TensorRT-LLM backend dependency to v0.16.0 (#2931) Funtowicz Morgan 2025-01-23 13:54:40 +0100
  • f331091ba3 Using 2.5 kernels. Nicolas Patry 2025-01-23 12:14:46 +0100
  • 980fb92529 backend(trtllm): add correctly untar it Morgan Funtowicz 2025-01-23 12:00:15 +0100
  • 6fd50ff3ba backend(trtllm): make sure we are using correct path for openmpi ADD in dockerfile Morgan Funtowicz 2025-01-23 11:26:20 +0100
  • 83c1ea8f7d Building AOT. Nicolas Patry 2025-01-23 10:49:22 +0100
  • bd2ec03d53 backend(vllm): statically allocate LLMEngine Morgan Funtowicz 2025-01-22 22:15:33 +0100
  • 1dd346666a Clarify FP8-Marlin use on capability 8.9 (#2940) Daniël de Kok 2025-01-22 18:18:11 +0100
  • 5f17b51a9c Revert the flashinfer (this will fails). Nicolas Patry 2025-01-22 18:16:54 +0100
  • 8d05d6a62c Put back the attention impl. Nicolas Patry 2025-01-22 18:13:59 +0100
  • 6fe37d61d0 Fixing the transformers backend. Nicolas Patry 2025-01-22 17:47:20 +0100
  • 859d2f0464 Tmp branch to test transformers backend with 2.5.1 and TP>1 Nicolas Patry 2025-01-22 17:33:08 +0100
  • 61a0b95f63 feat: add NVIDIA A40 to known cards Nikolai Kolodziej 2025-01-22 17:07:30 +0100
  • f187e993b9 Clarify FP8-Marlin use on capability 8.9 Daniël de Kok 2025-01-22 15:50:40 +0000
  • 1d3c9beba8 fix moe in quantization path (#2935) Wang, Yi 2025-01-22 21:36:15 +0800
  • 6d335ca7ce Remove modifications in Lock. new_minor_version Nicolas Patry 2025-01-22 13:37:17 +0100
  • b21d3c1e73 Upgrade the version number. Nicolas Patry 2025-01-22 12:29:50 +0100
  • 2dfe3b3ee6 Upgrading the deps to have transformers==4.48.0 necessary (#2937) Nicolas Patry 2025-01-22 12:20:15 +0100
  • 0736c8c8b9 Upgrading the deps to have transformers==4.48.0 necessary Nicolas Patry 2025-01-22 12:09:28 +0100
  • fd88b1d6b9 llava next image encoder to allow un-aligned patch / image sizes Jiayu Liu 2025-01-22 17:09:59 +0800
  • fd0d628a59 fix moe in quantization path update ipex xpu to support moe for mixtral Wang, Yi A 2025-01-21 23:34:44 -0800
  • a7e5179f10 backend(trtllm): attempt to use ADD instead of RUN for openmpi Morgan Funtowicz 2025-01-21 23:40:45 +0100
  • cfd22726c9 backend(vllm): initial commit Morgan Funtowicz 2025-01-21 23:37:56 +0100
  • e958cab0c1 Merge branch 'main' into fix-max-completion-tokens Alvaro Bartolome 2025-01-21 17:58:28 +0100
  • 64a33c1f05 Run pre-commit run --all-files to fix CI (#2933) Alvaro Bartolome 2025-01-21 17:33:33 +0100
  • 5836fee2d0 Run pre-commit run --all-files to fix CI Alvaro Bartolome 2025-01-21 17:02:11 +0100
  • dbadea98a2 Set alias for max_completion_tokens in ChatRequest Alvaro Bartolome 2025-01-21 16:39:56 +0100
  • ebfe9d9f50 backend(trtllm): reenable shallow clone Morgan Funtowicz 2025-01-21 15:23:25 +0100
  • 7c1c587b38 backend(trtllm): move to nvidia remote instead of hf Morgan Funtowicz 2025-01-21 15:18:59 +0100
  • 10f713bcb6 backend(trtllm): use tag instead Morgan Funtowicz 2025-01-21 15:13:01 +0100
  • 9eff8dd33b backend(trtllm): do not use shallow clone Morgan Funtowicz 2025-01-21 14:59:30 +0100
  • dc564aa022 backend(trtllm): update to 0.16.0 Morgan Funtowicz 2025-01-21 14:46:21 +0100
  • bdb3e488e4 Trying to avoid the random timeout. (#2929) Nicolas Patry 2025-01-21 11:06:10 +0100
  • 17367438f3 Give TensorRT-LLMa proper CI/CD 😍 (#2886) Funtowicz Morgan 2025-01-21 10:19:16 +0100
  • 63c64bb307 Use the default value in globals.py (#265) Yuan Wu 2025-01-21 17:10:23 +0800
  • 8de110ae9f Fix warmup with SKIP_TOKENIZER_IN_TGI=true (#266) Karol Damaszke 2025-01-21 10:09:49 +0100
  • 7d106477d6 Fix router input validation for SKIP_TOKENIZER_IN_TGI=true (#267) Yuan Wu 2025-01-21 17:08:53 +0800
  • b980848abf Flash Transformers modeling backend support (#2913) Cyril Vallez 2025-01-21 10:01:51 +0100
  • a0e75b1311 misc(ci): attempt to fix sccache not building trtllm again Morgan Funtowicz 2025-01-21 00:19:39 +0100
  • a4d069fe07 misc(ci): attempt to fix sccache not building trtllm Morgan Funtowicz 2025-01-20 23:12:45 +0100
  • edfafeb46c misc(ci): fix warnings Morgan Funtowicz 2025-01-20 22:44:31 +0100
  • d0b8e2eb25 misc(ci): give everything aws needs Morgan Funtowicz 2025-01-20 21:12:24 +0100
  • 0d9ec75f27 oupsi Cyril Vallez 2025-01-20 18:42:12 +0100
  • 70ada578b9 check for non-native models Cyril Vallez 2025-01-20 18:01:12 +0100
  • 374493f830 Wat? Nicolas Patry 2025-01-20 17:57:58 +0100
  • 93e343e11d Remove the dummy test, only increase the read timeout. Nicolas Patry 2025-01-20 17:40:58 +0100
  • 1ee74e5512 Remove legacy ENV directive. Nicolas Patry 2025-01-20 17:33:53 +0100
  • 4cc842e556 Longer timeout ? Nicolas Patry 2025-01-20 17:26:58 +0100
  • 9c955105d8 More read timeout ? Nicolas Patry 2025-01-20 16:47:35 +0100
  • 2ef3002c2b Update __init__.py Cyril Vallez 2025-01-20 16:37:41 +0100
  • c9f9cd165b Trying to avoid the random timeout. Nicolas Patry 2025-01-20 16:27:39 +0100
  • 6d9c011f51 move the import to avoid device issue Cyril Vallez 2025-01-20 16:11:41 +0100
  • 9af3ea4b70 device check Cyril Vallez 2025-01-20 15:55:31 +0100
  • 52afdcc281 update comment Cyril Vallez 2025-01-20 15:25:10 +0100
  • 6e0f37c0ca revert + style + minor improvements Cyril Vallez 2025-01-20 15:13:24 +0100
  • 7c9ee5655f misc(ci): give everything aws needs Morgan Funtowicz 2025-01-20 15:10:14 +0100
  • 16162602c2 Add fp8 support moe models Mohit Sharma 2025-01-20 13:55:54 +0000