Commit Graph

  • 419ecd0167 fix qwen2_5 Mohit Sharma 2025-04-24 14:08:23 +0000
  • 8c782858bb Pre commit Nicolas Patry 2025-04-24 15:51:01 +0200
  • 3bb514ddd8 remove kwargs and redundant args Mohit Sharma 2025-04-24 13:33:22 +0000
  • d7a609d4ad Fixing the makefile by using lockfile. Nicolas Patry 2025-04-24 15:30:51 +0200
  • 90989a4a04 Put more wiggle room. Nicolas Patry 2025-04-24 14:48:47 +0200
  • 36c5ec2abe improve headdim Mohit Sharma 2025-04-24 09:55:14 +0000
  • b86a73d72b remove port Mohit Sharma 2025-04-24 09:52:17 +0000
  • 8015f5f258 Merge branch 'main' into add_vlm_chunking_optimized Mohit Sharma 2025-04-24 09:50:44 +0000
  • d58ec388bf review comments Mohit Sharma 2025-04-24 09:49:29 +0000
  • 375802948d Warmup gaudi backend (#3172) Wang, Yi 2025-04-24 15:57:08 +0800
  • 02715dc53f Add option to configure prometheus port (#3187) Mohit Sharma 2025-04-23 20:43:25 +0530
  • 67c51d7c5e Fixing format after rebase. Nicolas Patry 2025-04-23 12:23:39 +0200
  • 1cbda4f541 add port for trtllm and llamacpp Mohit Sharma 2025-04-23 10:12:22 +0000
  • 12b1cf89cf fix doc Mohit Sharma 2025-04-23 07:38:52 +0000
  • e38c296b94 add prometheus port Mohit Sharma 2025-04-22 12:44:15 +0000
  • 15926210d3 disable chunking for qwen Mohit Sharma 2025-04-23 08:09:51 +0000
  • dd91b60998 nit Mohit Sharma 2025-04-22 14:41:20 +0000
  • f1da19df41 rename vars Mohit Sharma 2025-04-22 13:54:39 +0000
  • 63ddba24b4 rename vars Mohit Sharma 2025-04-22 12:46:36 +0000
  • 136b9897d4 add prometheus port Mohit Sharma 2025-04-22 12:44:15 +0000
  • 6545cdde0d optimizations Mohit Sharma 2025-04-22 07:49:45 +0000
  • 2f67c53075 nit Mohit Sharma 2025-04-22 02:06:57 +0530
  • 26212b9f35 fix inputs_embeds Mohit Sharma 2025-04-22 02:03:34 +0530
  • f34b06ca3b nit Mohit Sharma 2025-04-22 01:58:00 +0530
  • 46ff016490 improve Mohit Sharma 2025-04-22 01:40:42 +0530
  • 6ed540b52f add improvements Mohit Sharma 2025-04-21 15:28:18 +0000
  • be8e60a918 add improvements Mohit Sharma 2025-04-21 15:25:03 +0000
  • 7237e8e6bf update pixel_values add_vlm_chunking Mohit Sharma 2025-04-19 17:12:23 +0000
  • 52e4186c2a fix idefics Mohit Sharma 2025-04-19 14:39:24 +0000
  • b86919a87a fixes Mohit Sharma 2025-04-19 10:26:56 +0000
  • 526a8785ed add encoder cache free Mohit Sharma 2025-04-18 16:00:35 +0000
  • 44ed5efbcc working Mohit Sharma 2025-04-18 14:57:37 +0000
  • 8f8819795f Fixing CI (#3184) Nicolas Patry 2025-04-18 13:07:18 +0200
  • f17367e883 Fixing CI Nicolas Patry 2025-04-18 12:48:07 +0200
  • 95ccba3705 Bump sccache to 0.10.0 (#3179) Alvaro Bartolome 2025-04-18 12:45:32 +0200
  • 92909f3f33 add logic Mohit Sharma 2025-04-18 12:37:40 +0530
  • b400c275e4 Get opentelemetry trace id from request headers instead of creating a new trace (#2648) Hyeongchan Kim 2025-04-18 16:06:41 +0900
  • 5d14a7fe3d Merge branch 'main' into feature/get-trace-id-from-req-headers Nicolas Patry 2025-04-18 09:05:56 +0200
  • 84ab88d843 Support flashinfer for Gemma3 prefill (#3167) Daniël de Kok 2025-04-17 18:07:41 +0200
  • 516b2d1c1d Pending changes exported from your codespace DIVINEDP 2025-04-17 08:08:54 +0000
  • 417c18c5cd Initial commit DIVINEDP 2025-04-17 08:08:53 +0000
  • 83e7e21b4c Rename ACTIONS_CACHE_URL to ACTIONS_RESULTS_URL Alvaro Bartolome 2025-04-16 10:51:49 +0200
  • 6620f564b6 Ensure that sccache version is 0.10.0 or higher Alvaro Bartolome 2025-04-16 10:51:30 +0200
  • a7aff220e0 Merge fd92054e1d into 4645678ff0 Curtis Ruck 2025-04-16 17:11:23 +0900
  • 01f17d526c Merge branch 'main' into warmup_gaudi_backend Wang, Yi A 2025-04-15 22:16:42 -0700
  • bf3987e25e pingpong optimization issue fix Wang, Yi A 2025-04-15 21:56:51 -0700
  • 4645678ff0 Hotfix gaudi2 with newer transformers. (#3176) Nicolas Patry 2025-04-15 12:39:28 +0200
  • cedb5f07c0 Hotfix gaudi2 with newer transformers. Nicolas Patry 2025-04-15 12:27:22 +0200
  • ad765cd06b Hotfixing gaudi deps. (#3174) Nicolas Patry 2025-04-15 11:55:28 +0200
  • 5bb27a1d6b Hotfixing gaudi deps. Nicolas Patry 2025-04-15 11:54:26 +0200
  • 16b4b7974a Upgrading the dependencies in Gaudi backend. (#3170) Nicolas Patry 2025-04-15 11:49:06 +0200
  • 7e3f072ea4 Upgrading transformers version. Nicolas Patry 2025-04-15 11:35:46 +0200
  • 459fbdebe3 transformers flash llm/vlm enabling in ipex (#3152) Wang, Yi 2025-04-15 17:08:01 +0800
  • 302c773c99 Merge 2a10a28d08 into 449cee49ca Mohit Sharma 2025-04-15 13:44:04 +0530
  • 449cee49ca setuptools <= 70.0 is vulnerable: CVE-2024-6345 (#3171) Nicolas Patry 2025-04-15 10:09:37 +0200
  • 5ec7f15d0c prefill bypass graph Wang, Yi A 2025-04-15 00:27:07 -0700
  • 6b21985c95 Merge branch 'main' into warmup_gaudi_backend Wang, Yi A 2025-04-14 18:24:34 -0700
  • 73e797528d L4 fixes (#3161) Mohit Sharma 2025-04-14 22:13:53 +0530
  • 487d0634ed setuptools <= 70.0 is vulnerable: CVE-2024-6345 Nicolas Patry 2025-04-14 17:27:39 +0200
  • fe56f760df Upgrading the python client deps (still deprecated, but used for integration-tests) Nicolas Patry 2025-04-14 17:18:43 +0200
  • 75e3ec5b84 Upgrading the dependencies in Gaudi backend. Nicolas Patry 2025-04-14 16:51:21 +0200
  • d62c941c56 Gaudi: clean cuda/rocm code in hpu backend, enable flat_hpu (#3113) Wang, Yi 2025-04-14 21:58:13 +0800
  • 74ad8ed300 ipex cpu could also support in function Wang, Yi A 2025-04-13 20:49:35 -0700
  • ce8548f5c4 softcap default -1.0 Wang, Yi A 2025-04-13 20:02:05 -0700
  • ba049c9d49 improve performance Wang, Yi A 2025-04-11 06:10:17 -0700
  • 9f0f41835f Fixed unused import Daniël de Kok 2025-04-11 18:15:21 +0000
  • c03f8d2bb1 Update Gemma3 test outputs Daniël de Kok 2025-04-11 16:05:26 +0000
  • 6652d6e6e0 Support flashinfer for Gemma3 prefill Daniël de Kok 2025-04-11 15:58:57 +0000
  • a9b26b221a launcher: ensure correct detection of Gemma 3 head size Daniël de Kok 2025-04-11 11:56:18 +0000
  • 2a10a28d08 force attn to flashdecoding add_chunked_atn Mohit Sharma 2025-04-11 15:24:12 +0000
  • a7353c35e8 fix bt Mohit Sharma 2025-04-11 15:10:19 +0000
  • d2f8caff2b support cuda graphs Mohit Sharma 2025-04-11 15:05:28 +0000
  • fd92054e1d Fix state.plan call to use positional arguments Curtis Ruck 2025-04-11 10:09:24 -0400
  • 3d71c06aff flashinfer: head_dim -> head_dim_qk flashinfer-0.2.5 Daniël de Kok 2025-04-11 12:37:21 +0000
  • e893362ad7 Update to flashinfer 0.2.5 Daniël de Kok 2025-04-11 10:24:48 +0000
  • 76cc129796 remove block_scales which is not needed anymore Wang, Yi A 2025-04-11 01:27:49 -0700
  • a83e9fe003 work with the latest vllm extension ops Wang, Yi A 2025-04-10 19:56:58 -0700
  • 4de8fb0127 Merge branch 'gaudi_backend_pa' into warmup_gaudi_backend Wang, Yi A 2025-04-10 19:42:22 -0700
  • 4cdc34ec4d match the latest vllm_extension ops Wang, Yi A 2025-04-10 19:32:32 -0700
  • 610dd200e5 Merge branch 'main' into gaudi_backend_pa Wang, Yi A 2025-04-10 18:20:28 -0700
  • cd900c3b72 pingpong optimization Wang, Yi A 2025-04-08 19:56:10 -0700
  • 3f343cdb6f reverse flash causal change Mohit Sharma 2025-04-10 15:03:44 +0000
  • 33a7ec57e2 add fix Mohit Sharma 2025-04-10 14:59:39 +0000
  • 517e4398c2 add fix Mohit Sharma 2025-04-10 13:21:11 +0000
  • 73d0876f12 Fixing the updating logic of backends. kvrouter-endpoints Nicolas Patry 2025-04-10 11:04:03 +0200
  • 18cb4a4221 Fixing add/remove/set backends. Nicolas Patry 2025-04-10 09:14:30 +0200
  • 9a8d0462e1 Fixing tokenization like https://github.com/huggingface/text-embeddin… (#3156) Nicolas Patry 2025-04-09 18:42:25 +0200
  • d93ad244a3 add attn add_chunked_attn Mohit Sharma 2025-04-09 16:37:34 +0000
  • e5618d6e40 add chunked attn support chunked_attn_l4 Pedro Cuenca 2025-04-09 16:36:06 +0000
  • 5861da1ad7 Fixing Qwen 2.5 VL (32B). (#3157) Nicolas Patry 2025-04-09 17:07:30 +0200
  • 33af4dcd6c Fixing Qwen 2.5 VL (32B). Nicolas Patry 2025-04-09 16:10:32 +0200
  • 0eb4bdc909 Fixing tokenization like https://github.com/huggingface/text-embeddings-inference/issues/525 Nicolas Patry 2025-04-09 15:22:49 +0200
  • a2d2406ddd Update repo links Inferentia refer HF docs Guspan Tanadi 2025-04-09 14:23:51 +0700
  • f8c8c3d397 softcap default -1.0 Wang, Yi A 2025-04-08 22:42:03 -0700
  • 8d36856d57 install xelink lib Wang, Yi A 2025-04-08 20:42:28 -0700
  • 50282e3cc1 transformers flash llm/vlm enabling in xpu Wang, Yi A 2025-04-08 18:36:28 -0700
  • a1f3ebe17c Release 3.2.3 v3.2.3 git_v3.2.3 Nicolas Patry 2025-04-08 10:17:51 +0200
  • 0b28aabb94 3.2.3 (#3151) Nicolas Patry 2025-04-08 10:16:37 +0200
  • 68df6b19e4 3.2.3 Nicolas Patry 2025-04-08 10:14:55 +0200
  • 5831ff6e69 remove useless rwlock Corentin REGAL 2025-04-08 09:40:45 +0200