Commit Graph

  • 1d58110508 Try to fix typer/click issue Daniël de Kok 2025-05-12 09:55:06 +0000
  • 9751577efa Update to Torch 2.7.0 Daniël de Kok 2025-05-12 09:08:34 +0000
  • 535ce23827 Adjust the round_up_seq logic in Gaudi backend (#3224) kaixuanliu 2025-05-12 15:58:43 +0800
  • f728cf69f2 move all_input_ids_tensor to hpu to improve perf for large bs in sharded mode Wang, Yi A 2025-05-12 00:55:04 -0700
  • 9414bcca0f Fix the RotaryEmbedding yuanwu 2025-05-11 23:39:42 +0000
  • f0acbbf10c Fix the import error yuanwu 2025-05-11 23:05:37 +0000
  • cfcbd80fb4 Install new transformers yuanwu 2025-05-11 22:36:18 +0000
  • f5aaa18d8e Fix the image_token_id issue yuanwu 2025-05-11 22:11:42 +0000
  • 50ecfc625a refine yuanwu 2025-05-11 20:45:40 +0000
  • 7533b993d5 Fix the errors of pre-commit yuanwu 2025-05-11 18:31:35 +0000
  • d98116db6e Remove yuanwu 2025-05-11 18:23:23 +0000
  • 4e95db304f Remove unnecessary modifications yuanwu 2025-05-11 18:17:15 +0000
  • 3aa882337e Clean the code yuanwu 2025-05-11 17:53:26 +0000
  • f0dac1dec8 Clean the code yuanwu 2025-05-11 16:44:53 +0000
  • a27039bc51 Fix the image-to-text accuray issue yuanwu 2025-05-11 15:21:15 +0000
  • 3245b8972a Merge branch 'main' into add_logs_gaudi_warmup regisss 2025-05-11 09:59:20 +0000
  • 4ee34f64c6 Make style 2 regisss 2025-05-10 17:04:32 +0000
  • afbebe6990 Make style regisss 2025-05-10 13:43:38 +0000
  • c94f415af4 Change HPU warmup logic: seq length should be with exponential growth (#3217) kaixuanliu 2025-05-10 21:41:18 +0800
  • 966606d717 Cast rounded sequence to int regisss 2025-05-10 13:38:26 +0000
  • 2b2b4a814d Refine logging for Gaudi warmup regisss 2025-05-10 12:59:36 +0000
  • 9c5ec4adca change HPU warmup logic: seq length should be with exponential growth Liu, Kaixuan 2025-05-09 13:59:35 -0400
  • 03a8b8d751 Release 3.3.0 v3.3.0 git_3.3.0 Daniël de Kok 2025-05-09 13:53:38 +0000
  • 56c8189467 Prepare for 3.3.0 (#3220) Daniël de Kok 2025-05-09 15:50:29 +0200
  • 249ccfc939 update to latest vllm extension ops Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> Wang, Yi A 2025-05-08 23:43:41 -0700
  • e06645af64 Prepare for 3.3.0 Daniël de Kok 2025-05-09 12:14:51 +0000
  • 6c25a98b49 Prepare for 3.2.4 release-3.2.4 Daniël de Kok 2025-05-09 09:53:25 +0000
  • 1a5ff1dc5f Fix the accuracy issue yuanwu 2025-05-08 08:29:59 +0000
  • a3967a57bc Fix experts issue yuanwu 2025-05-08 03:12:22 +0000
  • 2dadceaf07 Debug accuracy issue yuanwu 2025-05-05 14:38:34 +0000
  • dafc597a8b Add save yuanwu 2025-05-05 10:08:29 +0000
  • ccddbba752 Fix crash yuanwu 2025-05-04 09:28:02 +0000
  • 3482d7ca82 Enable llama4 yuanwu 2025-04-30 23:42:45 +0000
  • 316cb087f3 if limit is set, all prefill will bypass graph Wang, Yi A 2025-05-07 06:07:20 -0700
  • ff5bc1bbd1 refine free memory and bypass graph logic Wang, Yi A 2025-05-05 23:26:39 -0700
  • 1cda91135e fp8 kv cache Wang, Yi A 2025-05-05 17:44:28 -0700
  • 2007269fe7 lazy mode Wang, Yi A 2025-05-03 20:54:58 -0700
  • 3db50ed9d3 add ep Wang, Yi A 2025-05-03 19:19:34 -0700
  • debf477ba4 enable deepseek_r1 Wang, Yi A 2025-04-28 23:07:34 -0700
  • c1c8e50a0e Merge f72547c9fb into 329f612e55 Funtowicz Morgan 2025-05-07 04:14:38 +0000
  • 329f612e55 Chunked Prefill VLM (#3188) Mohit Sharma 2025-05-06 21:31:59 +0530
  • 533eee50dc forward and tokenize chooser use the same shape (#3196) Wang, Yi 2025-05-06 16:49:32 +0800
  • 51a0b9d11c IPEX support FP8 kvcache/softcap/slidingwindow (#3144) Wang, Yi 2025-05-06 16:49:24 +0800
  • f208ba6afc Fix HF_HUB_OFFLINE=1 for Gaudi backend (#3193) regisss 2025-05-06 02:47:53 -0600
  • 0cd6ff7a3d fix typos omahs 2025-05-06 10:40:48 +0200
  • 551ee3a365 fix: linter support-logit-bias-in-chat drbh 2025-05-06 00:03:17 +0000
  • 783ca66926 fix: prefer patch to be vlm specific drbh 2025-05-06 00:02:38 +0000
  • b32cd97b71 fix: read vocab size from tokenizer and add hacky patch for qwen2b drbh 2025-05-05 23:39:24 +0000
  • 55d82d4654 fix: remove the bias padding drbh 2025-05-05 21:45:18 +0000
  • 7659925d85 fix: improve validation and transform logic drbh 2025-05-05 13:59:02 -0400
  • 465294d3de fix: avoid zero'd logit bias mask drbh 2025-04-30 20:12:05 +0000
  • b3ead6e959 fix: cleanup typos drbh 2025-04-28 13:59:23 +0000
  • 9eeccbf9a5 fix: improve processor logic and refactor drbh 2025-04-28 13:44:37 +0000
  • bb5c875f0b fix: remove deprecated test and fix typing drbh 2025-04-23 17:06:22 +0000
  • 81656bd016 fix: adjust imports drbh 2025-04-22 17:07:28 +0000
  • 61a50a81c0 fix: include logit_bias in all ValidGenerateRequest's drbh 2025-04-22 16:46:59 +0000
  • e44703d542 fix: adjust the NextTokenChooser logit bias processor drbh 2025-04-22 16:36:34 +0000
  • da3f18e5c8 feat: include proto changes drbh 2025-04-22 16:26:56 +0000
  • fae510b8f6 feat: support logit bias in chat request drbh 2025-04-22 16:09:32 +0000
  • f15e8808b2 Merge 3d71c06aff into 7253be349a Daniël de Kok 2025-05-02 09:48:27 +0200
  • 7253be349a Update client SDK snippets (#3207) Julien Chaumond 2025-05-01 17:10:51 +0200
  • fc2631547e good catch from copilot Julien Chaumond 2025-05-01 16:40:51 +0200
  • bb6fc3205c Update client SDK snippets Julien Chaumond 2025-05-01 16:36:09 +0200
  • d303c1e37e fix: bump snaps for mllama (#3202) drbh 2025-05-01 10:20:45 -0400
  • 12ea8d74c7 Pr 2982 ci branch (#3046) drbh 2025-05-01 10:17:16 -0400
  • 6afe4307ab doc typo (#3206) Julien Chaumond 2025-05-01 14:31:48 +0200
  • 8f91108977 typo Julien Chaumond 2025-05-01 13:20:29 +0200
  • 40dfce644a Skip {% generation %} and {% endgeneration %} template handling (#3204) Alvaro Bartolome 2025-05-01 12:13:17 +0200
  • d79dd8c87d Revert "Add .DS_Store file to .gitignore" Alvaro Bartolome 2025-05-01 12:07:24 +0200
  • 36b45c2d60 Update explanation on {% generation %} and {% endgeneration %} removal Alvaro Bartolome 2025-05-01 11:53:23 +0200
  • 54cc24b3c9 Skip {% generation %} and {% endgeneration %} Alvaro Bartolome 2025-05-01 11:14:21 +0200
  • d64d6d2f7f Add .DS_Store file to .gitignore Alvaro Bartolome 2025-05-01 10:34:11 +0200
  • 98dd275104 fix: bump snaps for mllama drbh 2025-04-30 23:17:32 +0000
  • db2541ccd5 fix: bump test snaps drbh 2025-04-30 23:08:15 +0000
  • fd6bafe37d fix: adjust test payload drbh 2025-04-30 18:02:01 +0000
  • 1d63285cf9 fix: bump openapi doc with new grammar option drbh 2025-03-17 14:51:33 +0000
  • 1b374af695 feat: support json_schema grammar constraining and add tests drbh 2025-03-14 18:13:03 +0000
  • 364fa62bfe fix: another end-of-file-fixer lint drbh 2025-02-20 23:46:04 +0000
  • 08944740ee fix: add test snapshots and avoid docs change drbh 2025-02-20 23:45:15 +0000
  • 943901e4a0 fix: end-of-file-fixer lint drbh 2025-02-20 18:39:15 -0500
  • 2c37ec3797 fix: various linter adjustments drbh 2025-02-20 21:18:16 +0000
  • c206bd0f6e Add tests for all aliases Alex Weston 2025-01-30 14:11:05 -0500
  • 75fbca5e1a Add json_schema alias for GrammarType Alex Weston 2025-01-30 14:03:54 -0500
  • 338cdc2eb8 Tiny fix. kvrouter Nicolas Patry 2025-04-30 17:50:14 +0200
  • 3aa8564652 Fixing the Trie in case of exact prefix match split. Nicolas Patry 2025-04-30 17:33:17 +0200
  • 6a5955a78c fix qwen test Mohit Sharma 2025-04-30 10:08:55 +0000
  • 996473164a fix qwen test Mohit Sharma 2025-04-30 09:57:22 +0000
  • d1cf64abc4 minor fix Mohit Sharma 2025-04-30 08:55:38 +0000
  • 5cfd4b168a fix paligemma text Mohit Sharma 2025-04-30 07:47:14 +0000
  • 70c616ca27 feat: lock updated kernel versions bump-kernel-versions drbh 2025-04-29 11:05:41 -0400
  • e7329fec18 Fixing the router + template for Qwen3. (#3200) Nicolas Patry 2025-04-29 16:29:26 +0200
  • 6bc624c90c Fixing the router + template for Qwen3. Nicolas Patry 2025-04-29 12:34:36 +0200
  • dac5a29165 Merge branch 'main' into ipex_fp8_kv_cache Wang, Yi A 2025-04-28 22:10:26 -0700
  • 61ccbf6bbd update paligemma Mohit Sharma 2025-04-28 13:02:40 +0000
  • 534a16d50c fix test Mohit Sharma 2025-04-28 09:40:23 +0000
  • babba0a5e5 use hpu set seed Wang, Yi A 2025-04-28 01:32:45 -0700
  • 3e9f5ba159 Format regisss 2025-04-28 07:50:15 +0000
  • 2394437dc7 fix(docker): Set uv install dir Sebastian Liebscher 2025-04-27 18:56:48 +0200
  • 0e45bca624 forward and tokenize chooser use the same shape concate or filter happened to cpu tensor to avoid dynamic shape in hpu Wang, Yi A 2025-04-26 21:56:39 -0700
  • 790a3b5ed2 Fix HF cache default value in server.rs regisss 2025-04-25 17:08:34 +0000