Commit Graph

  • b094f026c1 chore(neuron): bump version to 0.2.0 David Corvoysier 2025-05-22 14:35:18 +0000
  • c065c58818 Remove Optimum-habana yuanwu 2025-06-10 05:09:56 +0000
  • 2204f91f32 fix: adjust llava logic and bump snaps support-granite-vision drbh 2025-06-06 14:54:10 +0000
  • c43954d44c fix multi-modality apply template issue Wang, Yi A 2025-06-04 20:33:53 -0700
  • d5ba5f54f6 Use the max_position_embeddings yuanwu 2025-06-06 07:22:40 +0000
  • 1505d4687a Remove useless modifications yuanwu 2025-06-06 07:06:19 +0000
  • 4a89f59ec7 Remove useless modification yuanwu 2025-06-06 06:46:06 +0000
  • eed58b77c3 Remove debug info yuanwu 2025-06-06 06:17:45 +0000
  • dbb24255c3 fix multi-modality concatenate Wang, Yi A 2025-06-05 23:14:15 -0700
  • 7f346a88e3 Fix the crash issue of Qwen/Qwen3-235B-A22B yuanwu 2025-06-06 06:14:01 +0000
  • acc02aeb3e set block mapping inside model graph Wang, Yi A 2025-06-03 23:49:29 -0700
  • 30bdf922bd feat: improve llava next pooling for granite vision drbh 2025-06-04 13:50:39 +0000
  • 1a5ef906ae Remove debug info yuanwu 2025-06-03 05:28:38 +0000
  • 8b9a503f8a Move the _update_cos_sin_cache into get_cos_sin yuanwu 2025-06-04 03:00:23 +0000
  • 79ee5135e3 remove unnecessage input_id pad Wang, Yi A 2025-06-02 23:47:23 -0700
  • 1ff9d185d5 Remove useless packages (#3253) Yuan Wu 2025-06-03 19:42:29 +0800
  • 151d6638d3 avoid reshape of all_input_ids_tensor Wang, Yi A 2025-06-02 22:17:31 -0700
  • d2e6e863a4 Remove useless packages yuanwu 2025-05-30 03:21:16 +0000
  • 8e41da951d Release 3.3.2 v3.3.2 git_3.3.2 Daniël de Kok 2025-05-30 14:19:18 +0000
  • 249189d96e Prepare for 3.3.2 (#3249) Daniël de Kok 2025-05-30 16:16:36 +0200
  • 7063adf2f5 Prepare for 3.3.2 Daniël de Kok 2025-05-30 11:18:50 +0000
  • 97f305b28f Merge 1cb904e619 into 6b6e30a6f6 Jim Burtoft 2025-05-29 17:11:32 +0200
  • 6b6e30a6f6 [gaudi] Fix the Llama-4-Maverick-17B-128E crash issue (#3246) Yuan Wu 2025-05-29 17:38:44 +0800
  • b1b79bf32d Fix the Llama-4-Maverick-17B-128E crash issue yuanwu 2025-05-29 08:37:25 +0000
  • 70217ac345 [Gaudi] Fix the OOM issue of Llama-4-Scout-17B-16E-Instruct (#3245) Yuan Wu 2025-05-29 15:58:24 +0800
  • fb104d8b42 Fix the OOM issue of Llama-4-Scout-17B-16E-Instruct yuanwu 2025-05-29 06:38:45 +0000
  • 5155fef477 Merge branch 'main' into qwen3_moe Yuan Wu 2025-05-29 13:05:31 +0800
  • f14044009a fp8 compressed tensors w8a8 support for Gaudi backend (#3242) Wang, Yi 2025-05-28 20:54:20 +0800
  • f147f10ed4 remove install of ipex Wang, Yi A 2025-05-27 22:38:02 -0700
  • 475f6e21bc add multi-weight for GPTQ weight loader Wang, Yi A 2025-05-26 23:21:59 -0700
  • e72d2574c8 use xccl Wang, Yi A 2025-05-26 20:22:03 -0700
  • ce8978f9ea remove print Wang, Yi A 2025-05-25 18:56:41 -0700
  • a2934644b8 Merge branch 'main' into fp8_compressor Wang, Yi A 2025-05-25 18:55:35 -0700
  • fab395b41f perf(trtllm): reduce futile loop iterations Tzu-Yu Lee 2025-05-25 22:07:54 +0800
  • f7bd82a90e feat(trtllm): get more accurate start time Tzu-Yu Lee 2025-05-25 17:40:45 +0800
  • 41819d70f7 fix(trtllm): fix do_sample being ignored Tzu-Yu Lee 2025-05-18 18:22:02 +0800
  • 4ffa111fb0 fp8 compressed_tensors w8a8 support Wang, Yi A 2025-05-22 21:48:04 -0700
  • 1883a62a94 Add Qwen3 for Gaudi backend (#3229) Yuan Wu 2025-05-23 14:58:35 +0800
  • 45d95bdccc Merge branch 'huggingface:main' into qwen3_moe Yuan Wu 2025-05-23 10:26:57 +0800
  • cc3f6127ef Remove debug modification yuanwu 2025-05-22 18:53:49 +0300
  • 5e1d1bf174 Cannot use the latest transformers yuanwu 2025-05-22 18:09:31 +0300
  • f58d7cf50e Nix: switch to hf-nix (#3240) Daniël de Kok 2025-05-22 17:09:15 +0200
  • 5aafd37d7b Remove outdated local overrides Daniël de Kok 2025-05-22 14:27:55 +0000
  • 1ccf86ce84 Use the 4.52.2 transformers yuanwu 2025-05-22 17:01:07 +0300
  • 6c1d9f1377 Nix: switch to hf-nix Daniël de Kok 2025-05-22 13:34:39 +0000
  • f08b44ade5 Upgrade to new vllm extension ops for Gaudi backend (fix issue in exponential bucketing) (#3239) Wang, Yi 2025-05-22 21:29:16 +0800
  • abaa99ebaa upgrade to new vllm extension ops(fix issue in exponential bucketing) Wang, Yi A 2025-05-22 01:23:04 -0700
  • 767a65202d Release 3.3.1 v3.3.1 git_3.3.1 Daniël de Kok 2025-05-22 07:47:12 +0000
  • 674c514d44 Prepare for 3.3.1 (#3238) Daniël de Kok 2025-05-22 09:43:55 +0200
  • 9e7e546923 Move input_ids to hpu and remove disposal of adapter_meta (#3237) Wang, Yi 2025-05-22 15:21:31 +0800
  • 346b6f7219 Use the latest transformers yuanwu 2025-05-22 05:45:45 +0300
  • 2e8d3e91ea Add mark_step into llama4 yuanwu 2025-05-22 07:20:21 +0300
  • ad41abd68c Add mark_step into qwen3 yuanwu 2025-05-22 07:17:49 +0300
  • 3d20c79007 Merge branch 'huggingface:main' into qwen3 Yuan Wu 2025-05-22 11:56:29 +0800
  • 3338b34ba4 lora enable in xpu Wang, Yi A 2025-05-21 18:24:04 -0700
  • c20d0827db Prepare for 3.3.1 Daniël de Kok 2025-05-21 13:55:55 +0000
  • e32528792c Switch to punica-sgmv kernel from the Hub (#3236) Daniël de Kok 2025-05-21 15:44:15 +0200
  • 1495616b8b nix: client depends on aiohttp Daniël de Kok 2025-05-21 09:54:05 +0000
  • 40a4f9b5ea Switch to punica-sgmv kernel from the Hub Daniël de Kok 2025-05-21 08:31:00 +0000
  • b7ab3d3da7 move input_ids to hpu and remove disposal of adapter_meta Wang, Yi A 2025-05-20 23:28:46 -0700
  • 96535e8be8 Merge 70c616ca27 into 43b1b07fb9 drbh 2025-05-21 06:04:21 +0200
  • 43b1b07fb9 Fix the crash in default ATTENTION path for Gaudi backend (#3235) Wang, Yi 2025-05-20 20:02:32 +0800
  • 8209cb90b2 fix the crash in default ATTENTION path Wang, Yi A 2025-05-20 04:41:11 -0700
  • 000e313a92 Refine warmup and upgrade to synapse AI 1.21.0 (#3234) Wang, Yi 2025-05-20 16:22:43 +0800
  • 2a014786e4 Remove debug log yuanwu 2025-05-20 02:31:36 +0000
  • 05b6ed1bff Fix num_key_value_heads issue yuanwu 2025-05-20 02:29:12 +0000
  • a5e889d037 update to 1.21 Wang, Yi A 2025-05-19 18:01:04 -0700
  • ae0c9dfb62 enable VLLM_EXPONENTIAL_BUCKETING Wang, Yi A 2025-05-18 19:56:11 -0700
  • 550c85c39e refine warm up Wang, Yi A 2025-05-17 02:37:43 -0700
  • d658b5def3 Deepseek R1 for Gaudi backend (#3211) Wang, Yi 2025-05-19 22:36:39 +0800
  • b32b78e74e Fix crash issue yuanwu 2025-05-19 01:39:48 +0000
  • 8275bdcfe9 Fix? gaudi_llama4_tmp regisss 2025-05-18 22:04:41 +0000
  • c18766afec allocate from 1 block in router Wang, Yi A 2025-05-18 06:42:11 -0700
  • 56dd0a09e6 feat(trtllm): check existence of config files Tzu-Yu Lee 2025-05-18 03:25:13 +0800
  • 987337bf31 feat(trtllm): catch broader exception Tzu-Yu Lee 2025-05-18 02:49:35 +0800
  • 27d03309c9 feat(trtllm): add stop sequence support Tzu-Yu Lee 2025-05-18 02:37:19 +0800
  • 0858af206f fix(trtllm): fix segfault when canceling request Tzu-Yu Lee 2025-05-18 02:22:53 +0800
  • cc4b5848b9 fix: fix prometheus_port CLI short arg conflict Tzu-Yu Lee 2025-05-13 00:05:56 +0800
  • c458d21d07 feat(trtllm): add new finish reasons Tzu-Yu Lee 2025-05-11 03:27:59 +0800
  • 58934c8b61 fix: count gpu uuids if NVIDIA_VISIBLE_DEVICES env set to all (#3230) drbh 2025-05-16 11:48:58 -0400
  • 80b43a9974 fix: count gpu uuids if NVIDIA_VISIBLE_DEVICES env set to all drbh 2025-05-16 15:01:39 +0000
  • becf36f5e4 fmt Wang, Yi A 2025-05-16 06:07:56 -0700
  • b5e1ae9209 minor fix Wang, Yi A 2025-05-15 23:34:48 -0700
  • a184ce3876 mixtral moe fix after upgrade vllm extension ops git Wang, Yi A 2025-05-15 19:22:57 -0700
  • 8c182415c2 Enable the qwen3 MOE yuanwu 2025-05-16 01:40:22 +0000
  • 638714f964 Add Qwen3 yuanwu 2025-05-13 07:42:22 +0000
  • d704b0c852 Add Qwen3 yuanwu 2025-05-13 07:42:22 +0000
  • f93ed958e4 Update Transformers requirement regisss 2025-05-15 20:04:26 +0000
  • 18cbecfb38 Enable Llama4 for Gaudi backend (#3223) Yuan Wu 2025-05-15 20:35:37 +0800
  • 7e531f413d Update to Torch 2.7.0 (#3221) Daniël de Kok 2025-05-15 11:48:33 +0200
  • 9281be20c0 accelerate warmup Wang, Yi A 2025-05-14 19:16:00 -0700
  • d859dd36b7 Fixup mllama Daniël de Kok 2025-05-14 13:54:06 +0000
  • c9b6478b14 Mamba too Daniël de Kok 2025-05-14 13:17:47 +0000
  • 74ded00ecb Attempt again to sync with CI Daniël de Kok 2025-05-14 09:58:34 +0000
  • b2bd163d19 Mla deepspeek (#2) Wang, Yi 2025-05-13 22:42:46 +0800
  • 6d4e98dae3 Fix some test outputs with slight deviations Daniël de Kok 2025-05-13 09:04:50 +0000
  • e14a451d8d Add the latest transformers yuanwu 2025-05-13 01:38:18 +0000
  • 4128953df3 Merge 0cd6ff7a3d into 535ce23827 omahs 2025-05-12 09:40:49 -0300
  • c264a42aa1 adjust the round_up_seq logit to align with prefill warmup phase on HPU Liu, Kaixuan 2025-05-12 07:21:33 -0400
  • f7b7d435bf Pin click to fix incompatibility with typer Daniël de Kok 2025-05-12 11:00:21 +0000
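A listing in the "hash subject author date" layout above can be produced with `git log --graph` and a custom pretty format. A minimal sketch, run against a throwaway repository so it is self-contained (the temp directory, user name, and commit message are illustrative, not from the history above):

```shell
# Create a disposable repo with one empty commit, then print its log
# in a layout similar to the commit graph above.
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git -c user.name="Example User" -c user.email="user@example.com" \
    commit -q --allow-empty -m "chore: initial commit"

# %h = abbreviated hash, %s = subject, %an = author name, %ad = author date
git log --graph --pretty=format:'%h %s %an %ad' --date=iso
```

In a real checkout, pointing the same `git log` invocation at the branch in question (and dropping the throwaway-repo setup) yields output matching the list above, with merge commits drawn as extra graph lanes.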