Alvaro Moran
8801ba12cf
Optimum neuron 0.3.0 ( #3308 )
...
* chore(neuron): update to optimum-neuron 0.3.0
Dependencies were changed accordingly, because Neuron SDK was updated to
v2.24.
* test: sample is not deterministic
Also modify the temperature in decode test to avoid granite early
stopping.
* test(neuron): adjust expectations after graph changes
* test(neuron): use greedy for stop sequences
---------
Co-authored-by: David Corvoysier <david@huggingface.co>
2025-08-26 11:07:47 +02:00
Wang, Yi
d618424d50
HuggingFaceM4/Idefics3-8B-Llama3 crash fix ( #3267 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-08-21 10:04:30 +02:00
Wang, Yi
c5e6f9a178
Fix outline import issue ( #3282 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-08-21 09:53:04 +02:00
Wang, Yi
6624fec1f9
Some gptq cases could not be handled by ipex, but could be handled by triton ( #3298 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-08-19 09:37:49 +02:00
Wang, Yi
5284b5c654
Multi modality fix ( #3283 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-08-19 09:36:36 +02:00
Wang, Yi
6a2fa83540
XCCL for XPU ( #3252 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-08-19 00:37:27 +02:00
Emmanuel Ferdman
b4386b8c77
Migrate to V2 Pydantic interface ( #3262 )
...
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-08-18 23:55:21 +02:00
Wang, Yi
24c2bff659
Gaudi gptq gidx support ( #3297 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-07-17 16:00:12 +02:00
Yuan Wu
fc2405c549
[gaudi] Fix the CI test errors ( #3286 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-07-07 11:32:07 +02:00
Wang, Yi
ebb26f0ccd
[gaudi] Deepseek v2 mla and add ep to unquantized moe ( #3287 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-07-07 11:29:39 +02:00
Wang, Yi
778b61c0da
[gaudi] Remove unnecessary reinitialize to HeterogeneousNextTokenChooser to make sampling output correct ( #3284 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
2025-07-03 10:03:16 +02:00
David Corvoysier
3d2e7c8fce
Optimum neuron 0.2.2 ( #3281 )
...
* chore(neuron): use optimum-neuron 0.2.1
* test(neuron): adjust expectations
Since the latest optimum-neuron uses a new modeling for granite and
qwen, the greedy outputs are slightly different.
* test(neuron): add phi3 and qwen3 tests
* chore(neuron): use optimum-neuron 0.2.2
2025-07-03 07:59:25 +02:00
Wang, Yi
f6005d6813
xpu lora support ( #3232 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-07-02 17:54:25 +02:00
Wang, Yi
429dcd9c64
[gaudi] Gemma3 sliding window support ( #3280 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-07-01 10:06:01 +02:00
Baptiste Colle
9f38d93051
Gaudi: add CI ( #3160 )
...
Co-authored-by: Pauline Bailly-Masson <155966238+paulinebm@users.noreply.github.com>
2025-06-24 18:51:09 +02:00
Wang, Yi
719907410b
[gaudi] Refine rope memory, do not need to keep sin/cos cache per layer ( #3274 )
2025-06-23 11:15:39 +02:00
David Corvoysier
238fbd4d50
Neuron backend fix and patch version 3.3.4 ( #3273 )
...
* fix(neuron): wrong assertion when batch_size==1
* chore: prepare 3.3.4
2025-06-19 10:52:41 +02:00
Wang, Yi
14ee6e7804
[gaudi] gemma3 text and vlm model initial support. need to add sliding window support later ( #3270 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-06-19 09:32:34 +02:00
David Corvoysier
bd1bdebb47
doc: fix README ( #3271 )
2025-06-18 12:35:36 +02:00
regisss
f13e28c98d
[gaudi] Refine logging for Gaudi warmup ( #3222 )
...
* Refine logging for Gaudi warmup
* Make style
* Make style 2
* Flash causal LM case
* Add log_master & VLM cases
* Black
2025-06-18 12:34:00 +02:00
David Corvoysier
b4d17f18ff
chore: prepare release 3.3.3 ( #3269 )
2025-06-18 11:55:26 +02:00
Wang, Yi
0627983c17
[Gaudi] use pad_token_id to pad input id ( #3268 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-06-17 09:07:25 +02:00
Yuan Wu
3752143b39
[Gaudi] Fix the integration-test issues ( #3265 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-06-13 14:47:06 +02:00
Yuan Wu
ded4cb52ac
[Gaudi] Enable Qwen3_moe model ( #3244 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-06-13 12:03:24 +02:00
Wang, Yi
a220e57f45
[gaudi] HuggingFaceM4/idefics2-8b issue fix ( #3264 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-06-13 12:00:08 +02:00
Yuan Wu
e07056ab3f
[Gaudi] Remove optimum-habana ( #3261 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-06-12 22:35:36 +02:00
Yuan Wu
25fdc5f03c
[gaudi] Move the _update_cos_sin_cache into get_cos_sin ( #3254 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-06-12 22:31:11 +02:00
Wang, Yi
613b8dd647
[gaudi] Vlm rebase and issue fix in benchmark test ( #3263 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-06-12 22:26:37 +02:00
Wang, Yi
839477670a
[gaudi] Perf optimization ( #3256 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-06-11 15:00:21 +02:00
David Corvoysier
79183d1647
Bump neuron SDK version ( #3260 )
...
* chore(neuron): bump version to 0.2.0
* refactor(neuron): use named parameters in inputs helpers
This allows hiding the differences between the two backends in terms of
input parameters.
* refactor(neuron): remove obsolete code paths
* fix(neuron): use neuron_config whenever possible
* fix(neuron): use new cache import path
* fix(neuron): neuron config is not stored in config anymore
* fix(nxd): adapt model retrieval to new APIs
* fix(generator): emulate greedy in sampling parameters
When on-device sampling is enabled, we need to emulate the greedy
behaviour using top-k=1, top-p=1, temperature=1.
* test(neuron): update models and expectations
* feat(neuron): support on-device sampling
* fix(neuron): adapt entrypoint
* tests(neuron): remove obsolete models
* fix(neuron): adjust test expectations for llama on nxd
2025-06-10 17:56:25 +02:00
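The greedy emulation mentioned in the "Bump neuron SDK version" entry above (top-k=1, top-p=1, temperature=1) can be illustrated with a toy sampler. This is only a sketch in plain Python: the `sample` and `greedy_as_sampling` helpers are hypothetical names written for this illustration and are not taken from the neuron backend or from optimum-neuron.

```python
import math
import random


def sample(logits, top_k, top_p, temperature, rng):
    """Toy top-k / top-p / temperature sampler over a list of logits.

    Hypothetical helper for illustration; not the backend's sampler.
    """
    scaled = [x / temperature for x in logits]
    # Keep the ids of the top_k highest-scoring tokens, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax restricted to the kept tokens.
    peak = max(scaled[i] for i in ranked)
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Nucleus filtering: keep the shortest prefix whose mass reaches top_p.
    kept_ids, kept_probs, mass = [], [], 0.0
    for token_id, p in zip(ranked, probs):
        kept_ids.append(token_id)
        kept_probs.append(p)
        mass += p
        if mass >= top_p:
            break
    return rng.choices(kept_ids, weights=kept_probs, k=1)[0]


def greedy_as_sampling():
    """Sampling parameters that make the sampler above behave greedily."""
    return {"top_k": 1, "top_p": 1.0, "temperature": 1.0}


logits = [0.1, 2.5, -1.0, 2.4]
argmax = max(range(len(logits)), key=lambda i: logits[i])
picks = {sample(logits, rng=random.Random(seed), **greedy_as_sampling()) for seed in range(20)}
```

With top-k=1 only the highest-scoring token survives the filter, so the random draw has a single candidate and the result matches the argmax regardless of the seed. Temperature 1 and top-p 1 leave the distribution untouched.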
Yuan Wu
1ff9d185d5
Remove useless packages ( #3253 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-06-03 13:42:29 +02:00
Daniël de Kok
249189d96e
Prepare for 3.3.2 ( #3249 )
2025-05-30 16:16:36 +02:00
Yuan Wu
6b6e30a6f6
[gaudi] Fix the Llama-4-Maverick-17B-128E crash issue ( #3246 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-05-29 11:38:44 +02:00
Yuan Wu
70217ac345
[Gaudi] Fix the OOM issue of Llama-4-Scout-17B-16E-Instruct ( #3245 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-05-29 09:58:24 +02:00
Wang, Yi
f14044009a
fp8 compressed tensors w8a8 support for Gaudi backend ( #3242 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-05-28 14:54:20 +02:00
Yuan Wu
1883a62a94
Add Qwen3 for Gaudi backend ( #3229 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-05-23 08:58:35 +02:00
Daniël de Kok
f58d7cf50e
Nix: switch to hf-nix ( #3240 )
...
* Nix: switch to hf-nix
* Remove outdated local overrides
2025-05-22 17:09:15 +02:00
Wang, Yi
f08b44ade5
Upgrade to new vllm extension ops for Gaudi backend (fix issue in exponential bucketing) ( #3239 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-05-22 15:29:16 +02:00
Daniël de Kok
674c514d44
Prepare for 3.3.1 ( #3238 )
2025-05-22 09:43:55 +02:00
Wang, Yi
9e7e546923
Move input_ids to hpu and remove disposal of adapter_meta ( #3237 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-05-22 09:21:31 +02:00
Daniël de Kok
e32528792c
Switch to punica-sgmv kernel from the Hub ( #3236 )
...
* Switch to punica-sgmv kernel from the Hub
This also switches (temporarily) to the tgi-nix/kernel-builder merge
branch, bumping up to CUDA 12.8 (same as non-Nix Torch).
* nix: client depends on aiohttp
This probably worked before the nixpkgs bump because a dependency
propagated aiohttp.
2025-05-21 15:44:15 +02:00
Wang, Yi
43b1b07fb9
Fix the crash in default ATTENTION path for Gaudi backend ( #3235 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-05-20 14:02:32 +02:00
Wang, Yi
000e313a92
Refine warmup and upgrade to synapse AI 1.21.0 ( #3234 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-05-20 10:22:43 +02:00
Wang, Yi
d658b5def3
Deepseek R1 for Gaudi backend ( #3211 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-05-19 16:36:39 +02:00
drbh
58934c8b61
fix: count gpu uuids if NVIDIA_VISIBLE_DEVICES env set to all ( #3230 )
2025-05-16 11:48:58 -04:00
Yuan Wu
18cbecfb38
Enable Llama4 for Gaudi backend ( #3223 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-05-15 14:35:37 +02:00
Daniël de Kok
7e531f413d
Update to Torch 2.7.0 ( #3221 )
...
* Update to Torch 2.7.0
* Try to fix typer/click issue
* Pin click to fix incompatibility with typer
* Fix some test outputs with slight deviations
* Attempt again to sync with CI
* Mamba too
* Fixup mllama
Also switch to `unsloth/Llama-3.2-11B-Vision-Instruct` for testing
from the EU :).
2025-05-15 11:48:33 +02:00
kaixuanliu
535ce23827
Adjust the round_up_seq logic in Gaudi backend ( #3224 )
...
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-05-12 09:58:43 +02:00
kaixuanliu
c94f415af4
Change HPU warmup logic: seq length should grow exponentially ( #3217 )
...
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
2025-05-10 15:41:18 +02:00
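As a rough illustration of what exponentially growing warmup sequence lengths in the HPU warmup entry above could look like, the sketch below doubles the length at each step and caps it at a maximum. The function name, the doubling factor, and the starting value are assumptions made for this example; the actual warmup code in the Gaudi backend may choose its buckets differently.

```python
def exponential_seq_lengths(start, max_len):
    """Return warmup lengths start, 2*start, 4*start, ... capped at max_len.

    Hypothetical helper: a doubling schedule is assumed here, the real
    backend may use a different growth factor or bucketing rule.
    """
    if start <= 0 or max_len < start:
        raise ValueError("expected 0 < start <= max_len")
    lengths = []
    n = start
    while n < max_len:
        lengths.append(n)
        n *= 2
    # Always finish with the maximum so the longest shape is warmed up too.
    lengths.append(max_len)
    return lengths


schedule = exponential_seq_lengths(128, 1000)
```

Compared with a linear schedule, a doubling schedule covers the same range with far fewer shapes, which is the usual motivation for this kind of bucketing.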
Daniël de Kok
56c8189467
Prepare for 3.3.0 ( #3220 )
2025-05-09 15:50:29 +02:00