Daniël de Kok
bd33a23cac
More grpcio shenanigans
2025-07-08 15:00:50 +00:00
Daniël de Kok
df53facda9
AMD grpcio?
2025-07-08 14:15:41 +00:00
Daniël de Kok
a3db7edd67
Set grpcio upper bound to 1.73 (exclusive)
2025-07-08 13:55:00 +00:00
Daniël de Kok
5a6e09e32e
Revert "protobuf < 6.0"
...
This reverts commit 48bb4b4f1e
.
2025-07-08 13:53:13 +00:00
Daniël de Kok
48bb4b4f1e
protobuf < 6.0
2025-07-08 13:32:04 +00:00
Daniël de Kok
bfdaf5773c
Add outlines upper bound
...
Version 1.0.0 and later does not have fsm.guides module. We should
update against the 1.0.0 API later.
2025-07-08 13:28:51 +00:00
Yuan Wu
fc2405c549
[gaudi] Fix the CI test errors ( #3286 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-07-07 11:32:07 +02:00
Wang, Yi
ebb26f0ccd
[gaudi] Deepseek v2 mla and add ep to unquantized moe ( #3287 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-07-07 11:29:39 +02:00
Wang, Yi
778b61c0da
[gaudi] Remove unnecessary reinitialize to HeterogeneousNextTokenChooser to make sampling output correct ( #3284 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
2025-07-03 10:03:16 +02:00
David Corvoysier
3d2e7c8fce
Optimum neuron 0.2.2 ( #3281 )
...
* chore(neuron): use optimum-neuron 0.2.1
* test(neuron): adjust expectations
Since the latest optimum-neuron uses a new modeling for granite and
qwen, the greedy outputs are slighly different.
* test(neuron): add phi3 and qwen3 tests
* chore(neuron): use optimum-neuron 0.2.2
2025-07-03 07:59:25 +02:00
Wang, Yi
f6005d6813
xpu lora support ( #3232 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-07-02 17:54:25 +02:00
Wang, Yi
429dcd9c64
[gaudi] Gemma3 sliding window support ( #3280 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-07-01 10:06:01 +02:00
Baptiste Colle
9f38d93051
Gaudi: add CI ( #3160 )
...
Co-authored-by: Pauline Bailly-Masson <155966238+paulinebm@users.noreply.github.com>
2025-06-24 18:51:09 +02:00
Wang, Yi
719907410b
[gaudi] Refine rope memory, do not need to keep sin/cos cache per layer ( #3274 )
2025-06-23 11:15:39 +02:00
David Corvoysier
238fbd4d50
Neuron backend fix and patch version 3.3.4 ( #3273 )
...
* fix(neuron): wrong assertion when batch_size==1
* chore: prepare 3.3.4
2025-06-19 10:52:41 +02:00
Wang, Yi
14ee6e7804
[gaudi] gemma3 text and vlm model intial support. need to add sliding window support later ( #3270 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-06-19 09:32:34 +02:00
David Corvoysier
bd1bdebb47
doc: fix README ( #3271 )
2025-06-18 12:35:36 +02:00
regisss
f13e28c98d
[gaudi] Refine logging for Gaudi warmup ( #3222 )
...
* Refine logging for Gaudi warmup
* Make style
* Make style 2
* Flash causal LM case
* Add log_master & VLM cases
* Black
2025-06-18 12:34:00 +02:00
David Corvoysier
b4d17f18ff
chore: prepare release 3.3.3 ( #3269 )
2025-06-18 11:55:26 +02:00
Wang, Yi
0627983c17
[Gaudi] use pad_token_id to pad input id ( #3268 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-06-17 09:07:25 +02:00
Yuan Wu
3752143b39
[Gaudi] Fix the integration-test issues ( #3265 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-06-13 14:47:06 +02:00
Yuan Wu
ded4cb52ac
[Gaudi] Enable Qwen3_moe model ( #3244 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-06-13 12:03:24 +02:00
Wang, Yi
a220e57f45
[gaudi] HuggingFaceM4/idefics2-8b issue fix ( #3264 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-06-13 12:00:08 +02:00
Yuan Wu
e07056ab3f
[Gaudi] Remove optimum-habana ( #3261 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-06-12 22:35:36 +02:00
Yuan Wu
25fdc5f03c
[gaudi] Move the _update_cos_sin_cache into get_cos_sin ( #3254 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-06-12 22:31:11 +02:00
Wang, Yi
613b8dd647
[gaudi] Vlm rebase and issue fix in benchmark test ( #3263 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-06-12 22:26:37 +02:00
Wang, Yi
839477670a
[gaudi] Perf optimization ( #3256 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-06-11 15:00:21 +02:00
David Corvoysier
79183d1647
Bump neuron SDK version ( #3260 )
...
* chore(neuron): bump version to 0.2.0
* refactor(neuron): use named parameters in inputs helpers
This allows to hide the differences between the two backends in terms of
input parameters.
* refactor(neuron): remove obsolete code paths
* fix(neuron): use neuron_config whenever possible
* fix(neuron): use new cache import path
* fix(neuron): neuron config is not stored in config anymore
* fix(nxd): adapt model retrieval to new APIs
* fix(generator): emulate greedy in sampling parameters
When on-device sampling is enabled, we need to emulate the greedy
behaviour using top-k=1, top-p=1, temperature=1.
* test(neuron): update models and expectations
* feat(neuron): support on-device sampling
* fix(neuron): adapt entrypoint
* tests(neuron): remove obsolete models
* fix(neuron): adjust test expectations for llama on nxd
2025-06-10 17:56:25 +02:00
Yuan Wu
1ff9d185d5
Remove useless packages ( #3253 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-06-03 13:42:29 +02:00
Daniël de Kok
249189d96e
Prepare for 3.3.2 ( #3249 )
2025-05-30 16:16:36 +02:00
Yuan Wu
6b6e30a6f6
[gaudi] Fix the Llama-4-Maverick-17B-128E crash issue ( #3246 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-05-29 11:38:44 +02:00
Yuan Wu
70217ac345
[Gaudi] Fix the OOM issue of Llama-4-Scout-17B-16E-Instruct ( #3245 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-05-29 09:58:24 +02:00
Wang, Yi
f14044009a
fp8 compressed tensors w8a8 support for Gaudi backend ( #3242 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-05-28 14:54:20 +02:00
Yuan Wu
1883a62a94
Add Qwen3 for Gaudi backend ( #3229 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-05-23 08:58:35 +02:00
Daniël de Kok
f58d7cf50e
Nix: switch to hf-nix ( #3240 )
...
* Nix: switch to hf-nix
* Remove outdated local overrides
2025-05-22 17:09:15 +02:00
Wang, Yi
f08b44ade5
Upgrade to new vllm extension ops for Gaudi backend (fix issue in exponential bucketing) ( #3239 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-05-22 15:29:16 +02:00
Daniël de Kok
674c514d44
Prepare for 3.3.1 ( #3238 )
2025-05-22 09:43:55 +02:00
Wang, Yi
9e7e546923
Move input_ids to hpu and remove disposal of adapter_meta ( #3237 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-05-22 09:21:31 +02:00
Daniël de Kok
e32528792c
Switch to punica-sgmv kernel from the Hub ( #3236 )
...
* Switch to punica-sgmv kernel from the Hub
This also switches (temporarily) to the tgi-nix/kernel-builder merge
branch, bumping up to CUDA 12.8 (same as non-Nix Torch).
* nix: client depends on aiohttp
This probably worked before the nixpkgs bump because a dependency
propagated aiohttp.
2025-05-21 15:44:15 +02:00
Wang, Yi
43b1b07fb9
Fix the crash in default ATTENTION path for Gaudi backend ( #3235 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-05-20 14:02:32 +02:00
Wang, Yi
000e313a92
Refine warmup and upgrade to synapse AI 1.21.0 ( #3234 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-05-20 10:22:43 +02:00
Wang, Yi
d658b5def3
Deepseek R1 for Gaudi backend ( #3211 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-05-19 16:36:39 +02:00
drbh
58934c8b61
fix: count gpu uuids if NVIDIA_VISIBLE_DEVICES env set to all ( #3230 )
2025-05-16 11:48:58 -04:00
Yuan Wu
18cbecfb38
Enable Llama4 for Gaudi backend ( #3223 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-05-15 14:35:37 +02:00
Daniël de Kok
7e531f413d
Update to Torch 2.7.0 ( #3221 )
...
* Update to Torch 2.7.0
* Try to fix typer/click issue
* Pin click to fix incompatibility with typer
* Fix some test outputs with slight deviations
* Attempt again to sync with CI
* Mamba too
* Fixup mllama
Also switch to `unsloth/Llama-3.2-11B-Vision-Instruct` for testing
from the EU :).
2025-05-15 11:48:33 +02:00
kaixuanliu
535ce23827
Adjust the round_up_seq
logic in Gaudi backend ( #3224 )
...
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
2025-05-12 09:58:43 +02:00
kaixuanliu
c94f415af4
Change HPU warmup logic: seq length should be with exponential growth ( #3217 )
...
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
2025-05-10 15:41:18 +02:00
Daniël de Kok
56c8189467
Prepare for 3.3.0 ( #3220 )
2025-05-09 15:50:29 +02:00
Mohit Sharma
329f612e55
Chunked Prefill VLM ( #3188 )
...
* add logic
* working
* add encoder cache free
* fixes
* fix idefics
* update pixel_values
* add improvements
* add improvements
* improve
* nit
* fix inputs_embeds
* nit
* optimizations
* add prometheus port
* rename vars
* rename vars
* nit
* disable chunking for qwen
* review comments
* remove port
* improve headdim
* remove kwargs and redundant args
* fix qwen2_5
* fix config image_token_id error
* fix test
* update paligemma
* fix paligemma text
* minor fix
* fix qwen test
* fix qwen test
2025-05-06 18:01:59 +02:00
Wang, Yi
533eee50dc
forward and tokenize chooser use the same shape ( #3196 )
...
* forward and tokenize chooser use the same shape
concate or filter happened to cpu tensor to avoid dynamic shape in hpu
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* use hpu set seed
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
---------
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-05-06 10:49:32 +02:00