Commit Graph

1435 Commits

Author SHA1 Message Date
Phil
5739b5b088
Add missing backslash (#3311) 2025-09-06 09:50:14 +02:00
drbh
356de85c29
feat: bump flake including transformers and huggingface_hub versions (#3313)
* feat: bump flake including transformers and huggingface_hub versions

* fix: adjust outlines version in overlay
2025-09-02 09:46:41 -04:00
Alvaro Moran
0f79162288
chore: prepare version 3.3.5 (#3314)
* chore: prepare version 3.3.5

* black

* neuron: black

* Update hf-xet in uv lockfile

* Attempt to fix API doc check failure

Add `error_type` where missing.

* Pin redocly version

* Sync redocly with Nix for now

---------

Co-authored-by: Daniël de Kok <me@danieldk.eu>
2025-09-02 15:35:42 +02:00
Daniël de Kok
06d9d88b95
Disable Cachix pushes (#3312)
* Disable Cachix pushes

This is not safe until we have sandboxed builds. For TGI alone
this might not be a huge issue, but with Cachix caching disabled
in hf-nix, TGI CI would build all the packages and push them to
our cache.

* fix: bump docs

---------

Co-authored-by: drbh <david.richard.holtz@gmail.com>
2025-08-26 13:27:57 -04:00
Alvaro Moran
8801ba12cf
Optimum neuron 0.3.0 (#3308)
* chore(neuron): update to optimum-neuron 0.3.0

Dependencies were changed accordingly, because the Neuron SDK was updated
to v2.24.

* test: sample is not deterministic

Also modify the temperature in the decode test to avoid granite stopping
early.

* test(neuron): adjust expectations after graph changes

* test(neuron): use greedy for stop sequences

---------

Co-authored-by: David Corvoysier <david@huggingface.co>
2025-08-26 11:07:47 +02:00
Wang, Yi
d618424d50
HuggingFaceM4/Idefics3-8B-Llama3 crash fix (#3267)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-08-21 10:04:30 +02:00
Wang, Yi
c5e6f9a178
Fix outlines import issue (#3282)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-08-21 09:53:04 +02:00
Wang, Yi
6624fec1f9
Some gptq cases could not be handled by ipex, but could be handled by triton (#3298)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-08-19 09:37:49 +02:00
Wang, Yi
5284b5c654
Multi modality fix (#3283)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-08-19 09:36:36 +02:00
Wang, Yi
6a2fa83540
XCCL for XPU (#3252)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-08-19 00:37:27 +02:00
Emmanuel Ferdman
b4386b8c77
Migrate to V2 Pydantic interface (#3262)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-08-18 23:55:21 +02:00
Wang, Yi
24c2bff659
Gaudi gptq gidx support (#3297)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-07-17 16:00:12 +02:00
Yuan Wu
fc2405c549
[gaudi] Fix the CI test errors (#3286)
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-07-07 11:32:07 +02:00
Wang, Yi
ebb26f0ccd
[gaudi] Deepseek v2 mla and add ep to unquantized moe (#3287)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-07-07 11:29:39 +02:00
Wang, Yi
778b61c0da
[gaudi] Remove unnecessary reinitialization of HeterogeneousNextTokenChooser to make sampling output correct (#3284)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
2025-07-03 10:03:16 +02:00
David Corvoysier
3d2e7c8fce
Optimum neuron 0.2.2 (#3281)
* chore(neuron): use optimum-neuron 0.2.1

* test(neuron): adjust expectations

Since the latest optimum-neuron uses new modeling code for granite and
qwen, the greedy outputs are slightly different.
* test(neuron): add phi3 and qwen3 tests

* chore(neuron): use optimum-neuron 0.2.2
2025-07-03 07:59:25 +02:00
Wang, Yi
f6005d6813
xpu lora support (#3232)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-07-02 17:54:25 +02:00
Wang, Yi
429dcd9c64
[gaudi] Gemma3 sliding window support (#3280)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-07-01 10:06:01 +02:00
Baptiste Colle
9f38d93051
Gaudi: add CI (#3160)
Co-authored-by: Pauline Bailly-Masson <155966238+paulinebm@users.noreply.github.com>
2025-06-24 18:51:09 +02:00
Wang, Yi
719907410b
[gaudi] Refine rope memory: no need to keep sin/cos cache per layer (#3274) 2025-06-23 11:15:39 +02:00
David Corvoysier
238fbd4d50
Neuron backend fix and patch version 3.3.4 (#3273)
* fix(neuron): wrong assertion when batch_size==1

* chore: prepare 3.3.4
2025-06-19 10:52:41 +02:00
Wang, Yi
14ee6e7804
[gaudi] Gemma3 text and vlm model initial support; sliding window support to be added later (#3270)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-06-19 09:32:34 +02:00
David Corvoysier
bd1bdebb47
doc: fix README (#3271) 2025-06-18 12:35:36 +02:00
regisss
f13e28c98d
[gaudi] Refine logging for Gaudi warmup (#3222)
* Refine logging for Gaudi warmup

* Make style

* Make style 2

* Flash causal LM case

* Add log_master & VLM cases

* Black
2025-06-18 12:34:00 +02:00
David Corvoysier
b4d17f18ff
chore: prepare release 3.3.3 (#3269) 2025-06-18 11:55:26 +02:00
Wang, Yi
0627983c17
[Gaudi] use pad_token_id to pad input id (#3268)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-06-17 09:07:25 +02:00
Yuan Wu
3752143b39
[Gaudi] Fix the integration-test issues (#3265)
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-06-13 14:47:06 +02:00
Yuan Wu
ded4cb52ac
[Gaudi] Enable Qwen3_moe model (#3244)
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-06-13 12:03:24 +02:00
Wang, Yi
a220e57f45
[gaudi] HuggingFaceM4/idefics2-8b issue fix (#3264)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-06-13 12:00:08 +02:00
Yuan Wu
e07056ab3f
[Gaudi] Remove optimum-habana (#3261)
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-06-12 22:35:36 +02:00
Yuan Wu
25fdc5f03c
[gaudi] Move the _update_cos_sin_cache into get_cos_sin (#3254)
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-06-12 22:31:11 +02:00
Wang, Yi
613b8dd647
[gaudi] Vlm rebase and issue fix in benchmark test (#3263)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-06-12 22:26:37 +02:00
Wang, Yi
839477670a
[gaudi] Perf optimization (#3256)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-06-11 15:00:21 +02:00
David Corvoysier
79183d1647
Bump neuron SDK version (#3260)
* chore(neuron): bump version to 0.2.0

* refactor(neuron): use named parameters in inputs helpers

This makes it possible to hide the differences between the two backends
in terms of input parameters.

* refactor(neuron): remove obsolete code paths

* fix(neuron): use neuron_config whenever possible

* fix(neuron): use new cache import path

* fix(neuron): neuron config is not stored in config anymore

* fix(nxd): adapt model retrieval to new APIs

* fix(generator): emulate greedy in sampling parameters

When on-device sampling is enabled, we need to emulate the greedy
behaviour using top-k=1, top-p=1, temperature=1 (a minimal sketch follows
this entry).

* test(neuron): update models and expectations

* feat(neuron): support on-device sampling

* fix(neuron): adapt entrypoint

* tests(neuron): remove obsolete models

* fix(neuron): adjust test expectations for llama on nxd
2025-06-10 17:56:25 +02:00
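
A minimal sketch of the greedy emulation described in the entry above, assuming a hypothetical SamplingParams container and emulate_greedy helper; these names are illustrative and not the actual TGI neuron generator API:

    from dataclasses import dataclass


    @dataclass
    class SamplingParams:
        """On-device sampling parameters (hypothetical container for illustration)."""
        temperature: float
        top_k: int
        top_p: float


    def emulate_greedy(do_sample: bool, temperature: float, top_k: int, top_p: float) -> SamplingParams:
        # When the request asks for greedy decoding, the on-device sampler is
        # still used, so greedy is emulated by restricting sampling to the
        # single most likely token.
        if not do_sample:
            # top-k=1 keeps only the argmax token; top-p=1 and temperature=1
            # leave the distribution otherwise untouched.
            return SamplingParams(temperature=1.0, top_k=1, top_p=1.0)
        return SamplingParams(temperature=temperature, top_k=top_k, top_p=top_p)

With top-k=1 the sampler can only pick the highest-probability token, so leaving top-p and temperature at 1 makes on-device sampling behave exactly like greedy decoding.
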
Yuan Wu
1ff9d185d5
Remove useless packages (#3253)
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-06-03 13:42:29 +02:00
Daniël de Kok
249189d96e
Prepare for 3.3.2 (#3249) 2025-05-30 16:16:36 +02:00
Yuan Wu
6b6e30a6f6
[gaudi] Fix the Llama-4-Maverick-17B-128E crash issue (#3246)
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-05-29 11:38:44 +02:00
Yuan Wu
70217ac345
[Gaudi] Fix the OOM issue of Llama-4-Scout-17B-16E-Instruct (#3245)
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-05-29 09:58:24 +02:00
Wang, Yi
f14044009a
fp8 compressed tensors w8a8 support for Gaudi backend (#3242)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-05-28 14:54:20 +02:00
Yuan Wu
1883a62a94
Add Qwen3 for Gaudi backend (#3229)
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-05-23 08:58:35 +02:00
Daniël de Kok
f58d7cf50e
Nix: switch to hf-nix (#3240)
* Nix: switch to hf-nix

* Remove outdated local overrides
2025-05-22 17:09:15 +02:00
Wang, Yi
f08b44ade5
Upgrade to new vllm extension ops for Gaudi backend (fix issue in exponential bucketing) (#3239)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-05-22 15:29:16 +02:00
Daniël de Kok
674c514d44
Prepare for 3.3.1 (#3238) 2025-05-22 09:43:55 +02:00
Wang, Yi
9e7e546923
Move input_ids to hpu and remove disposal of adapter_meta (#3237)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-05-22 09:21:31 +02:00
Daniël de Kok
e32528792c
Switch to punica-sgmv kernel from the Hub (#3236)
* Switch to punica-sgmv kernel from the Hub

This also switches (temporarily) to the tgi-nix/kernel-builder merge
branch, bumping up to CUDA 12.8 (same as non-Nix Torch).

* nix: client depends on aiohttp

This probably worked before the nixpkgs bump because a dependency
propagated aiohttp.
2025-05-21 15:44:15 +02:00
Wang, Yi
43b1b07fb9
Fix the crash in default ATTENTION path for Gaudi backend (#3235)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-05-20 14:02:32 +02:00
Wang, Yi
000e313a92
Refine warmup and upgrade to synapse AI 1.21.0 (#3234)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-05-20 10:22:43 +02:00
Wang, Yi
d658b5def3
Deepseek R1 for Gaudi backend (#3211)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-05-19 16:36:39 +02:00
drbh
58934c8b61
fix: count gpu uuids if NVIDIA_VISIBLE_DEVICES env set to all (#3230) 2025-05-16 11:48:58 -04:00
Yuan Wu
18cbecfb38
Enable Llama4 for Gaudi backend (#3223)
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-05-15 14:35:37 +02:00