Funtowicz Morgan
85790a19a7
misc(gha): expose action cache url and runtime as secrets ( #2964 )
* misc(gha): expose action cache url and runtime as secrets
* (CI): Move S3 Auth to OIDC
* Fix Typo
* change bucket name
* fix aws auth creds
* misc(gha): fix invalid syntax for secrets
* WIP: Add AWS session token
* Increase session time
* Remove actions_cache_url mount from Dockerfile
Removed an unused mount for actions_cache_url in the Dockerfile.
* WIP
---------
Co-authored-by: Guillaume LEGENDRE <glegendre01@gmail.com>
2025-11-17 10:50:10 +01:00
Alvaro Moran
efb94e0d3d
Patch version 3.3.6 ( #3329 )
* chore: prepare version 3.3.6
* fix(benchmark): clear up progress_gauge fn signature
Otherwise there is a compiler error.
2025-09-16 19:15:23 -04:00
drbh
5e747f4e30
Revert "feat: bump flake including transformers and huggingface_hub versions" ( #3330 )
Revert "feat: bump flake including transformers and huggingface_hub versions …"
This reverts commit 356de85c29 .
2025-09-16 11:32:19 -04:00
drbh
1b90c508af
Revert "Revert "feat: bump flake including transformers and huggingfa… ( #3326 )
Revert "Revert "feat: bump flake including transformers and huggingface_hub v…"
This reverts commit 9dedeb89ac .
2025-09-09 10:44:25 -04:00
Eliott C.
d2ad7c484e
Update iframe sources for streaming demo ( #3327 )
2025-09-09 15:36:19 +02:00
Daniël de Kok
c6071749db
Fix mask passed to flashinfer ( #3324 )
Custom masks are padded to the shape `[batch_size, max_len, max_len]`.
However, flashinfer expects an unpadded mask of the shape
`[sum(q_len[i] * k_len[i] for i in range(batch_size))]`.
This change unpads the custom mask (currently only used by Gemma 3)
to this shape (assuming q_len == k_len, since we only use the custom
mask during prefill).
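The unpadding described here can be sketched in plain Python (a hypothetical helper for illustration, not the actual TGI code, which operates on tensors):

```python
def unpad_custom_mask(mask, q_lens, k_lens):
    """Flatten a padded [batch_size, max_len, max_len] mask into the
    ragged layout flashinfer expects: one flat sequence of length
    sum(q_len[i] * k_len[i] for i in range(batch_size))."""
    flat = []
    for i, (q_len, k_len) in enumerate(zip(q_lens, k_lens)):
        # Keep only the valid q_len x k_len region of sequence i,
        # row by row, dropping the padding.
        for row in range(q_len):
            flat.extend(mask[i][row][:k_len])
    return flat
```

For prefill with q_len == k_len (as in the Gemma 3 case), both length lists are simply the sequence lengths of the batch.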
2025-09-08 13:47:03 -04:00
drbh
4f067c22c3
fix: remove azure ( #3325 )
2025-09-08 13:41:45 -04:00
drbh
9dedeb89ac
Revert "feat: bump flake including transformers and huggingface_hub versions" ( #3323 )
Revert "feat: bump flake including transformers and huggingface_hub versions …"
This reverts commit 356de85c29 .
2025-09-08 12:17:29 +02:00
Phil
5739b5b088
Add missing backslash ( #3311 )
2025-09-06 09:50:14 +02:00
drbh
356de85c29
feat: bump flake including transformers and huggingface_hub versions ( #3313 )
* feat: bump flake including transformers and huggingface_hub versions
* fix: adjust outline version in overlay
2025-09-02 09:46:41 -04:00
Alvaro Moran
0f79162288
chore: prepare version 3.3.5 ( #3314 )
* chore: prepare version 3.3.5
* black
* neuron: black
* Update hf-xet in uv lockfile
* Attempt to fix API doc check failure
Add `error_type` where missing.
* Pin redocly version
* Sync redocly with Nix for now
---------
Co-authored-by: Daniël de Kok <me@danieldk.eu>
2025-09-02 15:35:42 +02:00
Daniël de Kok
06d9d88b95
Disable Cachix pushes ( #3312 )
* Disable Cachix pushes
This is not safe until we have sandboxed builds. For TGI alone
this might not be a huge issue, but with Cachix caching disabled
in hf-nix, TGI CI would build all the packages and push them to
our cache.
* fix: bump docs
---------
Co-authored-by: drbh <david.richard.holtz@gmail.com>
2025-08-26 13:27:57 -04:00
Alvaro Moran
8801ba12cf
Optimum neuron 0.3.0 ( #3308 )
* chore(neuron): update to optimum-neuron 0.3.0
Dependencies were changed accordingly, because Neuron SDK was updated to
v2.24.
* test: sample is not deterministic
Also modify the temperature in the decode test to avoid Granite early
stopping.
* test(neuron): adjust expectations after graph changes
* test(neuron): use greedy for stop sequences
---------
Co-authored-by: David Corvoysier <david@huggingface.co>
2025-08-26 11:07:47 +02:00
Wang, Yi
d618424d50
HuggingFaceM4/Idefics3-8B-Llama3 crash fix ( #3267 )
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-08-21 10:04:30 +02:00
Wang, Yi
c5e6f9a178
Fix outline import issue ( #3282 )
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-08-21 09:53:04 +02:00
Wang, Yi
6624fec1f9
Some GPTQ cases could not be handled by IPEX but could be handled by Triton ( #3298 )
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-08-19 09:37:49 +02:00
Wang, Yi
5284b5c654
Multi modality fix ( #3283 )
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-08-19 09:36:36 +02:00
Wang, Yi
6a2fa83540
XCCL for XPU ( #3252 )
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-08-19 00:37:27 +02:00
Emmanuel Ferdman
b4386b8c77
Migrate to V2 Pydantic interface ( #3262 )
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-08-18 23:55:21 +02:00
Wang, Yi
24c2bff659
Gaudi gptq gidx support ( #3297 )
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-07-17 16:00:12 +02:00
Yuan Wu
fc2405c549
[gaudi] Fix the CI test errors ( #3286 )
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-07-07 11:32:07 +02:00
Wang, Yi
ebb26f0ccd
[gaudi] Deepseek v2 mla and add ep to unquantized moe ( #3287 )
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-07-07 11:29:39 +02:00
Wang, Yi
778b61c0da
[gaudi] Remove unnecessary reinitialization of HeterogeneousNextTokenChooser to make sampling output correct ( #3284 )
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
2025-07-03 10:03:16 +02:00
David Corvoysier
3d2e7c8fce
Optimum neuron 0.2.2 ( #3281 )
* chore(neuron): use optimum-neuron 0.2.1
* test(neuron): adjust expectations
Since the latest optimum-neuron uses a new modeling for granite and
qwen, the greedy outputs are slightly different.
* test(neuron): add phi3 and qwen3 tests
* chore(neuron): use optimum-neuron 0.2.2
2025-07-03 07:59:25 +02:00
Wang, Yi
f6005d6813
xpu lora support ( #3232 )
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-07-02 17:54:25 +02:00
Wang, Yi
429dcd9c64
[gaudi] Gemma3 sliding window support ( #3280 )
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-07-01 10:06:01 +02:00
Baptiste Colle
9f38d93051
Gaudi: add CI ( #3160 )
Co-authored-by: Pauline Bailly-Masson <155966238+paulinebm@users.noreply.github.com>
2025-06-24 18:51:09 +02:00
Wang, Yi
719907410b
[gaudi] Refine RoPE memory: no need to keep a sin/cos cache per layer ( #3274 )
2025-06-23 11:15:39 +02:00
David Corvoysier
238fbd4d50
Neuron backend fix and patch version 3.3.4 ( #3273 )
* fix(neuron): wrong assertion when batch_size==1
* chore: prepare 3.3.4
2025-06-19 10:52:41 +02:00
Wang, Yi
14ee6e7804
[gaudi] Gemma3 text and VLM model initial support; sliding window support to be added later ( #3270 )
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-06-19 09:32:34 +02:00
David Corvoysier
bd1bdebb47
doc: fix README ( #3271 )
2025-06-18 12:35:36 +02:00
regisss
f13e28c98d
[gaudi] Refine logging for Gaudi warmup ( #3222 )
* Refine logging for Gaudi warmup
* Make style
* Make style 2
* Flash causal LM case
* Add log_master & VLM cases
* Black
2025-06-18 12:34:00 +02:00
David Corvoysier
b4d17f18ff
chore: prepare release 3.3.3 ( #3269 )
2025-06-18 11:55:26 +02:00
Wang, Yi
0627983c17
[Gaudi] use pad_token_id to pad input id ( #3268 )
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-06-17 09:07:25 +02:00
Yuan Wu
3752143b39
[Gaudi] Fix the integration-test issues ( #3265 )
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-06-13 14:47:06 +02:00
Yuan Wu
ded4cb52ac
[Gaudi] Enable Qwen3_moe model ( #3244 )
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-06-13 12:03:24 +02:00
Wang, Yi
a220e57f45
[gaudi] HuggingFaceM4/idefics2-8b issue fix ( #3264 )
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-06-13 12:00:08 +02:00
Yuan Wu
e07056ab3f
[Gaudi] Remove optimum-habana ( #3261 )
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-06-12 22:35:36 +02:00
Yuan Wu
25fdc5f03c
[gaudi] Move the _update_cos_sin_cache into get_cos_sin ( #3254 )
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-06-12 22:31:11 +02:00
Wang, Yi
613b8dd647
[gaudi] Vlm rebase and issue fix in benchmark test ( #3263 )
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-06-12 22:26:37 +02:00
Wang, Yi
839477670a
[gaudi] Perf optimization ( #3256 )
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-06-11 15:00:21 +02:00
David Corvoysier
79183d1647
Bump neuron SDK version ( #3260 )
* chore(neuron): bump version to 0.2.0
* refactor(neuron): use named parameters in inputs helpers
This allows us to hide the differences between the two backends in
terms of input parameters.
* refactor(neuron): remove obsolete code paths
* fix(neuron): use neuron_config whenever possible
* fix(neuron): use new cache import path
* fix(neuron): neuron config is not stored in config anymore
* fix(nxd): adapt model retrieval to new APIs
* fix(generator): emulate greedy in sampling parameters
When on-device sampling is enabled, we need to emulate the greedy
behaviour using top-k=1, top-p=1, temperature=1.
* test(neuron): update models and expectations
* feat(neuron): support on-device sampling
* fix(neuron): adapt entrypoint
* tests(neuron): remove obsolete models
* fix(neuron): adjust test expectations for llama on nxd
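The greedy emulation mentioned above can be sketched as follows (a hypothetical helper; the parameter names are illustrative, not the actual generator API):

```python
def greedy_sampling_params():
    # Greedy decoding emulated through on-device sampling: top-k=1
    # always keeps only the highest-probability token, so top-p and
    # temperature have no further effect and are left neutral.
    return {"top_k": 1, "top_p": 1.0, "temperature": 1.0}
```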
2025-06-10 17:56:25 +02:00
Yuan Wu
1ff9d185d5
Remove useless packages ( #3253 )
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-06-03 13:42:29 +02:00
Daniël de Kok
249189d96e
Prepare for 3.3.2 ( #3249 )
2025-05-30 16:16:36 +02:00
Yuan Wu
6b6e30a6f6
[gaudi] Fix the Llama-4-Maverick-17B-128E crash issue ( #3246 )
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-05-29 11:38:44 +02:00
Yuan Wu
70217ac345
[Gaudi] Fix the OOM issue of Llama-4-Scout-17B-16E-Instruct ( #3245 )
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-05-29 09:58:24 +02:00
Wang, Yi
f14044009a
fp8 compressed tensors w8a8 support for Gaudi backend ( #3242 )
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-05-28 14:54:20 +02:00
Yuan Wu
1883a62a94
Add Qwen3 for Gaudi backend ( #3229 )
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-05-23 08:58:35 +02:00
Daniël de Kok
f58d7cf50e
Nix: switch to hf-nix ( #3240 )
* Nix: switch to hf-nix
* Remove outdated local overrides
2025-05-22 17:09:15 +02:00
Wang, Yi
f08b44ade5
Upgrade to new vllm extension ops for Gaudi backend (fix issue in exponential bucketing) ( #3239 )
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-05-22 15:29:16 +02:00