Wang, Yi A
4cdc34ec4d
match the latest vllm_extension ops
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-04-10 19:32:32 -07:00
Wang, Yi A
610dd200e5
Merge branch 'main' into gaudi_backend_pa
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-04-10 18:20:28 -07:00
Nicolas Patry
9a8d0462e1
Fixing tokenization like https://github.com/huggingface/text-embeddin … ( #3156 )
...
Fixing tokenization like https://github.com/huggingface/text-embeddings-inference/issues/525
2025-04-09 18:42:25 +02:00
Nicolas Patry
5861da1ad7
Fixing Qwen 2.5 VL (32B). ( #3157 )
...
Reduce the config constraints, and use common ground between the 8B and
32B.
2025-04-09 17:07:30 +02:00
Nicolas Patry
0b28aabb94
3.2.3 ( #3151 )
2025-04-08 10:16:37 +02:00
oOraph
24bec29ffc
fix: compute type typo ( #3150 )
...
Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>
Co-authored-by: Raphael Glon <oOraph@users.noreply.github.com>
2025-04-07 17:24:11 +02:00
Baptiste Colle
37104acd75
Gaudi: Add Integration Test for Gaudi Backend ( #3142 )
...
* feat(gaudi): add integration test
* feat(test): add more models to integration tests
* remove debug comments
* fix typos
2025-04-07 16:55:03 +02:00
Mohit Sharma
87a0af4ec2
Update transformers to 4.51 ( #3148 )
...
* update transformers
* Upgrading the nix deps too.
* Forcing torchvision to be in there.
* Fixing bug in mllama.
* Those tests cannot be run in CI.
* Lint.
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2025-04-07 12:55:43 +02:00
Mohit Sharma
9c26b52940
Use ROCM 6.3.1 ( #3141 )
...
* update dockerfile
* add updated makefile
* fix docker
* Lint.
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2025-04-07 12:55:11 +02:00
Nicolas Patry
d23b385eee
Preparing for release. ( #3147 )
...
* Preparing for release.
* Adding hf-xet dependency.
* Merged tgi-nix update.
2025-04-06 11:36:00 +02:00
Mohit Sharma
d9bb9bebc9
Add llama4 ( #3145 )
...
* initial changes
* Add support for other vlm
* cleanup comment
* Improve attn_implementation
* Add comments for support of models
* add model
* add model
* fixes and improvements
* update docker
* Add cache position
* Add tests
* remove redundant changes
* remove tr version
* Upgrade doc + fix linting.
* Fixing the CI.
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2025-04-06 10:20:22 +02:00
Yuan Wu
3d059f91ab
Gaudi: Use exponential growth to replace BATCH_BUCKET_SIZE ( #3131 )
...
* Gaudi: Use exponential growth to replace BATCH_BUCKET_SIZE
Signed-off-by: yuanwu <yuan.wu@intel.com>
* Remove debug modifications
Signed-off-by: yuanwu <yuan.wu@intel.com>
---------
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-04-03 10:34:53 +02:00
Wang, Yi A
c55a8caea2
remove torch.where to fix incorrect output in hpu graph model
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-31 22:51:54 -07:00
Wang, Yi A
f0e5faec1a
fix some issue
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-28 07:01:06 -07:00
Wang, Yi A
376e0507b7
missing gptj change...
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-28 01:08:40 -07:00
Wang, Yi A
787dbe98a8
fix comment
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-28 00:09:26 -07:00
Wang, Yi A
7914e980e2
Merge branch 'main' into gaudi_backend_pa
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-28 00:03:49 -07:00
Wang, Yi A
1508ee8de1
remove block_tables and prefill_cache_indices which will lead to dynamic shape
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-27 23:57:59 -07:00
Wang, Yi A
7900be5ac3
warmup decode
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-26 20:19:13 -07:00
Wang, Yi A
ba7a131e04
add warmup_decode
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-26 17:39:26 -07:00
Corentin REGAL
0142550096
nix-v3.2.1 -> v3.2.1-nix ( #3129 )
...
make it easier to check for version using semver semantics (same major
and minor)
2025-03-26 15:36:43 +01:00
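A minimal sketch of why the `v3.2.1-nix` form is easier to check than `nix-v3.2.1`: with the suffix moved to the end, the tag reads as a semver version plus a pre-release-style label, so a single pattern can extract major and minor. The helper names `parse_tag` and `same_major_minor` are hypothetical and are not taken from the repository.

```python
import re
from typing import Tuple

# Hypothetical pattern: "v<major>.<minor>.<patch>" with an optional "-<label>" suffix.
_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


def parse_tag(tag: str) -> Tuple[int, int, int, str]:
    """Split a tag such as 'v3.2.1-nix' into (major, minor, patch, label)."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"not a semver-style tag: {tag!r}")
    major, minor, patch, label = match.groups()
    return int(major), int(minor), int(patch), label or ""


def same_major_minor(a: str, b: str) -> bool:
    """True when both tags share the same major and minor version."""
    return parse_tag(a)[:2] == parse_tag(b)[:2]
```

Under this assumed pattern, `same_major_minor("v3.2.1-nix", "v3.2.3")` holds, while the old prefix form `nix-v3.2.1` is rejected by `parse_tag` with a `ValueError`.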
Wang, Yi A
fd70ad703e
warmup prefill
...
remove model where pageattn is not used, set block table to None since it's not used
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-26 03:10:58 -07:00
Yuan Wu
f5f14dc660
Gaudi: Fix llava-next and mllama crash issue ( #3127 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-03-25 15:08:15 +01:00
Wang, Yi A
69773767c5
enable fp8
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-25 05:06:55 -07:00
Nicolas Patry
54d15462dc
Torch 2.6 ( #3134 )
...
* Torch 2.6
* Upgrade the toolchain.
* Don't upgrade just yet.
* Upgrade toolchain.
* Time upgrade.
* TGI-nix main.
* Upgrade to transformers 4.50
2025-03-24 11:55:49 +01:00
Wang, Yi A
8d221b7b79
fix gptq issue
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-22 20:58:50 -07:00
Wang, Yi A
9914ffe1f1
remove unused quantization code and enable awq/gptq int4
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-22 19:37:20 -07:00
Wang, Yi A
fdf0733f56
fix incorrect output in qwen2 idefics if hpu graph is used
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-21 01:01:37 -07:00
Wang, Yi A
36b6612f97
adjust warmup and enable vlm
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-20 23:12:52 -07:00
Baptiste Colle
2e60a8dd65
CI: enable server tests for backends ( #3128 )
...
add test for backends
2025-03-20 16:07:31 +01:00
Erik Kaunismäki
e5503eba78
configurable termination timeout ( #3126 )
...
* make shard and webserver termination timeouts configurable
* Updating documentation.
* Fmt.
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2025-03-20 14:25:56 +01:00
Wang, Yi A
f95aa42660
multi-modality initial PR
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-19 23:30:12 -07:00
Wang, Yi A
d5b78ba16f
Merge branch 'main' into gaudi_backend_pa
2025-03-19 18:15:08 -07:00
Wang, Yi A
2074d0516b
enable dbrx, remove some unused code
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-19 03:16:41 -07:00
Wang, Yi A
2cde30de24
gpt_bigcode could also go pageattn
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-18 23:59:31 -07:00
Wang, Yi A
073f793976
fix phimoe issue
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-18 23:11:01 -07:00
Nicolas Patry
e497bc09f6
Minor fixes. ( #3125 )
2025-03-18 15:42:35 +01:00
Nicolas Patry
67ce543e04
Intel docker. ( #3121 )
...
* Intel docker.
* torchaudio ?
* Fixing dockerfile ?
2025-03-18 15:12:11 +01:00
Nicolas Patry
83fe45c15e
Prepare for patch release. ( #3124 )
2025-03-18 15:11:55 +01:00
Nicolas Patry
11f2eec10e
Publish nix docker image. ( #3122 )
...
* Publish nix docker image.
* Run during PR.
* Something else.
* Forgot to push.
* Build zstd.
* Pushing with skopeo
* Testing the PR.
* Running from nix.
* Cleaner tags.
2025-03-18 12:58:21 +01:00
Mohit Sharma
a35fbdb925
Bug Fix: Sliding Window Attention ( #3112 )
...
* (fix) sliding window attention
* (fix) flashinfer
* (typo) collection link
* Add window_size_left param ipex rocm
* Update window size rocm flash decoding
* fix: bump snapshots and improve exceed window test case
* feat: add tests for image types and remove alpha from png
* Upgrading `from_env` to get token from file when necessary + fix
pali_gemma.
* fix: add pillow dependency and bump lock+requirements
* fix: bump org name in gemma3 test
* Fix qwen2.
---------
Co-authored-by: drbh <david.richard.holtz@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2025-03-18 10:37:33 +01:00
Baptiste Colle
8c2c348f3c
Gaudi: Sync TGI with the latest changes from the TGI-Gaudi fork ( #3117 )
...
feat(gaudi): add all the changes from tgi-gaudi fork up to PR #289
2025-03-18 09:45:52 +01:00
Wang, Yi A
5cd1c93cad
add moe support, fix qwen/mistral/mixtral crash
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-18 00:45:15 -07:00
Daniël de Kok
095775e05c
launcher: correctly get the head dimension for VLMs ( #3116 )
...
* launcher: correctly get the head dimension for VLMs
For most (?) VLMs, the head dimension is in the `text_config`
configuration section. However, since we only queried the top-level
`head_dim` (which typically doesn't exist in VLMs), we would never use
flashinfer. This change adds a method that gets the head dimension from
the top-level `Config` struct or `text_config` when that fails.
* fix: bump org name in gemma3 test
---------
Co-authored-by: drbh <david.richard.holtz@gmail.com>
2025-03-17 18:19:37 +01:00
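The fallback described in the commit above (top-level `head_dim` first, then `text_config`) can be sketched as follows. This is an illustrative Python version only: the launcher itself works on a `Config` struct, and the function name `get_head_dim` and the dict-based config are assumptions made for the example.

```python
from typing import Any, Mapping, Optional


def get_head_dim(config: Mapping[str, Any]) -> Optional[int]:
    """Return head_dim from the top level, else from text_config, else None."""
    value = config.get("head_dim")
    if value is None:
        # VLM configs typically nest the language-model settings here.
        text_config = config.get("text_config") or {}
        value = text_config.get("head_dim")
    return value
```

With only the top-level lookup, a VLM config such as `{"text_config": {"head_dim": 256}}` would yield `None`; with the fallback it yields `256`.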
Wang, Yi
0b3e3db043
xpu 2.6 update ( #3051 )
...
* xpu 2.6 update
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* install whl
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* update get xpu memory api
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* int
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* fix awq crash if modules_to_not_convert is None
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
---------
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-17 13:48:48 +01:00
Wang, Yi A
6bbe24d974
use tensor cache in hpu graph to avoid replay issue
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-17 01:36:49 -07:00
Wang, Yi A
a07e7437b6
enable all the models. not tested yet
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-17 01:26:32 -07:00
Wang, Yi A
5d3653943c
adjust block table in hpu to improve performance
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-16 20:28:01 -07:00
Wang, Yi A
b7fea6fc2f
fix TP in pageattn
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-14 18:01:58 -07:00
Wang, Yi A
201dc6294f
clean cuda/rocm code in hpu backend, enable flat_hpu
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-14 01:25:31 -07:00