Wang, Yi A
29703dbd27
fix warmup issue for mllama
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-04-04 20:25:01 -07:00
Wang, Yi A
8591687561
refine log and fix some issues
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-04-03 00:11:22 -07:00
Wang, Yi A
a84da5b698
optimize code
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-04-02 00:56:15 -07:00
Wang, Yi A
705cc0b619
multi-modality warmup
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-04-02 00:09:16 -07:00
Wang, Yi A
9d85ac9485
LLM warmup logic
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-31 23:07:14 -07:00
Wang, Yi A
c55a8caea2
remove torch.where to fix incorrect output in hpu graph model
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-31 22:51:54 -07:00
Wang, Yi A
f0e5faec1a
fix some issues
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-28 07:01:06 -07:00
Wang, Yi A
376e0507b7
missing gptj change...
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-28 01:08:40 -07:00
Wang, Yi A
787dbe98a8
fix comment
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-28 00:09:26 -07:00
Wang, Yi A
7914e980e2
Merge branch 'main' into gaudi_backend_pa
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-28 00:03:49 -07:00
Wang, Yi A
1508ee8de1
remove block_tables and prefill_cache_indices, which lead to dynamic shapes
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-27 23:57:59 -07:00
Wang, Yi A
7900be5ac3
warmup decode
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-26 20:19:13 -07:00
Wang, Yi A
ba7a131e04
add warmup_decode
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-26 17:39:26 -07:00
Corentin REGAL
0142550096
nix-v3.2.1 -> v3.2.1-nix (#3129)
...
make it easier to check the version using semver semantics (same major
and minor)
2025-03-26 15:36:43 +01:00
Wang, Yi A
fd70ad703e
warmup prefill
...
remove models where pageattn is not used; set block table to None since it's not used
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-26 03:10:58 -07:00
Yuan Wu
f5f14dc660
Gaudi: Fix llava-next and mllama crash issue (#3127)
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-03-25 15:08:15 +01:00
Wang, Yi A
69773767c5
enable fp8
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-25 05:06:55 -07:00
Nicolas Patry
54d15462dc
Torch 2.6 (#3134)
...
* Torch 2.6
* Upgrade the toolchain.
* Don't upgrade just yet.
* Upgrade toolchain.
* Time upgrade.
* TGI-nix main.
* Upgrade to transformers 4.50
2025-03-24 11:55:49 +01:00
Wang, Yi A
8d221b7b79
fix gptq issue
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-22 20:58:50 -07:00
Wang, Yi A
9914ffe1f1
remove unused quantization code and enable awq/gptq int4
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-22 19:37:20 -07:00
Wang, Yi A
fdf0733f56
fix incorrect output in qwen2 idefics if hpu graph is used
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-21 01:01:37 -07:00
Wang, Yi A
36b6612f97
adjust warmup and enable vlm
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-20 23:12:52 -07:00
Baptiste Colle
2e60a8dd65
CI: enable server tests for backends (#3128)
...
add test for backends
2025-03-20 16:07:31 +01:00
Erik Kaunismäki
e5503eba78
configurable termination timeout (#3126)
...
* make shard and webserver termination timeouts configurable
* Updating documentation.
* Fmt.
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2025-03-20 14:25:56 +01:00
Wang, Yi A
f95aa42660
multi-modality initial PR
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-19 23:30:12 -07:00
Wang, Yi A
d5b78ba16f
Merge branch 'main' into gaudi_backend_pa
2025-03-19 18:15:08 -07:00
Wang, Yi A
2074d0516b
enable dbrx remove some unused code
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-19 03:16:41 -07:00
Wang, Yi A
2cde30de24
gpt_bigcode can also use pageattn
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-18 23:59:31 -07:00
Wang, Yi A
073f793976
fix phimoe issue
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-18 23:11:01 -07:00
Nicolas Patry
e497bc09f6
Minor fixes. (#3125)
2025-03-18 15:42:35 +01:00
Nicolas Patry
67ce543e04
Intel docker. (#3121)
...
* Intel docker.
* torchaudio ?
* Fixing dockerfile ?
2025-03-18 15:12:11 +01:00
Nicolas Patry
83fe45c15e
Prepare for patch release. (#3124)
2025-03-18 15:11:55 +01:00
Nicolas Patry
11f2eec10e
Publish nix docker image. (#3122)
...
* Publish nix docker image.
* Run during PR.
* Something else.
* Forgot to push.
* Build zstd.
* Pushing with skopeo
* Testing the PR.
* Running from nix.
* Cleaner tags.
2025-03-18 12:58:21 +01:00
Mohit Sharma
a35fbdb925
Bug Fix: Sliding Window Attention (#3112)
...
* (fix) sliding window attention
* (fix) flashinfer
* (typo) collection link
* Add window_size_left param ipex rocm
* Update window size rocm flash decoding
* fix: bump snapshots and improve exceed window test case
* feat: add tests for image types and remove alpha from png
* Upgrading `from_env` to get token from file when necessary + fix
pali_gemma.
* fix: add pillow dependency and bump lock+requirements
* fix: bump org name in gemma3 test
* Fix qwen2.
---------
Co-authored-by: drbh <david.richard.holtz@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2025-03-18 10:37:33 +01:00
Baptiste Colle
8c2c348f3c
Gaudi: Sync TGI with the latest changes from the TGI-Gaudi fork (#3117)
...
feat(gaudi): add all the changes from tgi-gaudi fork up to PR #289
2025-03-18 09:45:52 +01:00
Wang, Yi A
5cd1c93cad
add moe support, fix qwen/mistral/mixtral crash
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-18 00:45:15 -07:00
Daniël de Kok
095775e05c
launcher: correctly get the head dimension for VLMs (#3116)
...
* launcher: correctly get the head dimension for VLMs
For most (?) VLMs, the head dimension is in the `text_config`
configuration section. However, since we only queried the top-level
`head_dim` (which typically doesn't exist in VLMs), we would never use
flashinfer. This change adds a method that gets the head dimension from
the top-level `Config` struct or `text_config` when that fails.
* fix: bump org name in gemma3 test
---------
Co-authored-by: drbh <david.richard.holtz@gmail.com>
2025-03-17 18:19:37 +01:00
Wang, Yi
0b3e3db043
xpu 2.6 update (#3051)
...
* xpu 2.6 update
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* install whl
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* update get xpu memory api
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* int
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* fix awq crash if modules_to_not_convert is None
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
---------
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-17 13:48:48 +01:00
Wang, Yi A
6bbe24d974
use tensor cache in hpu graph to avoid replay issue
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-17 01:36:49 -07:00
Wang, Yi A
a07e7437b6
enable all the models. not tested yet
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-17 01:26:32 -07:00
Wang, Yi A
5d3653943c
adjust block table in hpu to improve performance
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-16 20:28:01 -07:00
Wang, Yi A
b7fea6fc2f
fix TP in pageattn
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-14 18:01:58 -07:00
Wang, Yi A
201dc6294f
clean cuda/rocm code in hpu backend, enable flat_hpu
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-03-14 01:25:31 -07:00
Daniël de Kok
f91434e99b
Make the Nix-based Docker container work on non-NixOS (#3109)
...
On NixOS, the CUDA driver shim gets mounted on /run/opengl-driver,
where Nix packages expect the shim to be. However, on other
distributions, some FHS paths are mounted. This is a small change
to make the dynamic loader find the shim.
2025-03-13 14:02:45 +01:00
Nicolas Patry
8b91f92978
Fixing the docker build. (#3108)
...
* Fixing the docker build.
* Apply suggestions from code review
2025-03-13 11:26:44 +01:00
Baptiste Colle
27ed848676
Release of Gaudi Backend for TGI (#3091)
...
* feat(gaudi): release ready (docs, docker image and vlm ready)
* fix(gaudi): add default argument for the dockerfile
* fix(gaudi): remove use of latest for gaudi docker image + redid gaudi benchmarking section to include best practices
2025-03-13 10:56:01 +01:00
Nicolas Patry
83ef364177
We need gcc during runtime to enable triton to compile kernels. (#3103)
...
* We need gcc during runtime to enable triton to compile kernels.
* Fixing the docker build.
2025-03-13 10:45:47 +01:00
Daniël de Kok
83b7b7bb92
Router: add gemma3-text model type (#3107)
2025-03-13 10:41:33 +01:00
Daniël de Kok
c73ae0bd88
Update to `kernels` 0.2.1 (#3084)
...
* Update to `kernels` 0.2.1
The package was renamed from `hf-kernels` to `kernels`. The new version
also updates the lockfile format.
* Download kernels in `install-cuda` target
2025-03-13 10:36:29 +01:00
Nicolas Patry
d4c6faa67b
Try to fix on main CI color. (#3101)
2025-03-12 10:12:24 +01:00