text-generation-inference/server/text_generation_server/layers/attention
Wang, Yi 51a0b9d11c
IPEX support FP8 kvcache/softcap/slidingwindow (#3144)
* IPEX support for FP8 KV cache

* add KV cache dtype

* add softcap and sliding window

* KV scale in paged attention

* remove the Triton installation; it is installed together with torch

* install the Xe Link library

* softcap defaults to -1.0

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-05-06 10:49:24 +02:00
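The "softcap default -1.0" bullet suggests a sentinel convention: a non-positive value disables logit soft-capping, while a positive value bounds the attention scores via a scaled tanh. A minimal sketch of that behavior (hypothetical helper, not TGI's actual IPEX code path):

```python
import numpy as np

def softcap_logits(scores: np.ndarray, softcap: float = -1.0) -> np.ndarray:
    """Soft-cap attention logits: softcap * tanh(scores / softcap).

    A softcap <= 0 (the -1.0 default mentioned in the commit) is treated
    as "disabled" and returns the scores unchanged.
    """
    if softcap <= 0:
        return scores
    return softcap * np.tanh(scores / softcap)

# With capping enabled, outputs are bounded in (-softcap, softcap).
capped = softcap_logits(np.array([0.0, 10.0, 100.0]), softcap=30.0)
```

The tanh form keeps small logits nearly unchanged while smoothly saturating large ones, which is why a single scalar flag with a disable sentinel is enough to plumb it through the attention kernels.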
__init__.py           Add support for FP8 KV cache scales (#2628)             2024-10-24 16:36:18 +02:00
common.py             feat: prefill chunking (#2600)                          2024-10-16 12:49:33 +02:00
cuda.py               Bug Fix: Sliding Window Attention (#3112)               2025-03-18 10:37:33 +01:00
flash_attn_triton.py  feat: prefill chunking (#2600)                          2024-10-16 12:49:33 +02:00
flashinfer.py         Support flashinfer for Gemma3 prefill (#3167)           2025-04-17 18:07:41 +02:00
ipex.py               IPEX support FP8 kvcache/softcap/slidingwindow (#3144)  2025-05-06 10:49:24 +02:00
kv_cache.py           IPEX support FP8 kvcache/softcap/slidingwindow (#3144)  2025-05-06 10:49:24 +02:00
rocm.py               Bug Fix: Sliding Window Attention (#3112)               2025-03-18 10:37:33 +01:00
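The ipex.py and kv_cache.py entries both track the FP8 KV cache work, where cached keys/values are stored in a narrow float8 format with a per-tensor scale (the "kv scale in pageattn" bullet). A rough sketch of scale-based quantization, simulated with clipping since plain numpy has no fp8 dtype (function names and the per-tensor scaling granularity are assumptions, not TGI's API):

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # max finite magnitude representable in float8_e4m3

def quantize_kv(kv: np.ndarray) -> tuple[np.ndarray, float]:
    """Compute a per-tensor scale and map values into fp8 range.

    Real implementations would cast the result to an fp8 storage dtype;
    here we only clip to the representable range to illustrate the scale.
    """
    scale = float(np.max(np.abs(kv))) / FP8_E4M3_MAX
    q = np.clip(kv / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale

def dequantize_kv(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate original values inside the attention kernel."""
    return q * scale
```

The scale is what the paged-attention kernel needs at read time, which is why it is threaded through the KV cache layer rather than kept local to the quantization step.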