text-generation-inference/backends/gaudi/server/text_generation_server/layers/attention
Wang, Yi 839477670a
[gaudi] Perf optimization (#3256)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-06-11 15:00:21 +02:00
__init__.py [gaudi] Perf optimization (#3256) 2025-06-11 15:00:21 +02:00
common.py Move input_ids to hpu and remove disposal of adapter_meta (#3237) 2025-05-22 09:21:31 +02:00
hpu.py [gaudi] Perf optimization (#3256) 2025-06-11 15:00:21 +02:00
kv_cache.py Upgrade to new vllm extension ops for Gaudi backend (fix issue in exponential bucketing) (#3239) 2025-05-22 15:29:16 +02:00