text-generation-inference/server/text_generation_server
Dmitry Rogozhkin 58848cb471
feat: enable pytorch xpu support for non-attention models (#2561)
The XPU backend is available natively (without IPEX) in PyTorch starting
from PyTorch 2.4. This commit extends TGI to cover the case where the user
has XPU support through PyTorch 2.4 but does not have IPEX installed.
Models that don't require attention can work; for models that require
attention, more work is needed to provide an attention implementation.
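The device-selection logic the commit describes can be sketched as follows. This is a minimal illustration, not TGI's actual code: `select_device` is a hypothetical helper, and the torch module is injected so the logic can be demonstrated without a real PyTorch install (with real PyTorch 2.4+, you would pass `torch` itself and rely on `torch.xpu.is_available()`).

```python
from types import SimpleNamespace

def select_device(torch_mod, ipex_available: bool) -> str:
    """Pick a device string, preferring native XPU (PyTorch >= 2.4),
    then IPEX-backed XPU, then CUDA, then CPU.

    `torch_mod` is injected so the sketch runs without PyTorch;
    pass the real `torch` module in practice."""
    # Native XPU: present when PyTorch itself ships the xpu backend.
    xpu = getattr(torch_mod, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"  # native XPU backend, no IPEX required
    # Fall back to XPU provided through the IPEX extension.
    if ipex_available:
        return "xpu"
    cuda = getattr(torch_mod, "cuda", None)
    if cuda is not None and cuda.is_available():
        return "cuda"
    return "cpu"

# Demo with a stand-in for torch that reports a native XPU device:
fake_torch = SimpleNamespace(xpu=SimpleNamespace(is_available=lambda: True))
print(select_device(fake_torch, ipex_available=False))  # -> xpu
```

The point of the commit is the first branch: once PyTorch reports XPU availability on its own, the IPEX path is no longer a hard requirement for non-attention models.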

Tested with the following models:
* teknium/OpenHermes-2.5-Mistral-7B
* bigscience/bloom-560m
* google/gemma-7b
* google/flan-t5-xxl

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
2024-10-14 18:28:49 +02:00
adapters feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
layers Fixing intel Supports windowing. (#2637) 2024-10-11 21:47:03 +02:00
models feat: enable pytorch xpu support for non-attention models (#2561) 2024-10-14 18:28:49 +02:00
pb chore: add pre-commit (#1569) 2024-02-16 11:58:58 +01:00
utils feat: enable pytorch xpu support for non-attention models (#2561) 2024-10-14 18:28:49 +02:00
__init__.py feat(clients): Python client (#103) 2023-03-07 18:52:22 +01:00
cache.py fix(server): decrease memory fragmentation (#557) 2023-07-06 14:28:33 +02:00
cli.py Add basic FP8 KV cache support (#2603) 2024-10-04 17:51:48 +02:00
interceptor.py v2.0.0 (#1736) 2024-04-12 18:38:34 +02:00
server.py Add basic FP8 KV cache support (#2603) 2024-10-04 17:51:48 +02:00
tracing.py Add OTLP Service Name Environment Variable (#2076) 2024-06-25 09:33:01 +02:00