Text Generation Inference Python gRPC Server

A Python gRPC server for Text Generation Inference

Install

make install
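
The Makefile target installs the server package and its Python dependencies. If you prefer not to go through make, a rough equivalent is sketched below; this is an assumption that the target wraps a standard editable install of this package and it omits any optional extras or custom-kernel builds the Makefile may perform:

pip install -e .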

Run

make run-dev
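
make run-dev launches the gRPC server in development mode. Once the package is installed, the server can also be started directly through the text-generation-server CLI. The example below is for illustration only: the model id and socket path are placeholders, and the available flags may differ between versions:

text-generation-server serve bigscience/bloom-560m --uds-path /tmp/text-generation-server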