text-generation-inference/server
Antti Kervinen 8863f3728c Fix CPU and memory affinity under external resource management
- Fixes CPU affinity when running inference on CPU and when CPUs are
  externally managed, for instance with taskset, numactl, cgroups, the
  Kubernetes CPU manager, or NRI resource policy plugins.

- Detect external CPU management and trust the external CPU manager
  completely. The external manager is more likely to have the big picture
  of all the other tasks running on the system, their QoS, hardware
  characteristics, and so on (see the detection sketch below).

- For instance, do not even modify memory affinity: the external manager
  may know better which NUMA node has the fastest memory, or which NUMA
  nodes have enough free memory for this inference.

Fixes: #3011

Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
2025-02-11 12:15:58 +02:00
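
The commit does not spell out how external management is detected; below is a
minimal sketch of one way to do it on Linux. It assumes only the standard
library (os.sched_getaffinity) plus /proc/self/status and
/sys/devices/system/node/online; the helper names and the heuristic are
illustrative, not the actual code in text_generation_server.

```python
import os


def cpus_externally_managed() -> bool:
    # If the set of CPUs this process may run on is smaller than the set of
    # CPUs installed, something external (taskset, numactl, cgroups, the
    # Kubernetes CPU manager, an NRI plugin, ...) has already restricted us.
    allowed = os.sched_getaffinity(0)
    present = os.cpu_count() or len(allowed)
    return len(allowed) < present


def mems_externally_managed() -> bool:
    # Same idea for NUMA memory: if the nodes we may allocate from
    # (Mems_allowed_list in /proc/self/status) differ from the nodes that
    # are online, an external manager has restricted memory placement.
    try:
        with open("/sys/devices/system/node/online") as f:
            online = f.read().strip()
        with open("/proc/self/status") as status:
            for line in status:
                if line.startswith("Mems_allowed_list:"):
                    allowed = line.split(":", 1)[1].strip()
                    return allowed != online
    except OSError:
        pass
    return False  # Could not tell; assume nothing is managing memory.


if cpus_externally_managed() or mems_externally_managed():
    # Trust the external manager: leave CPU and NUMA affinity untouched.
    pass
else:
    # No external manager detected: the server may pin threads itself,
    # e.g. with os.sched_setaffinity(0, chosen_cpus).
    pass
```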
custom_kernels All integration tests back everywhere (too many failed CI). (#2428) 2024-08-16 21:19:46 +02:00
exllama_kernels Update ROCM libs and improvements (#2579) 2024-09-30 10:54:32 +02:00
exllamav2_kernels Update ROCM libs and improvements (#2579) 2024-09-30 10:54:32 +02:00
tests feat: improve star coder to support multi lora layers (#2883) 2025-01-16 16:23:55 -05:00
text_generation_server Fix CPU and memory affinity under external resource management 2025-02-11 12:15:58 +02:00
.gitignore Impl simple mamba model (#1480) 2024-02-08 10:19:45 +01:00
bounds-from-nix.py Sync (most) server dependencies with Nix (#2782) 2024-12-03 04:04:06 +01:00
hf-kernels.lock Use kernels from the kernel hub (#2988) 2025-02-10 19:19:25 +01:00
Makefile Using the "lockfile". (#2992) 2025-02-06 12:28:24 +01:00
Makefile-awq chore: add pre-commit (#1569) 2024-02-16 11:58:58 +01:00
Makefile-eetq Sync (most) server dependencies with Nix (#2782) 2024-12-03 04:04:06 +01:00
Makefile-exllamav2 Upgrading exl2. (#2415) 2024-08-14 11:58:08 +02:00
Makefile-flash-att Hotfixing make install. (#2008) 2024-06-04 23:34:03 +02:00
Makefile-flash-att-v2 Add Flash decoding kernel ROCm (#2855) 2025-01-13 11:12:35 +01:00
Makefile-flashinfer Trying to put back the archlist (to fix the oom). (#2947) 2025-01-24 09:32:17 +01:00
Makefile-lorax-punica Enable multiple LoRa adapters (#2010) 2024-06-25 14:46:27 -04:00
Makefile-selective-scan chore: add pre-commit (#1569) 2024-02-16 11:58:58 +01:00
Makefile-vllm Update vllm kernels for ROCM (#2826) 2024-12-18 12:44:42 +01:00
pyproject.toml Use kernels from the kernel hub (#2988) 2025-02-10 19:19:25 +01:00
README.md chore: add pre-commit (#1569) 2024-02-16 11:58:58 +01:00
req.txt Using the "lockfile". (#2992) 2025-02-06 12:28:24 +01:00
requirements_cuda.txt Using the "lockfile". (#2992) 2025-02-06 12:28:24 +01:00
requirements_gen.txt Using the "lockfile". (#2992) 2025-02-06 12:28:24 +01:00
requirements_intel.txt Using the "lockfile". (#2992) 2025-02-06 12:28:24 +01:00
requirements_rocm.txt Using the "lockfile". (#2992) 2025-02-06 12:28:24 +01:00
uv.lock Use kernels from the kernel hub (#2988) 2025-02-10 19:19:25 +01:00

Text Generation Inference Python gRPC Server

A Python gRPC server for Text Generation Inference

Install

make install

Run

make run-dev