text-generation-inference/server/text_generation_server
Antti Kervinen 8863f3728c Fix CPU and memory affinity under external resource management
- Fixes CPU affinity when running inference on CPU and the CPUs are
  externally managed, for instance by taskset, numactl, cgroups, the
  Kubernetes CPU manager, or NRI resource policy plugins.

- Detect external CPU management and trust the external CPU manager
  completely. The external manager is more likely to have the full
  picture of all other tasks running on the system, their QoS, hardware
  characteristics, and so on.

- In that case, do not modify even memory affinity: the external
  manager may know better which NUMA node has the fastest memory, or
  which NUMA nodes have enough free memory for this inference workload.

Fixes: #3011

Signed-off-by: Antti Kervinen <antti.kervinen@intel.com>
2025-02-11 12:15:58 +02:00
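The detection described above can be sketched as a small heuristic: if the set of CPUs the process is allowed to run on is already a strict subset of all online CPUs, some external manager has constrained it, so the server should leave affinity alone. This is an illustrative sketch, not the actual implementation in `models/`; the function name is hypothetical.

```python
import os


def cpus_externally_managed() -> bool:
    """Heuristic sketch: return True when an external resource manager
    (taskset, numactl, cgroups, a Kubernetes CPU manager, ...) has
    already restricted this process's CPU affinity, in which case the
    server should not touch CPU or memory affinity itself."""
    # CPUs this process is currently allowed to run on (Linux-only call).
    allowed = os.sched_getaffinity(0)
    # If the allowed set is smaller than the machine's CPU count,
    # someone outside this process restricted it: trust that manager.
    return len(allowed) < os.cpu_count()
```

On an unconstrained system the allowed set equals all online CPUs and the sketch returns False; under `taskset -c 0-3` on a larger machine it returns True.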
adapters feat: improve star coder to support multi lora layers (#2883) 2025-01-16 16:23:55 -05:00
layers Use kernels from the kernel hub (#2988) 2025-02-10 19:19:25 +01:00
models Fix CPU and memory affinity under external resource management 2025-02-11 12:15:58 +02:00
pb chore: add pre-commit (#1569) 2024-02-16 11:58:58 +01:00
utils Use kernels from the kernel hub (#2988) 2025-02-10 19:19:25 +01:00
__init__.py feat(clients): Python client (#103) 2023-03-07 18:52:22 +01:00
cache.py fix(server): decrease memory fragmentation (#557) 2023-07-06 14:28:33 +02:00
cli.py Fixing TRTLLM dockerfile. (#2922) 2025-01-20 11:13:46 +01:00
interceptor.py feat: prefill chunking (#2600) 2024-10-16 12:49:33 +02:00
server.py Tmp tp transformers (#2942) 2025-01-23 18:07:30 +01:00
tracing.py Add OTLP Service Name Environment Variable (#2076) 2024-06-25 09:33:01 +02:00