huggingface/text-generation-inference

mirror of https://github.com/huggingface/text-generation-inference.git synced 2025-07-15 20:30:16 +00:00

Wang, Yi 9883f3b40e

update doc with intel cpu part (#2420)

* update doc with intel cpu part

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Apply suggestions from code review

we do not use latest ever in documentation, it causes too many issues for users. Release number get update on every release.

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

2024-08-29 17:42:02 +02:00


Using TGI with Intel GPUs

TGI-optimized models are supported on the Intel Data Center GPU Max 1100 and Max 1550. The recommended usage is through Docker.

On a server powered by Intel GPUs, TGI can be launched with the following command:

model=teknium/OpenHermes-2.5-Mistral-7B
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --rm --privileged --cap-add=sys_nice \
    --device=/dev/dri \
    --ipc=host --shm-size 1g --net host -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:2.2.0-intel-xpu \
    --model-id $model --cuda-graphs 0

Using TGI with Intel CPUs

Intel® Extension for PyTorch (IPEX) also provides further optimizations for Intel CPUs. IPEX provides optimized operations such as flash attention, paged attention, Add + LayerNorm, RoPE, and more.

On a server powered by Intel CPUs, TGI can be launched with the following command:

model=teknium/OpenHermes-2.5-Mistral-7B
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --rm --privileged --cap-add=sys_nice \
    --device=/dev/dri \
    --ipc=host --shm-size 1g --net host -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:2.2.0-intel-cpu \
    --model-id $model --cuda-graphs 0
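
Loading the model weights can take a while after the container starts, so it may help to wait until the server reports healthy before sending requests. The sketch below polls TGI's `/health` route with `curl`. The helper name `wait_for_tgi`, the `TGI_URL` variable, and the default address `http://localhost:80` are assumptions for illustration, not part of TGI; adjust the host and port to match your deployment.

```shell
#!/bin/sh
# Assumption: the server is reachable at this base URL. With --net host the
# container shares the host network; change host and port to fit your setup.
TGI_URL=${TGI_URL:-http://localhost:80}

# wait_for_tgi is a hypothetical helper name, not something TGI ships.
# It polls GET /health until the request succeeds or the retries run out.
#   $1 = base URL, $2 = max attempts (default 60), $3 = seconds between attempts (default 5)
wait_for_tgi() {
    url=$1
    retries=${2:-60}
    delay=${3:-5}
    i=0
    while [ "$i" -lt "$retries" ]; do
        # --fail makes curl exit non-zero on HTTP error responses
        if curl --silent --fail --max-time 5 --output /dev/null "$url/health"; then
            echo "ready"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "timeout"
    return 1
}
```

Calling `wait_for_tgi "$TGI_URL"` after `docker run` prints `ready` once the health check passes, or `timeout` if the server never comes up within the retry budget.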

The launched TGI server can then be queried from clients. Make sure to check out the Consuming TGI guide.
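
As a minimal sketch of such a client, the snippet below sends a prompt to the server's `/generate` endpoint with `curl`. The helper names `build_payload` and `generate`, the `TGI_URL` variable, and the default address `http://localhost:80` are assumptions made for illustration; the Consuming TGI guide remains the reference for the full request format.

```shell
#!/bin/sh
# Assumption: the server is reachable at this base URL; adjust as needed.
TGI_URL=${TGI_URL:-http://localhost:80}

# build_payload is a hypothetical helper that prints the JSON body for /generate.
#   $1 = prompt, $2 = max_new_tokens (default 20)
# It does no JSON escaping, so the prompt must not contain quotes or backslashes.
build_payload() {
    printf '{"inputs":"%s","parameters":{"max_new_tokens":%d}}' "$1" "${2:-20}"
}

# generate is a hypothetical helper that POSTs the payload to /generate.
generate() {
    curl --silent --fail "$TGI_URL/generate" \
        -X POST \
        -H 'Content-Type: application/json' \
        -d "$(build_payload "$1" "$2")"
}
```

For example, `generate "What is deep learning?" 32` should return a JSON object whose `generated_text` field holds the completion. For prompts that contain quotes or other special characters, a JSON-aware tool is a safer way to build the request body.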
