text-generation-inference

mirror of https://github.com/huggingface/text-generation-inference.git synced 2025-10-20 04:15:23 +00:00

Wang, Yi e49aed4713 use xpu-smi to dump used memory (#2047)	2024-09-24 03:51:26 +00:00

* use xpu-smi to dump used memory
xpu use "ZE_AFFINITY_MASK" to control card, usage is like CUDA_VISIBLE_DEVICES

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* Update server/text_generation_server/utils/import_utils.py

Co-authored-by: Daniël de Kok <me@github.danieldk.eu>

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Daniël de Kok <me@github.danieldk.eu>

..
env_runtime.rs	Integrate flash attention for starcoder2 tgi through habana and some fixes, enabling (#198)	2024-08-07 22:06:05 +02:00
main.rs	use xpu-smi to dump used memory (#2047)	2024-09-24 03:51:26 +00:00
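
The commit message above notes that XPU cards are selected through "ZE_AFFINITY_MASK", used much like CUDA_VISIBLE_DEVICES. As a rough illustration only, the sketch below parses such a mask into device entries. It assumes the mask is a comma-separated list of device indices, each optionally followed by `.sub_device`; the names `MaskEntry`, `parse_affinity_mask`, and `visible_xpu_devices` are hypothetical and are not taken from this repository.

```rust
use std::env;

/// Hypothetical type: one entry of an affinity mask, i.e. a device index
/// plus an optional sub-device index (as in "0.1").
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct MaskEntry {
    pub device: u32,
    pub sub_device: Option<u32>,
}

/// Parse a comma-separated mask such as "0,1" or "0.1,2".
/// Assumption: entries are `device` or `device.sub_device`.
/// Returns `None` if any entry is not a valid unsigned number.
pub fn parse_affinity_mask(mask: &str) -> Option<Vec<MaskEntry>> {
    mask.split(',')
        .map(str::trim)
        .filter(|entry| !entry.is_empty())
        .map(|entry| {
            let mut parts = entry.splitn(2, '.');
            let device = parts.next()?.parse::<u32>().ok()?;
            let sub_device = match parts.next() {
                Some(s) => Some(s.parse::<u32>().ok()?),
                None => None,
            };
            Some(MaskEntry { device, sub_device })
        })
        .collect()
}

/// Read ZE_AFFINITY_MASK from the environment and parse it.
/// Returns `None` when the variable is unset or malformed.
pub fn visible_xpu_devices() -> Option<Vec<MaskEntry>> {
    let mask = env::var("ZE_AFFINITY_MASK").ok()?;
    parse_affinity_mask(&mask)
}
```

Under these assumptions, a process started with `ZE_AFFINITY_MASK=0,1` would see two entries from `visible_xpu_devices()`, in the same way that `CUDA_VISIBLE_DEVICES=0,1` restricts a CUDA process to two cards.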
