From eaaea91e2bfdcfbc9d6a2c13d623253181a83143 Mon Sep 17 00:00:00 2001 From: fxmarty <9808326+fxmarty@users.noreply.github.com> Date: Mon, 8 Jul 2024 17:52:10 +0200 Subject: [PATCH] Fix nccl regression on PyTorch 2.3 upgrade (#2099) * fix nccl issue * add note in dockerfile * use v2.22.3 that also fixes @samsamoa's repro * poetry actually can't handle the conflict between torch and nccl * set LD_PRELOAD --- server/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/server/Makefile b/server/Makefile index 0099c56a..d701c819 100644 --- a/server/Makefile +++ b/server/Makefile @@ -35,5 +35,5 @@ run-dev: SAFETENSORS_FAST_GPU=1 python -m torch.distributed.run --nproc_per_node=2 text_generation_server/cli.py serve bigscience/bloom-560m --sharded export-requirements: - poetry export -o requirements_cuda.txt --without-hashes + poetry export -o requirements_cuda.txt --without-hashes --with cuda poetry export -o requirements_rocm.txt --without-hashes