Fix nccl regression on PyTorch 2.3 upgrade (#2099)

* fix nccl issue

* add note in dockerfile

* use v2.22.3 that also fixes @samsamoa's repro

* poetry actually can't handle the conflict between torch and nccl

* set LD_PRELOAD
fxmarty 2024-07-08 17:52:10 +02:00 committed by yuanwu
parent 48f1196da8
commit eaaea91e2b

@@ -35,5 +35,5 @@ run-dev:
 	SAFETENSORS_FAST_GPU=1 python -m torch.distributed.run --nproc_per_node=2 text_generation_server/cli.py serve bigscience/bloom-560m --sharded
 export-requirements:
-	poetry export -o requirements_cuda.txt --without-hashes
+	poetry export -o requirements_cuda.txt --without-hashes --with cuda
 	poetry export -o requirements_rocm.txt --without-hashes