text-generation-inference/Makefile-flashinfer at 29a0893b67a333aec6cc03976d0878984d0a8241 - text-generation-inference - Leaflow Developers

huggingface/text-generation-inference

mirror of https://github.com/huggingface/text-generation-inference.git synced 2025-10-09 06:55:24 +00:00

Nicolas Patry 29a0893b67

Tmp tp transformers (#2942)

* Upgrade the version number.

* Remove modifications in Lock.

* Tmp branch to test transformers backend with 2.5.1 and TP>1

* Fixing the transformers backend.

`inference_mode` forces the use of `aten.matmul` instead of `aten.mm`; the
former doesn't have sharding support, which crashes the transformers TP
support.

`lm_head.forward` also crashes because it skips the hook that
casts/decasts the DTensor.

Torch 2.5.1 is required for sharding support.

* Put back the attention impl.

* Revert the flashinfer (this will fail).

* Building AOT.

* Using 2.5 kernels.

* Remove the archlist, it's defined in the docker anyway.

2025-01-23 18:07:30 +01:00

7 lines

324 B

Plaintext


install-flashinfer:
	# We need fsspec as an additional dependency, but
	# `pip install flashinfer` cannot resolve it.
	pip install fsspec sympy==1.13.1 numpy
	pip install -U setuptools
	FLASHINFER_ENABLE_AOT=1 pip install git+https://github.com/flashinfer-ai/flashinfer.git@v0.2.0.post1#egg=flashinfer --no-build-isolation
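
A dry run can confirm what the `install-flashinfer` target would execute before anything is actually installed. The sketch below assumes GNU make is available on the PATH. It recreates the target in a scratch directory rather than touching a real checkout; the scratch directory and the `plan` variable are illustrative names, not part of the repository.

```shell
#!/bin/sh
# Sketch: recreate the target in a scratch directory and dry-run it.
# Assumption: GNU make is installed. Nothing is installed by this script.
set -eu

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
mk="$workdir/Makefile-flashinfer"

# printf expands \t into the literal tab that make requires
# at the start of every recipe line.
printf 'install-flashinfer:\n' > "$mk"
printf '\tpip install fsspec sympy==1.13.1 numpy\n' >> "$mk"
printf '\tpip install -U setuptools\n' >> "$mk"
printf '\tFLASHINFER_ENABLE_AOT=1 pip install git+https://github.com/flashinfer-ai/flashinfer.git@v0.2.0.post1#egg=flashinfer --no-build-isolation\n' >> "$mk"

# -n prints the commands make would run without executing them.
plan="$(make -n -f "$mk" install-flashinfer)"
printf '%s\n' "$plan"
```

Dropping `-n` and pointing `-f` at the real `Makefile-flashinfer` would run the installation itself. Because the last step passes `--no-build-isolation`, the build uses the packages already present in the environment, which is why `setuptools` is upgraded in the step before it.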