text-generation-inference/server/Makefile-flashinfer
Nicolas Patry 29a0893b67
Tmp tp transformers (#2942)
* Upgrade the version number.

* Remove modifications in Lock.

* Tmp branch to test transformers backend with 2.5.1 and TP>1

* Fixing the transformers backend.

inference_mode forces the use of `aten.matmul` instead of `aten.mm` the
former doesn't have sharding support crashing the transformers TP
support.

`lm_head.forward` also crashes because it skips the hook that
cast/decast the DTensor.

Torch 2.5.1 is required for sharding support.

* Put back the attention impl.

* Revert the flashinfer (this will fails).

* Building AOT.

* Using 2.5 kernels.

* Remove the archlist, it's defined in the docker anyway.
2025-01-23 18:07:30 +01:00

7 lines
324 B
Plaintext

install-flashinfer:
# We need fsspec as an additional dependency, but
# `pip install flashinfer` cannot resolve it.
pip install fsspec sympy==1.13.1 numpy
pip install -U setuptools
FLASHINFER_ENABLE_AOT=1 pip install git+https://github.com/flashinfer-ai/flashinfer.git@v0.2.0.post1#egg=flashinfer --no-build-isolation