text-generation-inference/docs/source/reference
Nicolas Patry 29a0893b67
Tmp tp transformers (#2942)
* Upgrade the version number.

* Remove modifications in Lock.

* Tmp branch to test transformers backend with 2.5.1 and TP>1

* Fixing the transformers backend.

inference_mode forces the use of `aten.matmul` instead of `aten.mm`; the
former doesn't have sharding support, which crashes the transformers TP
support.

`lm_head.forward` also crashes because it skips the hook that
casts/decasts the DTensor.

Torch 2.5.1 is required for sharding support.
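As a hedged illustration (this is not the actual TGI patch): both `torch.inference_mode()` and `torch.no_grad()` disable gradient tracking, but `inference_mode` additionally marks results as "inference tensors", which can change which ATen kernel the dispatcher selects (the matmul lowering difference described above). A minimal sketch of the distinction:

```python
import torch

# no_grad() only disables gradient tracking.
with torch.no_grad():
    a = torch.ones(2, 2) @ torch.ones(2, 2)

# inference_mode() also produces inference tensors, which take a
# different dispatch path and may hit different ATen kernels.
with torch.inference_mode():
    b = torch.ones(2, 2) @ torch.ones(2, 2)

print(a.requires_grad)   # False under no_grad
print(b.is_inference())  # True under inference_mode
```

If the TP sharding hooks only cover the `no_grad` dispatch path, swapping `inference_mode` for `no_grad` around the forward pass is one plausible workaround, assuming the surrounding code does not rely on inference-tensor semantics.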

* Put back the attention impl.

* Revert the flashinfer (this will fail).

* Building AOT.

* Using 2.5 kernels.

* Remove the archlist, it's defined in the docker anyway.
2025-01-23 18:07:30 +01:00
api_reference.md | Tmp tp transformers (#2942) | 2025-01-23 18:07:30 +01:00
launcher.md | Auto max prefill (#2797) | 2024-12-06 05:52:00 +01:00
metrics.md | doc: Add metrics documentation and add a 'Reference' section (#2230) | 2024-08-16 19:43:30 +02:00