text-generation-inference/docs/source/basic_tutorials
Latest commit by Nicolas Patry (29a0893b67): Tmp tp transformers (#2942)
* Upgrade the version number.

* Remove modifications in Lock.

* Tmp branch to test transformers backend with 2.5.1 and TP>1

* Fixing the transformers backend.

`inference_mode` forces the use of `aten.matmul` instead of `aten.mm`; the
former doesn't have sharding support, which crashes the transformers TP
support.

`lm_head.forward` also crashes because it skips the hook that
casts the `DTensor` back and forth.

Torch 2.5.1 is required for sharding support.

* Put back the attention impl.

* Revert the flashinfer (this will fail).

* Building AOT.

* Using 2.5 kernels.

* Remove the archlist, it's defined in the docker anyway.
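
The `inference_mode` issue above can be sketched as follows. This is a minimal illustration, not TGI's actual code: the model and tensor shapes are made up, and the point is only that `torch.no_grad()` disables autograd for the forward pass without the `inference_mode` dispatch side effect the commit describes (`aten.matmul` being selected instead of `aten.mm`, which lacks sharding support under TP).

```python
import torch
from torch import nn

# Illustrative stand-in for a sharded model's lm_head (hypothetical shapes).
model = nn.Linear(16, 16)
x = torch.randn(2, 16)

# Using no_grad() instead of inference_mode(): autograd is still disabled,
# but tensors are not flagged as inference tensors, so the matmul dispatch
# path that breaks TP sharding (per the commit message) is avoided.
with torch.no_grad():
    y = model(x)

print(tuple(y.shape))  # → (2, 16)
```

The trade-off is that `no_grad()` skips some of the extra bookkeeping savings `inference_mode()` provides, in exchange for the standard dispatch path.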
2025-01-23 18:07:30 +01:00
| File | Last commit message | Date |
| --- | --- | --- |
| consuming_tgi.md | Fixing the CI. | 2024-08-16 14:21:29 +02:00 |
| gated_model_access.md | Tmp tp transformers (#2942) | 2025-01-23 18:07:30 +01:00 |
| monitoring.md | docs: Fix grafana dashboard url (#1925) | 2024-05-21 13:12:14 -04:00 |
| non_core_models.md | chore: add pre-commit (#1569) | 2024-02-16 11:58:58 +01:00 |
| preparing_model.md | Update Quantization docs and minor doc fix. (#2368) | 2024-08-08 16:06:57 -04:00 |
| safety.md | Pickle conversion now requires --trust-remote-code. (#1704) | 2024-04-05 13:32:53 +02:00 |
| train_medusa.md | fix small typo and broken link (#1958) | 2024-05-27 11:31:06 -04:00 |
| using_cli.md | chore: add pre-commit (#1569) | 2024-02-16 11:58:58 +01:00 |
| using_guidance.md | Update using_guidance.md (#2901) | 2025-01-13 11:09:35 +01:00 |
| visual_language_models.md | Update Quantization docs and minor doc fix. (#2368) | 2024-08-08 16:06:57 -04:00 |