mirror of https://github.com/huggingface/text-generation-inference.git
synced 2025-04-22 15:32:08 +00:00
* Upgrade the version number.
* Remove modifications in Lock.
* Tmp branch to test the transformers backend with 2.5.1 and TP>1.
* Fixing the transformers backend: `inference_mode` forces the use of `aten.matmul` instead of `aten.mm`, and the former has no sharding support, which crashes the transformers TP support; `lm_head.forward` also crashes because it skips the hook that casts/uncasts the DTensor (see the sketch after this list). Torch 2.5.1 is required for sharding support.
* Put back the attention impl.
* Revert the flashinfer change (this will fail).
* Building AOT.
* Using 2.5 kernels.
* Remove the archlist; it's defined in the docker anyway.
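The `inference_mode` point above lends itself to a short sketch. The snippet below is illustrative only, not the actual TGI patch: it shows the workaround pattern the commit describes, i.e. running the forward pass under `torch.no_grad()` rather than `torch.inference_mode()` so linear layers keep the `aten.mm` lowering that, per the message above, supports DTensor sharding. The model and input are stand-ins; a real TP>1 setup would use a DTensor-sharded model across ranks.

```python
import torch
import torch.nn as nn

# Stand-in model and input; in TGI this would be a transformers model whose
# linear layers are sharded as DTensors across tensor-parallel ranks (TP > 1).
model = nn.Linear(16, 16)
x = torch.randn(2, 16)

# Under torch.inference_mode(), the linear lowers to aten.matmul, which
# (per the commit message) lacks sharding support on torch 2.5.1, so a
# DTensor-sharded forward pass would crash here.
with torch.inference_mode():
    y_inference = model(x)

# The pattern the fix relies on: disable autograd with torch.no_grad()
# instead, keeping the aten.mm path that DTensor can shard.
with torch.no_grad():
    y_no_grad = model(x)

# Both contexts produce the same numbers; only the dispatched op differs.
print(torch.allclose(y_inference, y_no_grad))  # True
```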
backends/
basic_tutorials/
conceptual/
reference/
_toctree.yml
architecture.md
index.md
installation_amd.md
installation_gaudi.md
installation_inferentia.md
installation_intel.md
installation_nvidia.md
installation_tpu.md
installation.md
multi_backend_support.md
quicktour.md
supported_models.md
usage_statistics.md