text-generation-inference

mirror of https://github.com/huggingface/text-generation-inference.git synced 2025-06-05 17:02:08 +00:00

drbh c1cf36c0dc Improve qwen vl impl (#2943)	2025-02-04 12:44:18 -05:00

* feat: refactor model, improve startup and re-enable tests
* fix: improve multimodal rotary embed caching
* fix: limit vision flop calc to qwen2 vl models and update config typing
* fix: include clippy lint
* feat: refactor position ids in warmup and bump tests
* fix: prefer default dtype
* fix: enable all cuda graphs and bump snapshots
* fix: adjust rotary init path
* fix: simplify get position ids and remove unused vision config
* fix: update position ids so first dim is batch, simplify rotary and bump vlm default token limit
* fix: improve position id init during cuda warmup for mrope and simplify rotary forward
* fix: check existence before accessing rope type in cuda warmup
* fix: check key before access
* fix: improve mrope check in cuda graph warmup
* fix: remove check for default rope type
* fix: add more tests and improve model generation
* fix: improve and simplify get_cos_sin, refactor and clean up get_position_ids
* fix: adjust signatures with types
src	Improve qwen vl impl (#2943)	2025-02-04 12:44:18 -05:00
build.rs	chore(github): add templates (#264)	2023-05-02 15:43:19 +02:00
Cargo.toml	CI (2592): Allow LoRA adapter revision in server launcher (#2602)	2024-10-02 10:51:04 -04:00