mirror of
https://github.com/huggingface/text-generation-inference.git
synced 2025-04-21 23:12:07 +00:00
* feat: refactor model, improve startup and re-enable tests
* fix: improve multimodal rotary embed caching
* fix: limit vision flop calc to qwen2 vl models and update config typing
* fix: include clippy lint
* feat: refactor position ids in warmup and bump tests
* fix: prefer default dtype
* fix: enable all cuda graphs and bump snapshots
* fix: adjust rotary init path
* fix: simplify get position ids and remove unused vision config
* fix: update position ids so first dim is batch, simplify rotary and bump vlm default token limit
* fix: improve position id init during cuda warmup for mrope and simplify rotary forward
* fix: check existence before accessing rope type in cuda warmup
* fix: check key before access
* fix: improve mrope check in cuda graph warmup
* fix: remove check for default rope type
* fix: add more tests and improve model generation
* fix: improve and simplify get_cos_sin, refactors and cleanup get_position_ids
* fix: adjust signatures with types

Name | ||
---|---|---|
attention | ||
awq | ||
compressed_tensors | ||
gptq | ||
marlin | ||
moe | ||
__init__.py | ||
bnb.py | ||
conv.py | ||
eetq.py | ||
exl2.py | ||
fp8.py | ||
layernorm.py | ||
linear.py | ||
lora.py | ||
medusa.py | ||
mlp.py | ||
rotary.py | ||
speculative.py | ||
tensor_parallel.py |