Commit Graph

3 Commits

Mohit Sharma · 329f612e55 · 2025-05-06 18:01:59 +02:00
Chunked Prefill VLM (#3188)
* add logic

* working

* add encoder cache free

* fixes

* fix idefics

* update pixel_values

* add improvements

* add improvements

* improve

* nit

* fix inputs_embeds

* nit

* optimizations

* add prometheus port

* rename vars

* rename vars

* nit

* disable chunking for qwen

* review comments

* remove port

* improve headdim

* remove kwargs and redundant args

* fix qwen2_5

* fix config image_token_id error

* fix test

* update paligemma

* fix paligemma text

* minor fix

* fix qwen test

* fix qwen test

Wang, Yi · 459fbdebe3 · 2025-04-15 11:08:01 +02:00
transformers flash llm/vlm enabling in ipex (#3152)
* transformers flash llm/vlm enabling in xpu

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* ipex cpu could also support in function

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

Mohit Sharma · d9bb9bebc9 · 2025-04-06 10:20:22 +02:00
Add llama4 (#3145)
* initial changes

* Add support for other vlm

* cleanup comment

* Improve attn_implementation

* Add comments for support of models

* add model

* add model

* fixes and improvements

* update docker

* Add cache position

* Add tests

* remove redundant changes

* remove tr version

* Upgrade doc + fix linting.

* Fixing the CI.

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>