Commit Graph

3 Commits

Mohit Sharma · 329f612e55 · 2025-05-06 18:01:59 +02:00
Chunked Prefill VLM (#3188)
* add logic

* working

* add encoder cache free

* fixes

* fix idefics

* update pixel_values

* add improvements

* add improvements

* improve

* nit

* fix inputs_embeds

* nit

* optimizations

* add prometheus port

* rename vars

* rename vars

* nit

* disable chunking for qwen

* review comments

* remove port

* improve headdim

* remove kwargs and redundant args

* fix qwen2_5

* fix config image_token_id error

* fix test

* update paligemma

* fix paligemma text

* minor fix

* fix qwen test

* fix qwen test

Wang, Yi · 459fbdebe3 · 2025-04-15 11:08:01 +02:00
transformers flash llm/vlm enabling in ipex (#3152)
* transformers flash llm/vlm enabling in xpu

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* ipex cpu could also support in function

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

Mohit Sharma · d9bb9bebc9 · 2025-04-06 10:20:22 +02:00
Add llama4 (#3145)
* initial changes

* Add support for other vlm

* cleanup comment

* Improve attn_implementation

* Add comments for support of models

* add model

* add model

* fixes and improvements

* update docker

* Add cache position

* Add tests

* remove redundant changes

* remove tr version

* Upgrade doc + fix linting.

* Fixing the CI.

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>