text-generation-inference

mirror of https://github.com/huggingface/text-generation-inference.git synced 2025-05-02 23:42:06 +00:00

drbh 01dacf8e8f fix cuda graphs for qwen2-vl (#2708)

* feat: support multidimensional position ids on batch to enable cuda graphs on qwen2-vl
* fix: only check model type if config exists
* fix: adjust sharding and lm head logic
* fix qwen2 failure in intel cpu

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* fix: return correct shape logits and add streaming test
* fix: remove unused import and refactor test

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

2024-11-01 03:05:34 +01:00

test_flash_qwen2_vl_simple_streaming.json	fix cuda graphs for qwen2-vl (#2708)	2024-11-01 03:05:34 +01:00
test_flash_qwen2_vl_simple.json	Support qwen2 vl (#2689)	2024-10-30 12:40:51 -04:00
