text-generation-inference/integration-tests/models/__snapshots__/test_flash_qwen2_vl
drbh 01dacf8e8f
fix cuda graphs for qwen2-vl (#2708)
* feat: support multidimensional position ids on batch to enable cuda graphs on qwen2-vl

* fix: only check model type if config exists

* fix: adjust sharding and lm head logic

* fix qwen2 failure in intel cpu

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

* fix: return correct shape logits and add streaming test

* fix: remove unused import and refactor test

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2024-11-01 03:05:34 +01:00
test_flash_qwen2_vl_simple_streaming.json fix cuda graphs for qwen2-vl (#2708) 2024-11-01 03:05:34 +01:00
test_flash_qwen2_vl_simple.json Support qwen2 vl (#2689) 2024-10-30 12:40:51 -04:00