text-generation-inference/server/text_generation_server/models/custom_modeling
janne-alatalo 7eeefa3b57
Qwen2-VL runtime error fix when prompted with multiple images (#2840)
* Fix runtime error when Qwen2-VL was prompted with multiple images

Fix the runtime error that occurred when the Qwen2-VL model was prompted
with a prompt containing more than one image. The runtime error was:

 File "text-generation-inference/server/text_generation_server/models/custom_modeling/qwen2_vl.py", line 459, in get_position_ids
    text_pos_ids = torch.arange(text_length, device=d)
RuntimeError: upper bound and larger bound inconsistent with step sign

The error was caused by the text_length variable going negative when
multiple images caused multiple iterations of the main loop in the
get_position_ids function.

The error is a simple logic mistake: next_image_pos is initialized as a
relative offset from current_pos, but it was used as if it were an
absolute position from zero.
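
A minimal sketch of the bug and the fix, assuming a simplified version of
the position-id loop (the helper name and loop structure are illustrative,
not the actual qwen2_vl.py code):

    import torch

    def text_segment_lengths(input_ids, image_token_id, d="cpu"):
        current_pos = 0
        segments = []
        while current_pos < input_ids.shape[0]:
            remaining = input_ids[current_pos:]
            hits = (remaining == image_token_id).nonzero()
            if hits.numel() == 0:
                segments.append(torch.arange(remaining.shape[0], device=d))
                break
            # The index found here is RELATIVE to current_pos, not absolute.
            next_image_pos = hits[0].item()
            # Buggy: text_length = next_image_pos - current_pos
            # (can go negative once current_pos exceeds the relative offset,
            # which makes torch.arange raise the "step sign" RuntimeError).
            text_length = next_image_pos  # fixed: the relative offset IS the length
            segments.append(torch.arange(text_length, device=d))
            # The absolute image position is current_pos + next_image_pos.
            current_pos = current_pos + next_image_pos + 1
        return segments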

* Fix runtime error when Qwen2-VL was prompted with multiple images

Fix the runtime error that occurred when the Qwen2-VL model was prompted
with a prompt containing more than one image. The runtime error was:

File "text-generation-inference/server/text_generation_server/models/custom_modeling/qwen2_vl.py", line 534, in forward
    inputs_embeds[input_ids == self.image_token_id] = image_embeds
RuntimeError: shape mismatch: value tensor of shape [512, 3584] cannot be broadcast to indexing result of shape [1024, 3584]

(The shape numbers in the error message can differ depending on the
input image resolutions.)

The error was caused by adding the wrong number of <|image_pad|> tokens
to the tokenized input in the image_text_replacement function.

The error is a simple logic mistake: the number of image pad tokens was
taken from the length of the first dimension of the pixel_value_shape
tensor. However, pixel_value_shape contains the patches from all of the
images, so the code added the total number of image pad tokens required
for the whole input at each image's location. This resulted in extra
image pad tokens being present in the tokenized input.

The fix is to compute the number of required tokens from the
image_grid_thw tensor, which contains grid_t, grid_h, and grid_w values
for each image. grid_t * grid_h * grid_w gives the total number of
patches for the image [1], and the number of required image pad tokens
is number_of_patches // 4.

[1] 31f9a289a6/src/transformers/models/qwen2_vl/image_processing_qwen2_vl.py (L311)
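
A minimal sketch of the corrected per-image count, assuming image_grid_thw
is the (num_images, 3) grid tensor produced by the Qwen2-VL image processor
(the helper name is illustrative, not the actual image_text_replacement
signature):

    import torch

    def image_pad_token_counts(image_grid_thw):
        # image_grid_thw has one (grid_t, grid_h, grid_w) row per image.
        counts = []
        for grid_t, grid_h, grid_w in image_grid_thw.tolist():
            num_patches = grid_t * grid_h * grid_w
            # Each <|image_pad|> token stands for a 2x2 block of merged
            # patches, hence num_patches // 4 tokens per image.
            counts.append(num_patches // 4)
        return counts

    # e.g. two images with grids (1, 16, 16): 256 patches each -> 64 pad
    # tokens per image, instead of inserting the 128-token total at both
    # image locations (the bug).
    print(image_pad_token_counts(torch.tensor([[1, 16, 16], [1, 16, 16]])))
    # [64, 64]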

---------

Co-authored-by: Janne Alatalo <janne.alatalo@jamk.fi>
2024-12-16 22:55:11 -05:00
__init__.py feat(server): flash santacoder (#153) 2023-04-03 19:06:42 +02:00
bloom_modeling.py Fixing auto bloom test. (#2699) 2024-10-28 06:14:11 +01:00
clip.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
flash_cohere_modeling.py Add support for FP8 KV cache scales (#2628) 2024-10-24 16:36:18 +02:00
flash_dbrx_modeling.py Simplify two ipex conditions (#2755) 2024-11-19 08:04:23 +01:00
flash_deepseek_v2_modeling.py Add support for FP8 KV cache scales (#2628) 2024-10-24 16:36:18 +02:00
flash_gemma2_modeling.py Add support for FP8 KV cache scales (#2628) 2024-10-24 16:36:18 +02:00
flash_gemma_modeling.py Add support for FP8 KV cache scales (#2628) 2024-10-24 16:36:18 +02:00
flash_gpt2_modeling.py Add support for FP8 KV cache scales (#2628) 2024-10-24 16:36:18 +02:00
flash_gptj_modeling.py Add support for FP8 KV cache scales (#2628) 2024-10-24 16:36:18 +02:00
flash_llama_modeling.py fix: adjust llama MLP name from dense to mlp to correctly apply lora (#2760) 2024-11-19 15:10:22 -05:00
flash_mistral_modeling.py Add support for FP8 KV cache scales (#2628) 2024-10-24 16:36:18 +02:00
flash_mixtral_modeling.py Add support for FP8 KV cache scales (#2628) 2024-10-24 16:36:18 +02:00
flash_neox_modeling.py Add support for FP8 KV cache scales (#2628) 2024-10-24 16:36:18 +02:00
flash_pali_gemma_modeling.py Support qwen2 vl (#2689) 2024-10-30 12:40:51 -04:00
flash_phi_modeling.py Add support for FP8 KV cache scales (#2628) 2024-10-24 16:36:18 +02:00
flash_phi_moe_modeling.py feat: support phi3.5 moe (#2479) 2024-09-30 11:15:09 +02:00
flash_qwen2_modeling.py Support qwen2 vl (#2689) 2024-10-30 12:40:51 -04:00
flash_rw_modeling.py Using both value from config as they might not be correct. (#2817) 2024-12-10 19:37:09 +01:00
flash_santacoder_modeling.py Add support for FP8 KV cache scales (#2628) 2024-10-24 16:36:18 +02:00
flash_starcoder2_modeling.py Add support for FP8 KV cache scales (#2628) 2024-10-24 16:36:18 +02:00
idefics2.py Support qwen2 vl (#2689) 2024-10-30 12:40:51 -04:00
idefics_config.py chore: add pre-commit (#1569) 2024-02-16 11:58:58 +01:00
idefics_image_processing.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
idefics_modeling.py enable HuggingFaceM4/idefics-9b in intel gpu (#2338) 2024-08-01 11:08:36 +02:00
idefics_perceiver.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
idefics_processing.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
idefics_vision.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
llava_next.py Support qwen2 vl (#2689) 2024-10-30 12:40:51 -04:00
mamba_modeling.py Fix: Change embeddings to embedding (#2738) 2024-11-15 13:16:15 +01:00
mllama.py feat: prefill chunking (#2600) 2024-10-16 12:49:33 +02:00
mpt_modeling.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
neox_modeling.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
opt_modeling.py Fixup opt to reduce the amount of odd if statements. (#2833) 2024-12-12 18:20:13 +01:00
phi_modeling.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
qwen2_vl.py Qwen2-VL runtime error fix when prompted with multiple images (#2840) 2024-12-16 22:55:11 -05:00
siglip.py Fix: don't apply post layernorm in SiglipVisionTransformer (#2459) 2024-08-26 17:04:46 -04:00
t5_modeling.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
vlm.py Enable paligemma2 (#2807) 2024-12-06 14:41:49 -05:00