Commit Graph

1218 Commits

Author SHA1 Message Date
drbh
f0c38412d1 fix: add libavdevice dep to tests workflow 2024-12-23 13:47:18 -05:00
drbh
4a76e8b8b4 fix: add libavfilter dep to test 2024-12-23 13:47:18 -05:00
drbh
d5cc6707e0 fix: ensure pip is installed after installing deps in test workflow 2024-12-23 13:47:18 -05:00
drbh
daf83a95c5 fix: adjust pkg config in test 2024-12-23 13:47:18 -05:00
drbh
137f3bb2ef fix: adjust dependencies and bump pip along with python 2024-12-23 13:47:18 -05:00
drbh
ac7483cffb fix: debug ffmpeg deps in tests II 2024-12-23 13:47:18 -05:00
drbh
4a3a72438e fix: debug ffmpeg install in tests workflow 2024-12-23 13:47:18 -05:00
drbh
b508b10d5c fix: add ffmpeg deps to test build 2024-12-23 13:47:18 -05:00
drbh
39fac7ecd4 fix: include more deps for ffmpeg as docs suggest 2024-12-23 13:47:18 -05:00
drbh
16007b68bd feat: adjust impure shell deps and autodocs workflow 2024-12-23 13:47:18 -05:00
drbh
1afaa69d1d fix: adjust deps after rebase 2024-12-23 13:47:18 -05:00
drbh
bc5e202d2c fix: adjust video process, reduce to 1 fps and adjust tensor shape 2024-12-23 13:47:18 -05:00
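The "reduce to 1 fps" change can be thought of as choosing which decoded frames to keep. The sketch below is a hypothetical illustration, not the code from this commit. `sample_frame_indices` and its parameters are invented for the example, and it assumes frames are addressed by index at a constant native frame rate.

```python
import math
from typing import List


def sample_frame_indices(num_frames: int, native_fps: float, target_fps: float = 1.0) -> List[int]:
    """Pick frame indices at roughly target_fps (hypothetical helper).

    Assumes a constant native_fps and frames addressed by integer index.
    """
    if num_frames <= 0:
        return []
    if native_fps <= 0 or target_fps <= 0:
        raise ValueError("fps values must be positive")
    # Number of source frames between two consecutive samples.
    step = native_fps / target_fps
    # Count the sample times t = 0, 1/target_fps, 2/target_fps, ... inside the clip.
    num_samples = math.floor((num_frames - 1) / step) + 1
    # Truncate to an integer frame index and clamp to the last frame.
    return [min(num_frames - 1, int(i * step)) for i in range(num_samples)]


# A 10-second clip at 30 fps yields one frame per second.
print(sample_frame_indices(300, native_fps=30.0))
```

With `target_fps=1.0`, a 30 fps clip keeps every 30th frame. Fewer frames means fewer vision tokens for the model to process.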
Miquel Farre
36e095b38d flatten frames to data block when needed 2024-12-23 13:47:18 -05:00
Miquel Farre
e65ead12bb moving video sampling and resize to validation. downstream we receive frames 2024-12-23 13:47:18 -05:00
David Holtz
322165d767 fix: remove unused deps and imports 2024-12-23 13:47:18 -05:00
David Holtz
83a7f185e8 fix: add protobuf update and mp4parse dep 2024-12-23 13:47:18 -05:00
David Holtz
b2c557594f feat: support video input chunks and enable qwen2 vl to process video 2024-12-23 13:47:18 -05:00
Miquel Farre
3c07391e8e fix 2024-12-23 13:47:18 -05:00
Miquel Farre
a25c3ecefc refactoring 2024-12-23 13:47:18 -05:00
Miquel Farre
464609fd43 fix 2024-12-23 13:47:18 -05:00
Miquel Farre
b9c8152ac6 downloading videos 2024-12-23 13:47:18 -05:00
Miquel Farre
c7c2fdae8c fix 2024-12-23 13:47:18 -05:00
Miquel Farre
05464d26bf connecting video to qwen2 2024-12-23 13:47:18 -05:00
Miquel Farre
5ced960f6e adopting video url 2024-12-23 13:47:18 -05:00
Miquel Farre
7c679399d5 router changes 2024-12-23 13:47:18 -05:00
Miquel Farre
18c9f06ded WIP video support 2024-12-23 13:47:18 -05:00
drbh
23bc38b10d
fix: include add_special_tokens in kserve request (#2859)
merging as this patch is already used, and fully limit to the kserve feature
2024-12-19 16:55:17 -05:00
Wang, Yi
ab5f616920
change xpu lib download link (#2852)
Signed-off-by: Wang,Yi A <yi.a.wang@intel.com>
2024-12-19 12:18:58 +01:00
Mohit Sharma
8f66d323d0
Update vllm kernels for ROCM (#2826)
* (vllm) updated vllm rocm kernels

* revert silu

* update partition size

* remove grouped_topk

* (nit) remove log

* update moe-kernels commit
2024-12-18 12:44:42 +01:00
janne-alatalo
7eeefa3b57
Qwen2-VL runtime error fix when prompted with multiple images (#2840)
* Fix runtime error when Qwen2-VL was prompted with multiple images

Fix runtime error when the Qwen2-VL model is prompted with more
than one image. The runtime error was:

 File "text-generation-inference/server/text_generation_server/models/custom_modeling/qwen2_vl.py", line 459, in get_position_ids
    text_pos_ids = torch.arange(text_length, device=d)
RuntimeError: upper bound and larger bound inconsistent with step sign

The error was caused by the text_length variable becoming negative
when multiple images caused multiple iterations of the main loop in the
get_position_ids function.

The error is a simple logic mistake where next_image_pos is initialized
as a relative offset from current_pos, but was used as if it were an
absolute position from zero.
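The relative-versus-absolute mix-up can be illustrated with a minimal pure-Python sketch. This is not the TGI get_position_ids code. The function name `text_run_lengths`, the flat list of token ids, and the assumption that every image occupies a fixed run of `tokens_per_image` ids are hypothetical simplifications.

```python
from typing import List, Sequence


def text_run_lengths(input_ids: Sequence[int], image_token_id: int, tokens_per_image: int) -> List[int]:
    """Length of each text run between image-token blocks (hypothetical helper).

    Assumes every image occupies exactly tokens_per_image consecutive ids.
    """
    ids = list(input_ids)
    lengths: List[int] = []
    current_pos = 0  # absolute position in ids
    while image_token_id in ids[current_pos:]:
        # Offset of the next image token RELATIVE to current_pos.
        next_image_offset = ids[current_pos:].index(image_token_id)
        # Correct: the relative offset already is the text run length.
        # Buggy variant: next_image_offset - current_pos treats the offset as
        # absolute and goes negative once current_pos has moved past an image.
        lengths.append(next_image_offset)
        current_pos += next_image_offset + tokens_per_image
    lengths.append(len(ids) - current_pos)  # trailing text after the last image
    return lengths


print(text_run_lengths([1, 2, 9, 9, 3, 9, 9, 4, 5], image_token_id=9, tokens_per_image=2))  # [2, 1, 2]
```

With the buggy subtraction, the second entry would be 1 - 4 = -3. Passing a negative length to torch.arange is what produced the error quoted above.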

* Fix runtime error when Qwen2-VL was prompted with multiple images

Fix runtime error when the Qwen2-VL model is prompted with more
than one image. The runtime error was:

File "text-generation-inference/server/text_generation_server/models/custom_modeling/qwen2_vl.py", line 534, in forward
    inputs_embeds[input_ids == self.image_token_id] = image_embeds
RuntimeError: shape mismatch: value tensor of shape [512, 3584] cannot be broadcast to indexing result of shape [1024, 3584]

(The shapes in the error message can differ depending on the input
image resolutions.)

The error was caused by adding the wrong number of <|image_pad|> tokens
to the tokenized input in the image_text_replacement function.

The error is a simple logical mistake where the number of image pad
tokens is taken from the length of the pixel_value_shape tensor's first
dimension. However, pixel_value_shape contains patches from all of the
images. Therefore the code inserted the total number of image pad tokens
required for the whole input at each image's location. This resulted in
extra image pad tokens being present in the tokenized input.

The fix was to take the number of required tokens from the
image_grid_thw tensor. The tensor includes grid_t, grid_h, and grid_w
values for each image, and grid_t * grid_h * grid_w gives the total
number of patches for the image [1]. The number of required image pad
tokens is number_of_patches // 4.

[1] 31f9a289a6/src/transformers/models/qwen2_vl/image_processing_qwen2_vl.py (L311)
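The following is a minimal sketch of the per-image count. It assumes image_grid_thw is given as one (grid_t, grid_h, grid_w) tuple per image and that each image appears in the text as a hypothetical "<image>" placeholder. The // 4 divisor comes from the message above. The helper names are hypothetical, and this is not the actual image_text_replacement implementation.

```python
from typing import List, Sequence, Tuple

# Divisor stated in the commit message: pad tokens = patches // 4.
PATCHES_PER_PAD_TOKEN = 4


def image_pad_counts(image_grid_thw: Sequence[Tuple[int, int, int]]) -> List[int]:
    """Number of <|image_pad|> tokens needed for each image (hypothetical helper)."""
    return [(t * h * w) // PATCHES_PER_PAD_TOKEN for t, h, w in image_grid_thw]


def expand_image_placeholders(
    text: str,
    image_grid_thw: Sequence[Tuple[int, int, int]],
    placeholder: str = "<image>",  # hypothetical placeholder marker
    pad: str = "<|image_pad|>",
) -> str:
    """Replace each placeholder with that image's own run of pad tokens."""
    counts = image_pad_counts(image_grid_thw)
    parts = text.split(placeholder)
    if len(parts) != len(counts) + 1:
        raise ValueError("number of placeholders must match number of images")
    out = [parts[0]]
    for count, part in zip(counts, parts[1:]):
        # Each image gets only its own pad tokens, not the total for all images.
        out.append(pad * count)
        out.append(part)
    return "".join(out)


print(image_pad_counts([(1, 16, 32), (1, 32, 32)]))  # [128, 256]
```

Deriving the count from a tensor that holds every image's patches, as the buggy code did, would give each image the combined total. That total is 384 here, instead of 128 and 256.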

---------

Co-authored-by: Janne Alatalo <janne.alatalo@jamk.fi>
2024-12-16 22:55:11 -05:00
drbh
a72f339c79
fix: lint backend and doc files (#2850) 2024-12-16 16:12:34 -05:00
Nicolas Patry
11ab329883
Fixing CI. (#2846) 2024-12-16 10:58:15 +01:00
Nicolas Patry
6f0b8c947d
New arg. (#2845) 2024-12-16 10:34:50 +01:00
Hugo Larcher
1708865fdc
Feat/trtllm cancellation dev container (#2795)
Add devcontainers for TRTLLM backend.

---------

Co-authored-by: Morgan Funtowicz <morgan@huggingface.co>
2024-12-13 16:19:06 +01:00
Funtowicz Morgan
ea7f4082c4
TensorRT-LLM backend bump to latest version + misc fixes (#2791)
* misc(cmake) update dependencies

* feat(hardware) enable new hardware.hpp and unittests

* test(ctest) enable address sanitizer

* feat(backend): initial rewrite of the backend for simplicity

* feat(backend): remove all the logs from hardware.hpp

* feat(backend): added some logging

* feat(backend): enable compiler warning if support for RVO not applying

* feat(backend): missing return statement

* feat(backend): introduce backend_workspace_t to store precomputed information from the engine folder

* feat(backend): delete previous backend impl

* feat(backend): more impl

* feat(backend): use latest trtllm main version to have g++ >= 13 compatibility

* feat(backend): allow overriding which Python to use

* feat(backend): fix backend_exception_t -> backend_error_t naming

* feat(backend): impl missing generation_step_t as return value of pull_tokens

* feat(backend): make backend_workspace_t::engines_folder constexpr

* feat(backend): fix main.rs retrieving the tokenizer

* feat(backend): add guard to multiple header definitions

* test(backend): add more unittest

* feat(backend): remove constexpr from par

* feat(backend): remove constexpr

* test(backend): more test coverage

* chore(trtllm): update dependency towards 0.15.0

* effectively cancel the request on the executor

* feat(backend) fix moving backend when pulling

* feat(backend): make sure we can easily cancel request on the executor

* feat(backend): fix missing "0" field access

* misc(backend): fix reborrowing Pin<&mut T> as described in the doc https://doc.rust-lang.org/stable/std/pin/struct.Pin.html#method.as_mut

* chore: Add doc and CI for TRTLLM (#2799)

* chore: Add doc and CI for TRTLLM

* chore: Add doc and CI for TRTLLM

* chore: Add doc and CI for TRTLLM

* chore: Add doc and CI for TRTLLM

* doc: Formatting

* misc(backend): indent

---------

Co-authored-by: Hugo Larcher <hugo.larcher@huggingface.co>
2024-12-13 15:50:59 +01:00
Nicolas Patry
3bb3fd19ae
Fixup opt to reduce the amount of odd if statements. (#2833)
* Fixup opt to reduce the amount of odd if statements.

* Fixing cargo lock
2024-12-12 18:20:13 +01:00
Wang, Yi
bf59118a93
fix facebook/opt-125m not working issue (#2824)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2024-12-12 14:41:30 +01:00
Nicolas Patry
c3bd7212c2
Fixing latest flavor by disabling it. (#2831) 2024-12-12 14:09:35 +01:00
Guspan Tanadi
f01f2fb6e7
docs(README): supported hardware links TGI AMD GPUs (#2814) 2024-12-12 13:49:33 +01:00
Nicolas Patry
07b01293c5
Prepare patch release. (#2829) 2024-12-11 21:03:50 +01:00
RodriMora
cc66dccbe8
Update README.md (#2827)
Added instructions to clone the repo and change directory into it.

The following steps include a "make install" step that fails unless the repo has been cloned and the current directory changed into it, which may confuse some users.

Added a Python venv alternative to conda too.
2024-12-11 19:45:49 +01:00
Nicolas Patry
82c24f7420
Using both values from config as they might not be correct. (#2817)
* Using both values from config as they might not be correct.

* Fixing max_position_embeddings for falcon.

* Simple attempt to fix the healthcheck block allocation.

* Much simpler solution.

* Default value for Backend start_health
2024-12-10 19:37:09 +01:00
Nicolas Patry
a2d878fa0f
Small update to docs (#2816) 2024-12-10 10:46:26 +01:00
Nicolas Patry
b2fac5d947
Hotfix link2 (#2812)
2nd hotfix ?
2024-12-09 20:57:18 +01:00
Nicolas Patry
a70dd2998b
Hotfixing the link. (#2811) 2024-12-09 20:50:07 +01:00
Nicolas Patry
042791fbd5
Prep new version (#2810)
* New version.

* Link fixup.

* Update docs.

* Fixup.
2024-12-09 20:42:42 +01:00
Nicolas Patry
27fa83ca5b
V3 doc (#2809)
* V3 document.

* Updating asset.
2024-12-09 19:58:07 +01:00
Nicolas Patry
a04356fb8c
Attempt for cleverer auto batch_prefill values (some simplifications). (#2808)
* Attempt for cleverer auto batch_prefill values (some simplifications).

* Less flaky tests.

* Fixing typo insertion.

* Update launcher/src/main.rs

Co-authored-by: Daniël de Kok <me@danieldk.eu>

* Adding small comment for source of calculation.

* Adding L40.

* Adding L40s.

---------

Co-authored-by: Daniël de Kok <me@danieldk.eu>
2024-12-09 19:44:32 +01:00
drbh
9f5c9a5e22
Enable paligemma2 (#2807)
* feat: support loading gemma2 as vlm text model

* feat: add test for paligemma2
2024-12-06 14:41:49 -05:00
Nicolas Patry
08f6fa0b59
Removing experimental to prefill chunking. 2024-12-06 19:09:40 +01:00