* Fix runtime error when Qwen2-VL was prompted with multiple images
Fix a runtime error when the Qwen2-VL model is prompted with a prompt containing more than one image. The runtime error was:
File "text-generation-inference/server/text_generation_server/models/custom_modeling/qwen2_vl.py", line 459, in get_position_ids
text_pos_ids = torch.arange(text_length, device=d)
RuntimeError: upper bound and larger bound inconsistent with step sign
The error was caused by the text_length variable going negative when multiple images triggered multiple iterations of the main loop in get_position_ids.
The root cause is a simple logic mistake: next_image_pos is initialized as a relative offset from current_pos, but was used as if it were an absolute position from zero.
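A minimal sketch of the bug and the fix, assuming a simplified loop (the scaffolding below is illustrative, not the actual get_position_ids code; each image occupies a single placeholder token here for brevity):

```python
import torch

d = torch.device("cpu")
image_token_id = 151655  # illustrative placeholder id
input_ids = torch.tensor([1, 2, image_token_id, 3, image_token_id, 4])

current_pos = 0
while True:
    # nonzero() on the sliced tensor yields an offset RELATIVE to current_pos.
    remaining = input_ids[current_pos:] == image_token_id
    if not remaining.any():
        break
    next_image_pos = remaining.nonzero()[0].item()

    # Buggy: subtracting current_pos treats the relative offset as absolute.
    # On the second image this goes negative (1 - 3 == -2) and
    # torch.arange(-2) raises the "inconsistent with step sign" error.
    # text_length = next_image_pos - current_pos

    # Fixed: the offset is already relative to current_pos.
    text_length = next_image_pos

    text_pos_ids = torch.arange(text_length, device=d)
    current_pos += text_length + 1  # step past the image token
```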
* Fix runtime error when Qwen2-VL was prompted with multiple images
Fix a runtime error when the Qwen2-VL model is prompted with a prompt containing more than one image. The runtime error was:
File "text-generation-inference/server/text_generation_server/models/custom_modeling/qwen2_vl.py", line 534, in forward
inputs_embeds[input_ids == self.image_token_id] = image_embeds
RuntimeError: shape mismatch: value tensor of shape [512, 3584] cannot be broadcast to indexing result of shape [1024, 3584]
(The shape numbers in the error message vary with the input image resolutions.)
The error was caused by the image_text_replacement function adding the wrong number of <|image_pad|> tokens to the tokenized input.
The root cause is a simple logic mistake: the number of image pad tokens was derived from the length of the first dimension of the pixel_value_shape tensor. However, pixel_value_shape contains the patches of all the images, so the code added the total number of image pad tokens required for the whole input at each image's location. This resulted in extra image pad tokens being present in the tokenized input.
The fix is to read the number of required tokens from the image_grid_thw tensor, which contains the grid_t, grid_h, and grid_w values for each image. grid_t * grid_h * grid_w gives the total number of patches for that image [1], and the number of required image pad tokens is number_of_patches // 4, since the 2x2 spatial merge combines four patches into one token.
[1] 31f9a289a6/src/transformers/models/qwen2_vl/image_processing_qwen2_vl.py (L311)
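A sketch of the corrected per-image computation under these assumptions (image_grid_thw has one (grid_t, grid_h, grid_w) row per image; image_pad_count is a hypothetical helper, not the actual TGI function):

```python
import torch

# One (grid_t, grid_h, grid_w) row per image, e.g. two images with
# different resolutions.
image_grid_thw = torch.tensor([
    [1, 16, 16],  # 256 patches
    [1, 32, 16],  # 512 patches
])

def image_pad_count(image_grid_thw: torch.Tensor, image_index: int) -> int:
    # grid_t * grid_h * grid_w is the patch count for THIS image only;
    # the 2x2 spatial merge maps four patches to one <|image_pad|> token.
    grid_t, grid_h, grid_w = image_grid_thw[image_index].tolist()
    return (grid_t * grid_h * grid_w) // 4

for i in range(image_grid_thw.shape[0]):
    print(image_pad_count(image_grid_thw, i))  # 64, then 128

# The buggy version used the first dimension of the pixel values tensor,
# which counts the patches of ALL images: (256 + 512) // 4 = 192 pad
# tokens were inserted at BOTH image locations instead of 64 and 128.
```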
---------
Co-authored-by: Janne Alatalo <janne.alatalo@jamk.fi>
* misc(cmake): update dependencies
* feat(hardware): enable new hardware.hpp and unittests
* test(ctest) enable address sanitizer
* feat(backend): initial rewrite of the backend for simplicity
* feat(backend): remove all the logs from hardware.hpp
* feat(backend): added some logging
* feat(backend): enable compiler warning if RVO is not applied
* feat(backend): add missing return statement
* feat(backend): introduce backend_workspace_t to store precomputed information from the engine folder
* feat(backend): delete previous backend impl
* feat(backend): more impl
* feat(backend): use latest trtllm main version to have g++ >= 13 compatibility
* feat(backend): allow overriding which Python to use
* feat(backend): fix backend_exception_t -> backend_error_t naming
* feat(backend): impl missing generation_step_t as return value of pull_tokens
* feat(backend): make backend_workspace_t::engines_folder constexpr
* feat(backend): fix main.rs retrieving the tokenizer
* feat(backend): add guard to multiple header definitions
* test(backend): add more unittest
* feat(backend): remove constexpr from par
* feat(backend): remove constexpr
* test(backend): more test coverage
* chore(trtllm): update dependency towards 0.15.0
* effectively cancel the request on the executor
* feat(backend): fix moving backend when pulling
* feat(backend): make sure we can easily cancel request on the executor
* feat(backend): fix missing "0" field access
* misc(backend): fix reborrowing Pin<&mut T> as described in the doc https://doc.rust-lang.org/stable/std/pin/struct.Pin.html#method.as_mut
* chore: Add doc and CI for TRTLLM (#2799)
* chore: Add doc and CI for TRTLLM
* chore: Add doc and CI for TRTLLM
* doc: Formatting
* misc(backend): indent
---------
Co-authored-by: Hugo Larcher <hugo.larcher@huggingface.co>
* Added instructions to clone the repo and change directory into it
The following steps include a "make install" step that fails if the repo has not been cloned and entered first, which may be confusing for some users.
Also added a Python venv alternative to conda.
* Using both values from the config as they might not be correct.
* Fixing max_position_embeddings for falcon.
* Simple attempt to fix the healthcheck block allocation.
* Much simpler solution.
* Default value for Backend start_health