Commit Graph

13 Commits

Author SHA1 Message Date
Hugo Larcher
73b7cf83f6
Add backend name to telemetry (#2962)
* feat: Add backend name to telemetry
2025-01-28 16:53:16 +01:00
drbh
8f6146f11a
Revert "feat: improve qwen2-vl startup " (#2924)
Revert "feat: improve qwen2-vl startup  (#2802)"

This reverts commit eecca27113.
2025-01-17 12:09:05 -05:00
drbh
eecca27113
feat: improve qwen2-vl startup (#2802)
* feat: tokenize each request individually and increase warmup image size

* feat: adjust rotary embed and avoid cuda graphs of size 2 and smaller

* fix: address image resize and rebase changes

* feat: update to run qwen2-vl tests

* fix: tweak param types
2025-01-17 11:50:41 -05:00
Nicolas Patry
203cade244
Upgrading our rustc version. (#2908)
* Upgrading our rustc version.

* Fixing the rust tests to proper version.

* Clippy everything.
2025-01-15 17:04:03 +01:00
Dmitry Dygalo
01067f8ba8
chore: Update jsonschema to 0.28.0 (#2870)
* chore: Update jsonschema to 0.28.0

Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>

* chore: Enable blocking feature for reqwest

Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>

---------

Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
2025-01-10 15:01:54 +01:00
Nicolas Patry
82c24f7420
Using both value from config as they might not be correct. (#2817)
* Using both value from config as they might not be correct.

* Fixing max_position_embeddings for falcon.

* Simple attempt to fix the healthcheck block allocation.

* Much simpler solution.

* Default value for Backend start_health
2024-12-10 19:37:09 +01:00
OlivierDehaene
8c3669b287
feat: auto max_new_tokens (#2803)
* feat: auto max_new_tokens

* update default

* Fixing the tests.

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-12-06 05:50:35 +01:00
OlivierDehaene
ab7ccf5bc3
feat: add payload limit (#2726)
* feat: add payload limit

* update launcher
2024-11-21 18:20:15 +00:00
Nicolas Patry
ed87b464b4
Fixing "deadlock" when python prompts for trust_remote_code by always (#2664)
specifiying a value.
2024-10-25 06:39:21 +02:00
OlivierDehaene
41c2623735
feat: allow any supported payload on /invocations (#2683)
* feat: allow any supported payload on /invocations

* update openAPI

* update doc
2024-10-23 11:26:01 +00:00
OlivierDehaene
a6a0c97ed9
feat: prefill chunking (#2600)
* wip

* rollback

* refactor to use prefix/postfix namming + fix all_input_ids_tensor

* maybe patching vlms?

* fix filter and concat

* wip, no filter, no concat

* current

* add prepare_for_prefill

* working

* load tested

* re-create slots

* re-create slots

* fix slot_filtering_indices

* feedback loop

* remove log

* fix benchmarker

* fix vlm and seq2seq

* rename to cache and input lengths

* fix prefill logprobs

* fix launcher

* fix logprobs?

* idk at this point

* max input length

* omfg

* remove debugging lines

* fix tests

* fix mllama

* fix cargo tests

* remove support chunking for paged

* Fixing non blocked attentions

* Fixing dtype + AMD, Ipex targets.

* lint fix.

* rename

* Fix prefix_caching variable, remove defaults in server (confusing a lot
of the times).

* Add simple resolution when user specifies ATTENTION=paged.

* Put back non default simple tests.

* Fix env name

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-10-16 12:49:33 +02:00
Nicolas Patry
0ff6ff60ad
Hotfixing main (#2556) 2024-09-24 11:51:14 +02:00
OlivierDehaene
10e6f29295
chore: Add old V2 backend (#2551)
* wip

* added v2
2024-09-24 08:38:17 +02:00