Corentin REGAL
b4187d6022
Add tgi_batch_current_size and tgi_batch_current_size as response header
2025-01-17 15:48:02 +01:00
Nicolas Patry
203cade244
Upgrading our rustc version. ( #2908 )
* Upgrading our rustc version.
* Fixing the rust tests to proper version.
* Clippy everything.
2025-01-15 17:04:03 +01:00
Dmitry Dygalo
01067f8ba8
chore: Update jsonschema to 0.28.0 ( #2870 )
* chore: Update jsonschema to 0.28.0
Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
* chore: Enable blocking feature for reqwest
Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
---------
Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
2025-01-10 15:01:54 +01:00
Nicolas Patry
82c24f7420
Using both value from config as they might not be correct. ( #2817 )
* Using both value from config as they might not be correct.
* Fixing max_position_embeddings for falcon.
* Simple attempt to fix the healthcheck block allocation.
* Much simpler solution.
* Default value for Backend start_health
2024-12-10 19:37:09 +01:00
OlivierDehaene
8c3669b287
feat: auto max_new_tokens ( #2803 )
* feat: auto max_new_tokens
* update default
* Fixing the tests.
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-12-06 05:50:35 +01:00
OlivierDehaene
ab7ccf5bc3
feat: add payload limit ( #2726 )
* feat: add payload limit
* update launcher
2024-11-21 18:20:15 +00:00
Nicolas Patry
ed87b464b4
Fixing "deadlock" when python prompts for trust_remote_code by always ( #2664 )
specifying a value.
2024-10-25 06:39:21 +02:00
OlivierDehaene
41c2623735
feat: allow any supported payload on /invocations ( #2683 )
* feat: allow any supported payload on /invocations
* update openAPI
* update doc
2024-10-23 11:26:01 +00:00
OlivierDehaene
a6a0c97ed9
feat: prefill chunking ( #2600 )
* wip
* rollback
* refactor to use prefix/postfix naming + fix all_input_ids_tensor
* maybe patching vlms?
* fix filter and concat
* wip, no filter, no concat
* current
* add prepare_for_prefill
* working
* load tested
* re-create slots
* re-create slots
* fix slot_filtering_indices
* feedback loop
* remove log
* fix benchmarker
* fix vlm and seq2seq
* rename to cache and input lengths
* fix prefill logprobs
* fix launcher
* fix logprobs?
* idk at this point
* max input length
* omfg
* remove debugging lines
* fix tests
* fix mllama
* fix cargo tests
* remove support chunking for paged
* Fixing non blocked attentions
* Fixing dtype + AMD, Ipex targets.
* lint fix.
* rename
* Fix prefix_caching variable, remove defaults in server (confusing a lot of the time).
* Add simple resolution when user specifies ATTENTION=paged.
* Put back non default simple tests.
* Fix env name
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-10-16 12:49:33 +02:00
Nicolas Patry
0ff6ff60ad
Hotfixing main ( #2556 )
2024-09-24 11:51:14 +02:00
OlivierDehaene
10e6f29295
chore: Add old V2 backend ( #2551 )
* wip
* added v2
2024-09-24 08:38:17 +02:00