text-generation-inference/router/src
drbh dc5f05f8e6
Pr 3003 ci branch (#3007)
* change ChatCompletionChunk to align with "OpenAI Chat Completions streaming API"

Moving after tool_calls2

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

add in Buffering..

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>

fix: handle usage outside of stream state and add tests

Simplifying everything quite a bit.

Remove the unused model_dump.

Clippy.

Clippy ?

Ruff.

Upgrade the flake for latest transformers.

Upgrade after rebase.

Remove potential footgun.

Fix completion test.

* Clippy.

* Tweak for multi prompt.

* Ruff.

* Update the snapshot a bit.

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2025-03-10 17:56:19 +01:00
infer Fix tool call2 (#3076) 2025-03-07 19:45:57 +01:00
config.rs feat: add initial qwen2.5-vl model and test (#2971) 2025-02-19 12:38:20 +01:00
kserve.rs fix: include add_special_tokens in kserve request (#2859) 2024-12-19 16:55:17 -05:00
lib.rs Pr 3003 ci branch (#3007) 2025-03-10 17:56:19 +01:00
logging.rs Rebase TRT-llm (#2331) 2024-07-31 10:33:10 +02:00
sagemaker.rs feat: allow any supported payload on /invocations (#2683) 2024-10-23 11:26:01 +00:00
server.rs Pr 3003 ci branch (#3007) 2025-03-10 17:56:19 +01:00
usage_stats.rs feat: Add the parsing of HF_HUB_USER_AGENT_ORIGIN environment variable for telemetry (#3027) 2025-02-19 21:09:12 +01:00
validation.rs feat: add initial qwen2.5-vl model and test (#2971) 2025-02-19 12:38:20 +01:00
vertex.rs Improve tool call message processing (#3036) 2025-02-21 10:30:29 +01:00