Mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-09-10 20:04:52 +00:00)
Update streaming.md
commit a361cd2b53 (parent 2248dd8e18)
@@ -20,6 +20,7 @@ With token streaming, the server can start returning the tokens one by one before having to generate the whole response.
* Users can get results orders of magnitude earlier for extremely long queries.
* Seeing something in progress allows users to stop the generation if it's not going in the direction they expect.
* Perceived latency is lower when results are shown in the early stages.
* When used in conversational UIs, the experience feels more natural.
For example, consider a system that generates 100 tokens per second. If the system needs to generate 1000 tokens, with the non-streaming setup users have to wait 10 seconds before they see any results. With the streaming setup, users get initial results immediately, and although the end-to-end latency is the same, they can already see half of the generation after five seconds. Below you can see an interactive demo that shows non-streaming vs streaming side-by-side. Click **generate** below.
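As a rough back-of-the-envelope check of the numbers above, here is a minimal Python sketch. It is illustrative only: it assumes a constant generation rate (the 100 tokens/second and 1000 tokens figures come from the example), whereas a real server produces tokens at a varying pace.

```python
# Minimal sketch of the latency arithmetic above (illustrative only).
# Assumes a constant generation rate; real servers vary per token.

tokens_per_second = 100   # generation speed from the example
total_tokens = 1000       # length of the full response

# Non-streaming: nothing is shown until every token has been generated.
non_streaming_wait = total_tokens / tokens_per_second          # 10.0 s

# Streaming: the first token is shown almost immediately, and the
# user sees partial output as it is produced.
time_to_first_token = 1 / tokens_per_second                    # 0.01 s
time_to_half_output = (total_tokens / 2) / tokens_per_second   # 5.0 s

print(f"non-streaming: first output after {non_streaming_wait:.1f}s")
print(f"streaming: first token after {time_to_first_token:.2f}s, "
      f"half of the text visible after {time_to_half_output:.1f}s")
```

The end-to-end time is identical in both cases; only the perceived latency changes, which is the point the paragraph makes.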