From 138ffa2a925377d729a6e879b57305809f81d484 Mon Sep 17 00:00:00 2001
From: osanseviero
Date: Wed, 16 Aug 2023 18:17:48 +0200
Subject: [PATCH] Remove nonexistent gif

---
 docs/source/conceptual/streaming.md | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/docs/source/conceptual/streaming.md b/docs/source/conceptual/streaming.md
index f58eb2e6..a74db33b 100644
--- a/docs/source/conceptual/streaming.md
+++ b/docs/source/conceptual/streaming.md
@@ -4,7 +4,7 @@
 With streaming, the server returns the tokens as the LLM generates them. This enables showing progressive generations to the user rather than waiting for the whole generation. Streaming is an essential aspect of the end-user experience as it reduces latency, one of the most critical aspects of a smooth experience.
-
+
-
-![A diff of streaming vs non streaming](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/streaming-generation-visual.gif)
-
 With token streaming, the server can start returning the tokens before having to wait for the whole generation. Users start to see something happening much earlier than before the work is complete. This has different positive effects:
 * Users can get results orders of magnitude earlier for extremely long queries.
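
The benefit described in the patch's context lines — streaming lets users see output long before the full generation finishes — can be sketched with a toy generator. This is a minimal illustration of the time-to-first-token difference, not TGI's actual API; the token values, counts, and delays are made up for the example.

```python
import time

def generate_tokens(n_tokens=5, delay=0.01):
    """Toy stand-in for an LLM server: emits one token per compute step."""
    for i in range(n_tokens):
        time.sleep(delay)  # simulated per-token generation cost
        yield f"tok{i}"

# Non-streaming: the client sees nothing until every token is generated.
start = time.monotonic()
full_response = list(generate_tokens())
non_streaming_wait = time.monotonic() - start

# Streaming: the client can render the first token after a single step.
start = time.monotonic()
stream = generate_tokens()
first_token = next(stream)
streaming_wait = time.monotonic() - start
remaining = list(stream)  # the rest keeps arriving while the user reads

print(first_token)                          # tok0
print(streaming_wait < non_streaming_wait)  # True
```

The total generation time is the same in both cases; streaming only changes when the first usable output reaches the user, which is why the gain grows with query length.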