Mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-09-10 20:04:52 +00:00)
Remove unexistent gif
parent a0b2a09cf3
commit 138ffa2a92
@@ -4,7 +4,7 @@
 
 With streaming, the server returns the tokens as the LLM generates them. This enables showing progressive generations to the user rather than waiting for the whole generation. Streaming is an essential aspect of the end-user experience as it reduces latency, one of the most critical aspects of a smooth experience.
 
-<div class="flex justify-center">
+<div class="flex justify-center" class="block dark:hidden">
 <img
 class="block dark:hidden"
 src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/streaming-generation-visual_360.gif"
@@ -15,9 +15,6 @@ With streaming, the server returns the tokens as the LLM generates them. This en
 />
 </div>
 
-
-
-
 With token streaming, the server can start returning the tokens before having to wait for the whole generation. Users start to see something happening much earlier than before the work is complete. This has different positive effects:
 
 * Users can get results orders of magnitude earlier for extremely long queries.
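
As a rough illustration of the behaviour the documentation text in this diff describes (not the project's actual server or client code), the Python sketch below contrasts returning a whole generation at once with forwarding tokens as they are produced. The `generate_tokens` function and its token list are hypothetical stand-ins for a model; no real library API is assumed.

```python
from typing import Iterator


def generate_tokens(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for an LLM: yields one token at a time."""
    for token in ["Streaming", " shows", " tokens", " as", " they", " arrive", "."]:
        yield token


def respond_without_streaming(prompt: str) -> str:
    # The caller receives nothing until every token has been generated.
    return "".join(generate_tokens(prompt))


def respond_with_streaming(prompt: str) -> Iterator[str]:
    # Each token is handed to the caller as soon as it is produced.
    for token in generate_tokens(prompt):
        yield token


if __name__ == "__main__":
    # Print tokens progressively, the way a streaming client would display them.
    for token in respond_with_streaming("hello"):
        print(token, end="", flush=True)
    print()
```

Both functions produce the same final text; the difference is only when the caller can first show something, which is the latency benefit the documentation refers to.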