text-generation-inference

mirror of https://github.com/huggingface/text-generation-inference.git synced 2025-10-10 15:35:24 +00:00

Daniël de Kok	df71aafdcc	2024-06-03 17:02:41 +02:00

router: send the input as chunks to the backend

Before this change, the generation input was sent to the backend as a single string, encoding images as Base64 and packing them in Markdown-style links.

This change adds a new chunked input representation that separates text chunks from image chunks. Image chunks contain binary data (for smaller message sizes) and the image's MIME type.

The stringly-typed inputs are still sent to support backends that do not support chunked inputs yet.
config.rs	router: send the input as chunks to the backend	2024-06-03 17:02:41 +02:00
health.rs	router: send the input as chunks to the backend	2024-06-03 17:02:41 +02:00
infer.rs	Purely refactors paged/attention into `layers/attention` and make hardware differences more obvious with 1 file per hardware. (#1986)	2024-05-31 17:57:01 +02:00
lib.rs	Processor config chat template (#1954)	2024-05-27 16:03:16 +02:00
main.rs	Processor config chat template (#1954)	2024-05-27 16:03:16 +02:00
queue.rs	router: send the input as chunks to the backend	2024-06-03 17:02:41 +02:00
server.rs	Fixing the text part from tokenizer endpoint. (#1967)	2024-05-28 16:55:36 +02:00
validation.rs	router: send the input as chunks to the backend	2024-06-03 17:02:41 +02:00
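
The commit message for df71aafdcc describes two input representations: a chunked one that keeps text and images apart, and the older single string with Base64 images packed in Markdown-style links. The sketch below illustrates that idea only and is not the router's actual code. The names `InputChunk`, `base64_encode` and `to_legacy_string`, the field names, and the exact link shape `![](data:<mime>;base64,<payload>)` are all assumptions made for this example.

```rust
/// Hypothetical chunk type: one piece of a generation input.
/// The real definitions in this repository may differ.
#[derive(Debug, Clone, PartialEq)]
enum InputChunk {
    /// Plain text, kept as-is.
    Text(String),
    /// Raw image bytes plus the image's MIME type (no Base64 overhead).
    Image { data: Vec<u8>, mimetype: String },
}

/// Standard Base64 alphabet (RFC 4648), used with `=` padding.
const B64: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Minimal Base64 encoder so the sketch needs no external crate.
fn base64_encode(data: &[u8]) -> String {
    let mut out = String::with_capacity((data.len() + 2) / 3 * 4);
    for group in data.chunks(3) {
        // Pack up to three bytes into a 24-bit value, zero-filling the tail.
        let b0 = group[0] as u32;
        let b1 = *group.get(1).unwrap_or(&0) as u32;
        let b2 = *group.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;
        out.push(B64[(n >> 18) as usize & 63] as char);
        out.push(B64[(n >> 12) as usize & 63] as char);
        // Positions that have no source byte are emitted as padding.
        out.push(if group.len() > 1 { B64[(n >> 6) as usize & 63] as char } else { '=' });
        out.push(if group.len() > 2 { B64[n as usize & 63] as char } else { '=' });
    }
    out
}

/// Flatten chunks into a single string for backends without chunk support.
/// The Markdown link shape used here is an assumption for this example.
fn to_legacy_string(chunks: &[InputChunk]) -> String {
    let mut out = String::new();
    for chunk in chunks {
        match chunk {
            InputChunk::Text(text) => out.push_str(text),
            InputChunk::Image { data, mimetype } => {
                out.push_str(&format!(
                    "![](data:{};base64,{})",
                    mimetype,
                    base64_encode(data)
                ));
            }
        }
    }
    out
}
```

Base64 emits four characters for every three bytes, so carrying the raw bytes in an image chunk avoids roughly a third of extra payload. That matches the "smaller message sizes" motivation given in the commit message.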