text-generation-inference

mirror of https://github.com/huggingface/text-generation-inference.git synced 2025-04-26 12:32:10 +00:00

Author	SHA1	Message	Date
Nicolas Patry	5e2932552c	Revert the Cohere tokenizer change (for now using a revision instead).	2024-08-29 11:35:18 +02:00
Nicolas Patry	fc7ea202c2	Fix disabling prefix caching - Fix windowing checks.	2024-08-29 11:34:50 +02:00
Nicolas Patry	bef2f6bdaa	Fixing the free algorithm to handle times where the common prefix is smaller.	2024-08-29 09:17:00 +02:00
Nicolas Patry	9c839ca5df	Adding error message when assert is violated.	2024-08-28 21:22:36 +02:00
Nicolas Patry	e7e036389e	Revert the integration tests change (seem linked to head_size modification).	2024-08-28 19:38:51 +02:00
Nicolas Patry	8a4df6e181	Only n_heads / process_group.size() are necessary.	2024-08-28 16:34:58 +02:00
Nicolas Patry	8d01848370	Update server tests - Default to throughput test in k6 - Use TGI_WIGGLE_ROOM to adjust wiggle room	2024-08-28 15:42:05 +02:00
Nicolas Patry	12325564dc	Put back default pure shell.	2024-08-28 14:54:05 +02:00
Nicolas Patry	f886747949	Oops this doesn't belong here.	2024-08-28 14:49:00 +02:00
Nicolas Patry	e6ee67f301	Truncating left for radix purposes.	2024-08-28 10:53:22 +02:00
Nicolas Patry	0a60973166	Fixing the batching tokenization in flash causal lm.	2024-08-28 10:34:10 +02:00
Nicolas Patry	c6f1a61267	Update the chat test.	2024-08-27 23:02:12 +02:00
Nicolas Patry	8ac1ffa087	Removing encoder_decoder (seq2seq).	2024-08-27 21:11:49 +02:00
Nicolas Patry	ccaf1d0030	Fixing the test.	2024-08-27 20:06:12 +02:00
Nicolas Patry	2cf1f5c00e	Fixing the issue with `add_special_tokens` not being passed around.	2024-08-27 20:06:12 +02:00
Nicolas Patry	e0069a3a26	Fixing seqlen with the new vlms.	2024-08-27 20:06:12 +02:00
Nicolas Patry	9dacac3b15	add_special_tokens is internal only	2024-08-27 20:06:12 +02:00
Nicolas Patry	55d984d730	Fixed flashinfer version.	2024-08-27 20:06:12 +02:00
Nicolas Patry	bb9769ed42	Update all models.	2024-08-27 20:06:11 +02:00
Nicolas Patry	65b94a69bd	Fixing prefix caching for flashdecoding.	2024-08-27 20:06:11 +02:00
Nicolas Patry	7f1816a4e1	Change `add_special_tokens` in order to have the correct tokens for chat input and not (since it's super important with the prefixing now)	2024-08-27 20:06:11 +02:00
Nicolas Patry	f1c0735453	Don't enable prefix caching on VLM just yet.	2024-08-27 20:06:11 +02:00
Nicolas Patry	e30fb25444	Fixing the default for vlm.	2024-08-27 20:06:11 +02:00
Nicolas Patry	27b566baa8	Downgrade some logs.	2024-08-27 20:06:11 +02:00
Nicolas Patry	26e5037de4	This seems to be working.	2024-08-27 20:06:10 +02:00
Nicolas Patry	f5182c188c	Is this enough to make it work ?	2024-08-27 20:06:10 +02:00
Nicolas Patry	1568e82548	Override the env in server tests.	2024-08-27 20:06:10 +02:00
Nicolas Patry	682db34b6a	Handling debugger.	2024-08-27 20:06:10 +02:00
Nicolas Patry	c53968dc45	Remove lambda for cleaner function.	2024-08-27 20:06:10 +02:00
Nicolas Patry	32f6416358	Upgrade resolution system for less errors in resolution.	2024-08-27 20:06:10 +02:00
Nicolas Patry	5eb6ea0063	Tmp	2024-08-27 20:06:09 +02:00
Nicolas Patry	0bf4eb9683	Updated flake lock	2024-08-27 20:06:09 +02:00
Nicolas Patry	b80593bfa3	Apply suggestions from code review Co-authored-by: drbh <david.richard.holtz@gmail.com>	2024-08-27 20:06:09 +02:00
Nicolas Patry	8d0220a695	Forgot last default place.	2024-08-27 20:06:09 +02:00
Nicolas Patry	860b550cdf	Everywhere 1.80	2024-08-27 20:06:09 +02:00
Nicolas Patry	344fee0d44	Upgrade to 1.80 because of bitstream...	2024-08-27 20:06:09 +02:00
Nicolas Patry	17c8a5e574	Update cargo lock ?	2024-08-27 20:06:06 +02:00
Nicolas Patry	ba1ce20ce8	Updating integration tests with new values with FI/FD. Remove paged as a default too, and using FD everywhere.	2024-08-27 20:05:29 +02:00
Nicolas Patry	ffb6841121	Update lock	2024-08-27 20:05:29 +02:00
Nicolas Patry	f0b35f94b8	More specific codes.	2024-08-27 20:05:29 +02:00
Nicolas Patry	a6cd5fef23	Disable prefix caching for lora.	2024-08-27 20:05:29 +02:00
Nicolas Patry	cba59aca03	Disabling flashinfer/prefix caching on odd head_dim	2024-08-27 20:05:29 +02:00
Nicolas Patry	f55278de2d	Allowing window_left_size (dummy version).	2024-08-27 20:05:29 +02:00
Nicolas Patry	f2bdc65098	Using prebuilt.	2024-08-27 20:05:28 +02:00
Nicolas Patry	9d4c5d39fe	Include flashinfer in the docker.	2024-08-27 20:05:28 +02:00
Nicolas Patry	60719babf6	Making prefix/flashinfer the default and testing the full release tests.	2024-08-27 20:05:28 +02:00
drbh	21187c27c9	fix: bump minijinja version and add test for llama 3.1 tools (#2463) * fix: support tojson and avoid message indexing issue in template * fix: prefer minijinja native methods and prefer workspace level dependency * fix: adjust comment typo	2024-08-27 13:31:08 -04:00
Nicolas Patry	2788d41a76	Fixing CI. (#2462)	2024-08-27 15:33:02 +02:00
drbh	cfa73b5c99	Pr 2451 ci branch (#2454) * fix[router]: Fix tools not passed in chat template Signed-off-by: GitHub <noreply@github.com> * feat: improve default tool serialization and lints * feat: refactor tool logic to include notify_error in prompt and adjust typing * fix: adjust non tool template apply * fix: simplify tool grammar logic and improve schema * feat: avoid skip tool test and avoid empty tool prompts * fix: increase test client timeout for grammar compilation tests --------- Signed-off-by: GitHub <noreply@github.com> Co-authored-by: Simone Rossi <simone.rossi.93@gmail.com>	2024-08-26 20:19:38 -04:00
drbh	30be188400	Fix: don't apply post layernorm in SiglipVisionTransformer (#2459) * Fix: don't apply post layernorm in SiglipVisionTransformer This fixes a bug with LLaVA Next when using Siglip as the vision model. LLaVA Next expects the output of the vision model to be the encoder outputs before layernorm (see original transformers implementation here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llava_next/modeling_llava_next.py#L813). This also makes Siglip consistent with the existing Clip implementation: https://github.com/huggingface/text-generation-inference/blob/main/server/text_generation_server/models/custom_modeling/clip.py#L613 * fix: adjust pali gemma for post layer norm and small refactors --------- Co-authored-by: Travis Addair <tgaddair@gmail.com>	2024-08-26 17:04:46 -04:00

1025 Commits