Commit Graph

1025 Commits

Author SHA1 Message Date
Nicolas Patry
5e2932552c
Revert the Cohere tokenizer change (for now using a revision instead). 2024-08-29 11:35:18 +02:00
Nicolas Patry
fc7ea202c2
Fix disabling prefix caching - Fix windowing checks. 2024-08-29 11:34:50 +02:00
Nicolas Patry
bef2f6bdaa
Fixing the free algorithm to handle cases where the common prefix is
smaller.
2024-08-29 09:17:00 +02:00
Nicolas Patry
9c839ca5df
Adding error message when assert is violated. 2024-08-28 21:22:36 +02:00
Nicolas Patry
e7e036389e
Revert the integration tests change (seems linked to head_size
modification).
2024-08-28 19:38:51 +02:00
Nicolas Patry
8a4df6e181
Only n_heads / process_group.size() are necessary. 2024-08-28 16:34:58 +02:00
Nicolas Patry
8d01848370
Update server tests
- Default to throughput test in k6
- Use TGI_WIGGLE_ROOM to adjust wiggle room
2024-08-28 15:42:05 +02:00
Nicolas Patry
12325564dc
Put back default pure shell. 2024-08-28 14:54:05 +02:00
Nicolas Patry
f886747949
Oops this doesn't belong here. 2024-08-28 14:49:00 +02:00
Nicolas Patry
e6ee67f301
Truncating left for radix purposes. 2024-08-28 10:53:22 +02:00
Nicolas Patry
0a60973166
Fixing the batching tokenization in flash causal lm. 2024-08-28 10:34:10 +02:00
Nicolas Patry
c6f1a61267
Update the chat test. 2024-08-27 23:02:12 +02:00
Nicolas Patry
8ac1ffa087
Removing encoder_decoder (seq2seq). 2024-08-27 21:11:49 +02:00
Nicolas Patry
ccaf1d0030
Fixing the test. 2024-08-27 20:06:12 +02:00
Nicolas Patry
2cf1f5c00e
Fixing the issue with add_special_tokens not being passed around. 2024-08-27 20:06:12 +02:00
Nicolas Patry
e0069a3a26
Fixing seqlen with the new vlms. 2024-08-27 20:06:12 +02:00
Nicolas Patry
9dacac3b15
add_special_tokens is internal only 2024-08-27 20:06:12 +02:00
Nicolas Patry
55d984d730
Fixed flashinfer version. 2024-08-27 20:06:12 +02:00
Nicolas Patry
bb9769ed42
Update all models. 2024-08-27 20:06:11 +02:00
Nicolas Patry
65b94a69bd
Fixing prefix caching for flashdecoding. 2024-08-27 20:06:11 +02:00
Nicolas Patry
7f1816a4e1
Change add_special_tokens in order to have the correct tokens for chat
input (since it's super important with the prefixing now)
2024-08-27 20:06:11 +02:00
Nicolas Patry
f1c0735453
Don't enable prefix caching on VLM just yet. 2024-08-27 20:06:11 +02:00
Nicolas Patry
e30fb25444
Fixing the default for vlm. 2024-08-27 20:06:11 +02:00
Nicolas Patry
27b566baa8
Downgrade some logs. 2024-08-27 20:06:11 +02:00
Nicolas Patry
26e5037de4
This seems to be working. 2024-08-27 20:06:10 +02:00
Nicolas Patry
f5182c188c
Is this enough to make it work? 2024-08-27 20:06:10 +02:00
Nicolas Patry
1568e82548
Override the env in server tests. 2024-08-27 20:06:10 +02:00
Nicolas Patry
682db34b6a
Handling debugger. 2024-08-27 20:06:10 +02:00
Nicolas Patry
c53968dc45
Remove lambda for cleaner function. 2024-08-27 20:06:10 +02:00
Nicolas Patry
32f6416358
Upgrade resolution system for fewer errors in resolution. 2024-08-27 20:06:10 +02:00
Nicolas Patry
5eb6ea0063
Tmp 2024-08-27 20:06:09 +02:00
Nicolas Patry
0bf4eb9683
Updated flake lock 2024-08-27 20:06:09 +02:00
Nicolas Patry
b80593bfa3
Apply suggestions from code review
Co-authored-by: drbh <david.richard.holtz@gmail.com>
2024-08-27 20:06:09 +02:00
Nicolas Patry
8d0220a695
Forgot last default place. 2024-08-27 20:06:09 +02:00
Nicolas Patry
860b550cdf
Everywhere 1.80 2024-08-27 20:06:09 +02:00
Nicolas Patry
344fee0d44
Upgrade to 1.80 because of bitstream... 2024-08-27 20:06:09 +02:00
Nicolas Patry
17c8a5e574
Update cargo lock? 2024-08-27 20:06:06 +02:00
Nicolas Patry
ba1ce20ce8
Updating integration tests with new values with FI/FD.
Remove paged as a default too, and use FD everywhere.
2024-08-27 20:05:29 +02:00
Nicolas Patry
ffb6841121
Update lock 2024-08-27 20:05:29 +02:00
Nicolas Patry
f0b35f94b8
More specific codes. 2024-08-27 20:05:29 +02:00
Nicolas Patry
a6cd5fef23
Disable prefix caching for lora. 2024-08-27 20:05:29 +02:00
Nicolas Patry
cba59aca03
Disabling flashinfer/prefix caching on odd head_dim 2024-08-27 20:05:29 +02:00
Nicolas Patry
f55278de2d
Allowing window_left_size (dummy version). 2024-08-27 20:05:29 +02:00
Nicolas Patry
f2bdc65098
Using prebuilt. 2024-08-27 20:05:28 +02:00
Nicolas Patry
9d4c5d39fe
Include flashinfer in the docker. 2024-08-27 20:05:28 +02:00
Nicolas Patry
60719babf6
Making prefix/flashinfer the default and testing the full release tests. 2024-08-27 20:05:28 +02:00
drbh
21187c27c9
fix: bump minijinja version and add test for llama 3.1 tools (#2463)
* fix: support tojson and avoid message indexing issue in template

* fix: prefer minijinja native methods and prefer workspace level dependency

* fix: adjust comment typo
2024-08-27 13:31:08 -04:00
Nicolas Patry
2788d41a76
Fixing CI. (#2462) 2024-08-27 15:33:02 +02:00
drbh
cfa73b5c99
Pr 2451 ci branch (#2454)
* fix[router]: Fix tools not passed in chat template

Signed-off-by: GitHub <noreply@github.com>

* feat: improve default tool serialization and lints

* feat: refactor tool logic to include notify_error in prompt and adjust typing

* fix: adjust non tool template apply

* fix: simplify tool grammar logic and improve schema

* feat: avoid skip tool test and avoid empty tool prompts

* fix: increase test client timeout for grammar compilation tests

---------

Signed-off-by: GitHub <noreply@github.com>
Co-authored-by: Simone Rossi <simone.rossi.93@gmail.com>
2024-08-26 20:19:38 -04:00
drbh
30be188400
Fix: don't apply post layernorm in SiglipVisionTransformer (#2459)
* Fix: don't apply post layernorm in SiglipVisionTransformer

This fixes a bug with LLaVA Next when using Siglip as the vision model. LLaVA Next expects the output of the vision model to be the encoder outputs before layernorm (see original transformers implementation here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llava_next/modeling_llava_next.py#L813).

This also makes Siglip consistent with the existing Clip implementation:

https://github.com/huggingface/text-generation-inference/blob/main/server/text_generation_server/models/custom_modeling/clip.py#L613

* fix: adjust pali gemma for post layer norm and small refactors

---------

Co-authored-by: Travis Addair <tgaddair@gmail.com>
2024-08-26 17:04:46 -04:00