Commit Graph

1026 Commits

Author SHA1 Message Date
Nicolas Patry
5838f2139f
Tied embeddings in MLP speculator. 2024-08-29 12:30:26 +02:00
Nicolas Patry
5e2932552c
Revert the Cohere tokenizer change (for now using a revision instead). 2024-08-29 11:35:18 +02:00
Nicolas Patry
fc7ea202c2
Fix disabling prefix caching - Fix windowing checks. 2024-08-29 11:34:50 +02:00
Nicolas Patry
bef2f6bdaa
Fixing the free algorithm to handle times where the common prefix is smaller. 2024-08-29 09:17:00 +02:00
Nicolas Patry
9c839ca5df
Adding error message when assert is violated. 2024-08-28 21:22:36 +02:00
Nicolas Patry
e7e036389e
Revert the integration tests change (seems linked to head_size modification). 2024-08-28 19:38:51 +02:00
Nicolas Patry
8a4df6e181
Only n_heads / process_group.size() are necessary. 2024-08-28 16:34:58 +02:00
Nicolas Patry
8d01848370
Update server tests
- Default to throughput test in k6
- Use TGI_WIGGLE_ROOM to adjust wiggle room
2024-08-28 15:42:05 +02:00
Nicolas Patry
12325564dc
Put back default pure shell. 2024-08-28 14:54:05 +02:00
Nicolas Patry
f886747949
Oops this doesn't belong here. 2024-08-28 14:49:00 +02:00
Nicolas Patry
e6ee67f301
Truncating left for radix purposes. 2024-08-28 10:53:22 +02:00
Nicolas Patry
0a60973166
Fixing the batching tokenization in flash causal lm. 2024-08-28 10:34:10 +02:00
Nicolas Patry
c6f1a61267
Update the chat test. 2024-08-27 23:02:12 +02:00
Nicolas Patry
8ac1ffa087
Removing encoder_decoder (seq2seq). 2024-08-27 21:11:49 +02:00
Nicolas Patry
ccaf1d0030
Fixing the test. 2024-08-27 20:06:12 +02:00
Nicolas Patry
2cf1f5c00e
Fixing the issue with add_special_tokens not being passed around. 2024-08-27 20:06:12 +02:00
Nicolas Patry
e0069a3a26
Fixing seqlen with the new vlms. 2024-08-27 20:06:12 +02:00
Nicolas Patry
9dacac3b15
add_special_tokens is internal only 2024-08-27 20:06:12 +02:00
Nicolas Patry
55d984d730
Fixed flashinfer version. 2024-08-27 20:06:12 +02:00
Nicolas Patry
bb9769ed42
Update all models. 2024-08-27 20:06:11 +02:00
Nicolas Patry
65b94a69bd
Fixing prefix caching for flashdecoding. 2024-08-27 20:06:11 +02:00
Nicolas Patry
7f1816a4e1
Change add_special_tokens in order to have the correct tokens for chat
input and not (since it's super important with the prefixing now)
2024-08-27 20:06:11 +02:00
Nicolas Patry
f1c0735453
Don't enable prefix caching on VLM just yet. 2024-08-27 20:06:11 +02:00
Nicolas Patry
e30fb25444
Fixing the default for vlm. 2024-08-27 20:06:11 +02:00
Nicolas Patry
27b566baa8
Downgrade some logs. 2024-08-27 20:06:11 +02:00
Nicolas Patry
26e5037de4
This seems to be working. 2024-08-27 20:06:10 +02:00
Nicolas Patry
f5182c188c
Is this enough to make it work ? 2024-08-27 20:06:10 +02:00
Nicolas Patry
1568e82548
Override the env in server tests. 2024-08-27 20:06:10 +02:00
Nicolas Patry
682db34b6a
Handling debugger. 2024-08-27 20:06:10 +02:00
Nicolas Patry
c53968dc45
Remove lambda for cleaner function. 2024-08-27 20:06:10 +02:00
Nicolas Patry
32f6416358
Upgrade resolution system for fewer errors in resolution. 2024-08-27 20:06:10 +02:00
Nicolas Patry
5eb6ea0063
Tmp 2024-08-27 20:06:09 +02:00
Nicolas Patry
0bf4eb9683
Updated flake lock 2024-08-27 20:06:09 +02:00
Nicolas Patry
b80593bfa3
Apply suggestions from code review
Co-authored-by: drbh <david.richard.holtz@gmail.com>
2024-08-27 20:06:09 +02:00
Nicolas Patry
8d0220a695
Forgot last default place. 2024-08-27 20:06:09 +02:00
Nicolas Patry
860b550cdf
Everywhere 1.80 2024-08-27 20:06:09 +02:00
Nicolas Patry
344fee0d44
Upgrade to 1.80 because of bitstream... 2024-08-27 20:06:09 +02:00
Nicolas Patry
17c8a5e574
Update cargo lock ? 2024-08-27 20:06:06 +02:00
Nicolas Patry
ba1ce20ce8
Updating integration tests with new values with FI/FD.
Remove paged as a default too, and using FD everywhere.
2024-08-27 20:05:29 +02:00
Nicolas Patry
ffb6841121
Update lock 2024-08-27 20:05:29 +02:00
Nicolas Patry
f0b35f94b8
More specific codes. 2024-08-27 20:05:29 +02:00
Nicolas Patry
a6cd5fef23
Disable prefix caching for lora. 2024-08-27 20:05:29 +02:00
Nicolas Patry
cba59aca03
Disabling flashinfer/prefix caching on odd head_dim 2024-08-27 20:05:29 +02:00
Nicolas Patry
f55278de2d
Allowing window_left_size (dummy version). 2024-08-27 20:05:29 +02:00
Nicolas Patry
f2bdc65098
Using prebuilt. 2024-08-27 20:05:28 +02:00
Nicolas Patry
9d4c5d39fe
Include flashinfer in the docker. 2024-08-27 20:05:28 +02:00
Nicolas Patry
60719babf6
Making prefix/flashinfer the default and testing the full release tests. 2024-08-27 20:05:28 +02:00
drbh
21187c27c9
fix: bump minijinja version and add test for llama 3.1 tools (#2463)
* fix: support tojson and avoid message indexing issue in template

* fix: prefer minijinja native methods and prefer workspace level dependency

* fix: adjust comment typo
2024-08-27 13:31:08 -04:00
Nicolas Patry
2788d41a76
Fixing CI. (#2462) 2024-08-27 15:33:02 +02:00
drbh
cfa73b5c99
Pr 2451 ci branch (#2454)
* fix[router]: Fix tools not passed in chat template

Signed-off-by: GitHub <noreply@github.com>

* feat: improve default tool serialization and lints

* feat: refactor tool logic to include notify_error in prompt and adjust typing

* fix: adjust non tool template apply

* fix: simplify tool grammar logic and improve schema

* feat: avoid skip tool test and avoid empty tool prompts

* fix: increase test client timeout for grammar compilation tests

---------

Signed-off-by: GitHub <noreply@github.com>
Co-authored-by: Simone Rossi <simone.rossi.93@gmail.com>
2024-08-26 20:19:38 -04:00