Nicolas Patry
|
5e2932552c
|
Revert the Cohere tokenizer change (for now using a revision instead).
|
2024-08-29 11:35:18 +02:00 |
|
Nicolas Patry
|
fc7ea202c2
|
Fix disabling prefix caching - Fix windowing checks.
|
2024-08-29 11:34:50 +02:00 |
|
Nicolas Patry
|
bef2f6bdaa
|
Fixing the free algorithm to handle times where the common prefix is
smaller.
|
2024-08-29 09:17:00 +02:00 |
|
Nicolas Patry
|
9c839ca5df
|
Adding error message when assert is violated.
|
2024-08-28 21:22:36 +02:00 |
|
Nicolas Patry
|
e7e036389e
|
Revert the integration tests change (seems linked to head_size
modification).
|
2024-08-28 19:38:51 +02:00 |
|
Nicolas Patry
|
8a4df6e181
|
Only n_heads / process_group.size() are necessary.
|
2024-08-28 16:34:58 +02:00 |
|
Nicolas Patry
|
8d01848370
|
Update server tests
- Default to throughput test in k6
- Use TGI_WIGGLE_ROOM to adjust wiggle room
|
2024-08-28 15:42:05 +02:00 |
|
Nicolas Patry
|
12325564dc
|
Put back default pure shell.
|
2024-08-28 14:54:05 +02:00 |
|
Nicolas Patry
|
f886747949
|
Oops this doesn't belong here.
|
2024-08-28 14:49:00 +02:00 |
|
Nicolas Patry
|
e6ee67f301
|
Truncating left for radix purposes.
|
2024-08-28 10:53:22 +02:00 |
|
Nicolas Patry
|
0a60973166
|
Fixing the batching tokenization in flash causal lm.
|
2024-08-28 10:34:10 +02:00 |
|
Nicolas Patry
|
c6f1a61267
|
Update the chat test.
|
2024-08-27 23:02:12 +02:00 |
|
Nicolas Patry
|
8ac1ffa087
|
Removing encoder_decoder (seq2seq).
|
2024-08-27 21:11:49 +02:00 |
|
Nicolas Patry
|
ccaf1d0030
|
Fixing the test.
|
2024-08-27 20:06:12 +02:00 |
|
Nicolas Patry
|
2cf1f5c00e
|
Fixing the issue with add_special_tokens not being passed around.
|
2024-08-27 20:06:12 +02:00 |
|
Nicolas Patry
|
e0069a3a26
|
Fixing seqlen with the new vlms.
|
2024-08-27 20:06:12 +02:00 |
|
Nicolas Patry
|
9dacac3b15
|
add_special_tokens is internal only
|
2024-08-27 20:06:12 +02:00 |
|
Nicolas Patry
|
55d984d730
|
Fixed flashinfer version.
|
2024-08-27 20:06:12 +02:00 |
|
Nicolas Patry
|
bb9769ed42
|
Update all models.
|
2024-08-27 20:06:11 +02:00 |
|
Nicolas Patry
|
65b94a69bd
|
Fixing prefix caching for flashdecoding.
|
2024-08-27 20:06:11 +02:00 |
|
Nicolas Patry
|
7f1816a4e1
|
Change add_special_tokens in order to have the correct tokens for chat
input and non-chat input (since it's super important with the prefixing now)
|
2024-08-27 20:06:11 +02:00 |
|
Nicolas Patry
|
f1c0735453
|
Don't enable prefix caching on VLM just yet.
|
2024-08-27 20:06:11 +02:00 |
|
Nicolas Patry
|
e30fb25444
|
Fixing the default for vlm.
|
2024-08-27 20:06:11 +02:00 |
|
Nicolas Patry
|
27b566baa8
|
Downgrade some logs.
|
2024-08-27 20:06:11 +02:00 |
|
Nicolas Patry
|
26e5037de4
|
This seems to be working.
|
2024-08-27 20:06:10 +02:00 |
|
Nicolas Patry
|
f5182c188c
|
Is this enough to make it work ?
|
2024-08-27 20:06:10 +02:00 |
|
Nicolas Patry
|
1568e82548
|
Override the env in server tests.
|
2024-08-27 20:06:10 +02:00 |
|
Nicolas Patry
|
682db34b6a
|
Handling debugger.
|
2024-08-27 20:06:10 +02:00 |
|
Nicolas Patry
|
c53968dc45
|
Remove lambda for cleaner function.
|
2024-08-27 20:06:10 +02:00 |
|
Nicolas Patry
|
32f6416358
|
Upgrade resolution system for fewer errors in resolution.
|
2024-08-27 20:06:10 +02:00 |
|
Nicolas Patry
|
5eb6ea0063
|
Tmp
|
2024-08-27 20:06:09 +02:00 |
|
Nicolas Patry
|
0bf4eb9683
|
Updated flake lock
|
2024-08-27 20:06:09 +02:00 |
|
Nicolas Patry
|
b80593bfa3
|
Apply suggestions from code review
Co-authored-by: drbh <david.richard.holtz@gmail.com>
|
2024-08-27 20:06:09 +02:00 |
|
Nicolas Patry
|
8d0220a695
|
Forgot last default place.
|
2024-08-27 20:06:09 +02:00 |
|
Nicolas Patry
|
860b550cdf
|
Everywhere 1.80
|
2024-08-27 20:06:09 +02:00 |
|
Nicolas Patry
|
344fee0d44
|
Upgrade to 1.80 because of bitstream...
|
2024-08-27 20:06:09 +02:00 |
|
Nicolas Patry
|
17c8a5e574
|
Update cargo lock ?
|
2024-08-27 20:06:06 +02:00 |
|
Nicolas Patry
|
ba1ce20ce8
|
Updating integration tests with new values with FI/FD.
Remove paged as a default too, and using FD everywhere.
|
2024-08-27 20:05:29 +02:00 |
|
Nicolas Patry
|
ffb6841121
|
Update lock
|
2024-08-27 20:05:29 +02:00 |
|
Nicolas Patry
|
f0b35f94b8
|
More specific codes.
|
2024-08-27 20:05:29 +02:00 |
|
Nicolas Patry
|
a6cd5fef23
|
Disable prefix caching for lora.
|
2024-08-27 20:05:29 +02:00 |
|
Nicolas Patry
|
cba59aca03
|
Disabling flashinfer/prefix caching on odd head_dim
|
2024-08-27 20:05:29 +02:00 |
|
Nicolas Patry
|
f55278de2d
|
Allowing window_left_size (dummy version).
|
2024-08-27 20:05:29 +02:00 |
|
Nicolas Patry
|
f2bdc65098
|
Using prebuilt.
|
2024-08-27 20:05:28 +02:00 |
|
Nicolas Patry
|
9d4c5d39fe
|
Include flashinfer in the docker.
|
2024-08-27 20:05:28 +02:00 |
|
Nicolas Patry
|
60719babf6
|
Making prefix/flashinfer the default and testing the full release tests.
|
2024-08-27 20:05:28 +02:00 |
|
drbh
|
21187c27c9
|
fix: bump minijinja version and add test for llama 3.1 tools (#2463)
* fix: support tojson and avoid message indexing issue in template
* fix: prefer minijinja native methods and prefer workspace level dependency
* fix: adjust comment typo
|
2024-08-27 13:31:08 -04:00 |
|
Nicolas Patry
|
2788d41a76
|
Fixing CI. (#2462)
|
2024-08-27 15:33:02 +02:00 |
|
drbh
|
cfa73b5c99
|
Pr 2451 ci branch (#2454)
* fix[router]: Fix tools not passed in chat template
Signed-off-by: GitHub <noreply@github.com>
* feat: improve default tool serialization and lints
* feat: refactor tool logic to include notify_error in prompt and adjust typing
* fix: adjust non tool template apply
* fix: simplify tool grammar logic and improve schema
* feat: avoid skip tool test and avoid empty tool prompts
* fix: increase test client timeout for grammar compilation tests
---------
Signed-off-by: GitHub <noreply@github.com>
Co-authored-by: Simone Rossi <simone.rossi.93@gmail.com>
|
2024-08-26 20:19:38 -04:00 |
|
drbh
|
30be188400
|
Fix: don't apply post layernorm in SiglipVisionTransformer (#2459)
* Fix: don't apply post layernorm in SiglipVisionTransformer
This fixes a bug with LLaVA Next when using Siglip as the vision model. LLaVA Next expects the output of the vision model to be the encoder outputs before layernorm (see original transformers implementation here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llava_next/modeling_llava_next.py#L813).
This also makes Siglip consistent with the existing Clip implementation:
https://github.com/huggingface/text-generation-inference/blob/main/server/text_generation_server/models/custom_modeling/clip.py#L613
* fix: adjust pali gemma for post layer norm and small refactors
---------
Co-authored-by: Travis Addair <tgaddair@gmail.com>
|
2024-08-26 17:04:46 -04:00 |
|