Commit Graph

541 Commits

Author SHA1 Message Date
OlivierDehaene
4096000e34
fix(server): fix typo in tokenizers decode ()
closes 
2023-05-03 10:10:34 +02:00
Ehsan M. Kermani
f092ba9b22
feat(server): add watermarking tests () 2023-04-27 19:16:35 +02:00
Nick Hill
34bca0b8d3
fix(server): Small tidy of code from recent changes ()
remaining_decode_tokens was calculated twice in Seq2SeqLMBatch.filter()
2023-04-27 09:57:28 +02:00
Nick Hill
b4cf832c40
fix(server): fix reshaping of bloom past_key_values in concatenate() ()
Introduced in

Fixes 
2023-04-27 09:51:27 +02:00
Nicolas Patry
db2b4e0754
feat(router): new healthcheck that skips the queue ()
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2023-04-26 20:23:54 +02:00
OlivierDehaene
ebc74d5666
feat(router): use number of tokens in batch as input for dynamic batching ()
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-04-24 17:59:00 +02:00
Nick Hill
4a7dd4085a
feat(server): reduce memory requirement () 2023-04-24 14:15:42 +02:00
OlivierDehaene
4b460e72fb
fix(server): fix flash batch filtering () 2023-04-21 20:26:01 +02:00
OlivierDehaene
1ffea36ec2
fix(server): fix flash causal () 2023-04-21 19:49:08 +02:00
OlivierDehaene
86bca365df
fix(server): fix flash causal () 2023-04-21 19:42:16 +02:00
OlivierDehaene
afc5b999d0
fix(server): cleanup new flash past_key_values logic () 2023-04-21 16:19:04 +02:00
OlivierDehaene
db4cb5e4ed
fix(server): fix past key values logic ()
@njhill fyi
2023-04-21 15:59:18 +02:00
OlivierDehaene
343437c7b5
feat(router): add device and dtype info () 2023-04-21 15:36:29 +02:00
Nick Hill
ac8c0f6fe4
feat(server): flash attention past key value optimizations () 2023-04-21 14:57:18 +02:00
OlivierDehaene
709d8936f6
feat(router): drop requests when client closes the channel () 2023-04-20 11:07:40 +02:00
OlivierDehaene
b6ee0ec7b0
feat(router): add git sha to info route () 2023-04-19 21:36:59 +02:00
OlivierDehaene
a88c54bb4c
feat(server): check cuda capability when importing flash models ()
close 
2023-04-19 12:52:37 +02:00
OlivierDehaene
e14ae3b5e9
feat(server): support quantization for flash models ()
closes 
2023-04-19 12:51:11 +02:00
OlivierDehaene
7a1ba58557
fix(docker): fix docker image dependencies () 2023-04-17 00:26:47 +02:00
OlivierDehaene
880a76eed5
feat(server): support sharded santacoder () 2023-04-12 17:18:08 +02:00
OlivierDehaene
5fa8ae041c
feat(server): optimize decode for sane tokenizers () 2023-04-12 12:03:10 +02:00
OlivierDehaene
f26dfd0dc1
feat(server): support OPT models ()
OPT models do not all have a `tokenizer.json` file on the hub at the
moment. Can't merge for now.
2023-04-11 19:16:41 +02:00
OlivierDehaene
299217c95c
feat(server): add flash attention llama () 2023-04-11 16:38:22 +02:00
OlivierDehaene
9987960062
feat(router): make router input validation optional () 2023-04-09 20:22:27 +02:00
OlivierDehaene
3f2542bb6a
fix(server): fix escape characters in stop sequence () 2023-04-05 19:37:41 +02:00
OlivierDehaene
c0aeb32583
feat(server): flash santacoder () 2023-04-03 19:06:42 +02:00
OlivierDehaene
08b7e4a282
fix(server): fix flash neox rotary embeddings () 2023-03-30 16:12:23 +02:00
OlivierDehaene
610bb1f978
feat(benchmark): tui based benchmarking tool () 2023-03-30 15:26:27 +02:00
OlivierDehaene
c9bdaa8b73
feat(server): reduce mlp and attn in one op for flash neox () 2023-03-28 16:51:41 +02:00
OlivierDehaene
f000068944
feat(server): clear cache on error () 2023-03-28 11:29:35 +02:00
Nick Hill
462530c2b0
fix(server): Avoid using try/except to determine kind of AutoModel () 2023-03-27 09:23:22 +02:00
OlivierDehaene
678b2f3900
feat(server): cleanup flash neox loading () 2023-03-26 16:37:21 +02:00
OlivierDehaene
d6a93fe992
fix(server): fix flash-neox scores warping () 2023-03-24 18:21:41 +01:00
OlivierDehaene
05e9a796cc
feat(server): flash neoX () 2023-03-24 14:02:14 +01:00
OlivierDehaene
b49dbf2d88
fix(server): use server tokenizer as gt () 2023-03-16 12:12:26 +01:00
OlivierDehaene
8ad60b752f
fix(server): add position ids to neox () 2023-03-15 13:12:49 +01:00
OlivierDehaene
c0795de2f2
fix(server): do not warp prefill logits () 2023-03-09 13:00:10 +01:00
OlivierDehaene
1a2d68250a
feat: support typical sampling ()
closes 
2023-03-09 11:33:57 +01:00
OlivierDehaene
941cd42e0c
fix(server): fix index out of range for watermarking () 2023-03-08 18:29:08 +01:00
OlivierDehaene
b1485e18c5
fix(server): fix galactica batch ()
closes 
2023-03-07 20:05:21 +01:00
OlivierDehaene
3fef90d50f
feat(clients): Python client () 2023-03-07 18:52:22 +01:00