Commit Graph

76 Commits

Author SHA1 Message Date
OlivierDehaene
db4cb5e4ed
fix(server): fix past key values logic ()
@njhill fyi
2023-04-21 15:59:18 +02:00
OlivierDehaene
343437c7b5
feat(router): add device and dtype info ()
2023-04-21 15:36:29 +02:00
Nick Hill
ac8c0f6fe4
feat(server): flash attention past key value optimizations ()
2023-04-21 14:57:18 +02:00
OlivierDehaene
709d8936f6
feat(router): drop requests when client closes the channel ()
2023-04-20 11:07:40 +02:00
OlivierDehaene
b6ee0ec7b0
feat(router): add git sha to info route ()
2023-04-19 21:36:59 +02:00
OlivierDehaene
a88c54bb4c
feat(server): check cuda capability when importing flash models ()
close 
2023-04-19 12:52:37 +02:00
OlivierDehaene
e14ae3b5e9
feat(server): support quantization for flash models ()
closes 
2023-04-19 12:51:11 +02:00
OlivierDehaene
7a1ba58557
fix(docker): fix docker image dependencies ()
2023-04-17 00:26:47 +02:00
OlivierDehaene
880a76eed5
feat(server): support sharded santacoder ()
2023-04-12 17:18:08 +02:00
OlivierDehaene
5fa8ae041c
feat(server): optimize decode for sane tokenizers ()
2023-04-12 12:03:10 +02:00
OlivierDehaene
f26dfd0dc1
feat(server): support OPT models ()
OPT models do not all have a `tokenizer.json` file on the hub at the
moment. Can't merge for now.
2023-04-11 19:16:41 +02:00
OlivierDehaene
299217c95c
feat(server): add flash attention llama ()
2023-04-11 16:38:22 +02:00
OlivierDehaene
9987960062
feat(router): make router input validation optional ()
2023-04-09 20:22:27 +02:00
OlivierDehaene
3f2542bb6a
fix(server): fix escape characters in stop sequence ()
2023-04-05 19:37:41 +02:00
OlivierDehaene
c0aeb32583
feat(server): flash santacoder ()
2023-04-03 19:06:42 +02:00
OlivierDehaene
08b7e4a282
fix(server): fix flash neox rotary embeddings ()
2023-03-30 16:12:23 +02:00
OlivierDehaene
c9bdaa8b73
feat(server): reduce mlp and attn in one op for flash neox ()
2023-03-28 16:51:41 +02:00
Nick Hill
462530c2b0
fix(server): Avoid using try/except to determine kind of AutoModel ()
2023-03-27 09:23:22 +02:00
OlivierDehaene
678b2f3900
feat(server): cleanup flash neox loading ()
2023-03-26 16:37:21 +02:00
OlivierDehaene
d6a93fe992
fix(server): fix flash-neox scores warping ()
2023-03-24 18:21:41 +01:00
OlivierDehaene
05e9a796cc
feat(server): flash neoX ()
2023-03-24 14:02:14 +01:00
OlivierDehaene
b49dbf2d88
fix(server): use server tokenizer as gt ()
2023-03-16 12:12:26 +01:00
OlivierDehaene
8ad60b752f
fix(server): add position ids to neox ()
2023-03-15 13:12:49 +01:00
OlivierDehaene
941cd42e0c
fix(server): fix index out of range for watermarking ()
2023-03-08 18:29:08 +01:00
OlivierDehaene
b1485e18c5
fix(server): fix galactica batch ()
closes 
2023-03-07 20:05:21 +01:00
OlivierDehaene
3fef90d50f
feat(clients): Python client ()
2023-03-07 18:52:22 +01:00