Commit Graph

8 Commits

Author          SHA1        Message                                                            Date
OlivierDehaene  343437c7b5  feat(router): add device and dtype info (#215)                     2023-04-21 15:36:29 +02:00
Nick Hill       ac8c0f6fe4  feat(server): flash attention past key value optimizations (#213)  2023-04-21 14:57:18 +02:00
OlivierDehaene  709d8936f6  feat(router): drop requests when client closes the channel (#202)  2023-04-20 11:07:40 +02:00
OlivierDehaene  e14ae3b5e9  feat(server): support quantization for flash models (#200)         2023-04-19 12:51:11 +02:00
                            closes #197
OlivierDehaene  5fa8ae041c  feat(server): optimize decode for sane tokenizers (#170)           2023-04-12 12:03:10 +02:00
OlivierDehaene  299217c95c  feat(server): add flash attention llama (#144)                     2023-04-11 16:38:22 +02:00
OlivierDehaene  9987960062  feat(router): make router input validation optional (#164)         2023-04-09 20:22:27 +02:00
OlivierDehaene  c0aeb32583  feat(server): flash santacoder (#153)                              2023-04-03 19:06:42 +02:00