OlivierDehaene | ebc74d5666 | feat(router): use number of tokens in batch as input for dynamic batching (#226); Co-authored-by: Nick Hill <nickhill@us.ibm.com> | 2023-04-24 17:59:00 +02:00
OlivierDehaene | 4b460e72fb | fix(server): fix flash batch filtering (#220) | 2023-04-21 20:26:01 +02:00
OlivierDehaene | 1ffea36ec2 | fix(server): fix flash causal (#219) | 2023-04-21 19:49:08 +02:00
OlivierDehaene | 86bca365df | fix(server): fix flash causal (#218) | 2023-04-21 19:42:16 +02:00
OlivierDehaene | afc5b999d0 | fix(server): cleanup new flash past_key_values logic (#217) | 2023-04-21 16:19:04 +02:00
OlivierDehaene | db4cb5e4ed | fix(server): fix past key values logic (#216); @njhill fyi | 2023-04-21 15:59:18 +02:00
OlivierDehaene | 343437c7b5 | feat(router): add device and dtype info (#215) | 2023-04-21 15:36:29 +02:00
Nick Hill | ac8c0f6fe4 | feat(server): flash attention past key value optimizations (#213) | 2023-04-21 14:57:18 +02:00
OlivierDehaene | 709d8936f6 | feat(router): drop requests when client closes the channel (#202) | 2023-04-20 11:07:40 +02:00
OlivierDehaene | e14ae3b5e9 | feat(server): support quantization for flash models (#200); closes #197 | 2023-04-19 12:51:11 +02:00
OlivierDehaene | 5fa8ae041c | feat(server): optimize decode for sane tokenizers (#170) | 2023-04-12 12:03:10 +02:00
OlivierDehaene | 299217c95c | feat(server): add flash attention llama (#144) | 2023-04-11 16:38:22 +02:00
OlivierDehaene | 9987960062 | feat(router): make router input validation optional (#164) | 2023-04-09 20:22:27 +02:00
OlivierDehaene | c0aeb32583 | feat(server): flash santacoder (#153) | 2023-04-03 19:06:42 +02:00