Commit Graph

16 Commits

Author SHA1 Message Date
Ubuntu
a86e4bf713 Working version. 2023-05-11 12:05:35 +00:00
Ubuntu
57a6cbff82 Tmp work for sharding to work properly. 2023-05-11 12:05:35 +00:00
Ubuntu
c5846ee73a Dump. 2023-05-11 12:05:35 +00:00
Ubuntu
c126ca01d9 Non local file. 2023-05-11 12:05:35 +00:00
Ubuntu
c3d12ae2d4 Some protection against sharding (illegal access because of g_idx) 2023-05-11 12:05:35 +00:00
Ubuntu
2c9e1171bc [WIP] Adding GPTQ support for llama 2023-05-11 12:05:35 +00:00
OlivierDehaene
68e9d6ab33 feat(server): shard token decode (#303) 2023-05-10 15:48:21 +02:00
OlivierDehaene
ad66f6ef9a feat(server): optim flash causal lm decode_token (#285) 2023-05-09 18:26:19 +02:00
OlivierDehaene
afc5b999d0 fix(server): cleanup new flash past_key_values logic (#217) 2023-04-21 16:19:04 +02:00
OlivierDehaene
db4cb5e4ed fix(server): fix past key values logic (#216)
@njhill fyi
2023-04-21 15:59:18 +02:00
OlivierDehaene
e14ae3b5e9 feat(server): support quantization for flash models (#200)
closes #197
2023-04-19 12:51:11 +02:00
OlivierDehaene
880a76eed5 feat(server): support sharded santacoder (#167) 2023-04-12 17:18:08 +02:00
OlivierDehaene
299217c95c feat(server): add flash attention llama (#144) 2023-04-11 16:38:22 +02:00
OlivierDehaene
9987960062 feat(router): make router input validation optional (#164) 2023-04-09 20:22:27 +02:00
OlivierDehaene
3f2542bb6a fix(server): fix escape characters in stop sequence (#155) 2023-04-05 19:37:41 +02:00
OlivierDehaene
c0aeb32583 feat(server): flash santacoder (#153) 2023-04-03 19:06:42 +02:00