Ubuntu | 2c9e1171bc | [WIP] Adding GPTQ support for llama | 2023-05-11 12:05:35 +00:00
OlivierDehaene | 68e9d6ab33 | feat(server): shard token decode (#303) | 2023-05-10 15:48:21 +02:00
OlivierDehaene | ad66f6ef9a | feat(server): optim flash causal lm decode_token (#285) | 2023-05-09 18:26:19 +02:00
OlivierDehaene | 85aa7e2e7b | feat(server): support hf endpoint weight layout (#266) | 2023-05-03 11:36:24 +02:00
OlivierDehaene | db4cb5e4ed | fix(server): fix past key values logic (#216) @njhill fyi | 2023-04-21 15:59:18 +02:00
OlivierDehaene | 343437c7b5 | feat(router): add device and dtype info (#215) | 2023-04-21 15:36:29 +02:00
OlivierDehaene | e14ae3b5e9 | feat(server): support quantization for flash models (#200) closes #197 | 2023-04-19 12:51:11 +02:00
OlivierDehaene | 299217c95c | feat(server): add flash attention llama (#144) | 2023-04-11 16:38:22 +02:00