OlivierDehaene
1a2d68250a
feat: support typical sampling ( #114 )
...
closes #112
2023-03-09 11:33:57 +01:00
OlivierDehaene
3fef90d50f
feat(clients): Python client ( #103 )
2023-03-07 18:52:22 +01:00
OlivierDehaene
9b8ea6a6c7
feat(server): add logits watermark ( #90 )
2023-03-02 12:30:41 +01:00
OlivierDehaene
439fcaf810
feat(router): add prometheus metrics scrape endpoint ( #71 )
2023-02-16 17:18:53 +01:00
OlivierDehaene
5437d49beb
feat(router): add max_total_tokens and empty_input validation ( #68 )
...
closes #65
2023-02-15 21:56:59 +01:00
OlivierDehaene
9af454142a
feat: add distributed tracing ( #62 )
2023-02-13 13:02:45 +01:00
OlivierDehaene
20c3c5940c
feat(router): refactor API and add openAPI schemas ( #53 )
2023-02-03 12:43:37 +01:00
OlivierDehaene
7b870e1e18
feat(router): use background task to manage request queue ( #52 )
...
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-02-02 14:59:27 +01:00
OlivierDehaene
313194f6d7
feat(server): support repetition penalty ( #47 )
2023-02-01 15:58:42 +01:00
OlivierDehaene
017a2a8c2f
feat: Add token streaming using ServerSideEvents support ( #41 )
2023-01-31 17:04:00 +01:00
OlivierDehaene
54fec93193
fix(server): fix seeding with multiple shards ( #44 )
2023-01-31 16:01:15 +01:00
OlivierDehaene
4f9ac67cfa
Revert "feat: Add token streaming using ServerSideEvents support" ( #40 )
...
Reverts huggingface/text-generation-inference#36
2023-01-31 14:21:51 +01:00
OlivierDehaene
7fbfbb0dc5
feat: Add token streaming using ServerSideEvents support ( #36 )
...
Add token streaming using ServerSideEvents (SSE).
The signature of the SSE events is:
```rust
struct Details {
finish_reason: String,
generated_tokens: u32,
seed: Option<u64>,
}
struct StreamResponse {
token: Token,
generated_text: Option<String>,
details: Option<Details>,
}
struct ErrorResponse {
error: String,
}
```
2023-01-31 11:49:43 +01:00
OlivierDehaene
15511edc01
feat(server): Support SantaCoder ( #26 )
2023-01-20 12:24:39 +01:00
Nick Hill
60472f9d2b
feat(router): Add const parameters to validation logic ( #15 )
...
I noticed some opportunity to collapse some of the logic, in case you
are interested.
2023-01-03 10:41:22 +01:00
Nick Hill
3efa5bbbfd
fix(router): Include special tokens when tokenizing ( #14 )
...
There's currently a discrepancy in the tokenization between the router
and python server code. The latter includes special tokens but former
does not.
This results in a token count mismatch for seq2seq models such as mt0
where the tokenizer emits an EOS token at the end.
This in turn results in some unexpected/incorrect output, in particular
when batch concatenation is involved, because the python code uses the
input length passed from the router for each row.
As far as I can tell, it is better to include this token in the encoder
`input_ids`, so I guess it's best to just adjust on the router side.
2022-12-30 19:31:44 +01:00
OlivierDehaene
718096f695
feat: Support stop sequences ( #7 )
2022-12-12 18:25:22 +01:00
Nick Hill
31d76e238d
fix(batching): Avoid theoretical hang in batcher loop ( #5 )
...
- Avoid theoretical hang in batcher loop
- Avoid a couple of clones in the router generate method
- Keep attention mask tensors as integers
- Remove num_heads attribute
Co-authored-by: OlivierDehaene <Olivier.dehaene@gmail.com>
2022-12-05 10:10:59 +01:00
OlivierDehaene
d6d5b12e03
fix(router): Handle tokenizer errors
2022-11-14 17:15:19 +01:00
OlivierDehaene
91f5f86280
fix(router): Fix HTTP status codes
2022-11-14 14:34:15 +01:00
OlivierDehaene
09674e6df9
feat(server): Support bitsandbytes
2022-10-27 14:25:29 +02:00
OlivierDehaene
c837893370
feat(router): Add max_waiting_tokens
2022-10-21 16:40:05 +02:00
OlivierDehaene
895a341d06
fix(validation): Fix error messages
2022-10-21 10:59:15 +02:00
Olivier Dehaene
f16f2f5ae1
v0.1.0
2022-10-20 19:14:44 +02:00
Olivier Dehaene
92c1ecd008
feat: Add arguments to CLI
2022-10-17 18:27:33 +02:00
Olivier Dehaene
5e5d8766a2
feat: Improve error handling
2022-10-17 14:59:00 +02:00
Olivier Dehaene
4c693e6524
Refactored gRPC interface
...
Added validation logic
2022-10-11 16:50:54 +02:00