Commit Graph

132 Commits

Author SHA1 Message Date
OlivierDehaene
21340f24ba
feat(router): add legacy route for api-inference support (#88) 2023-02-27 14:56:58 +01:00
OlivierDehaene
0ac184ce77
feat(server): add special token bool (#85) 2023-02-24 15:55:57 +01:00
OlivierDehaene
6796d38c6d
feat(router): add cors allow origin options (#73) 2023-02-17 18:22:00 +01:00
OlivierDehaene
439fcaf810
feat(router): add prometheus metrics scrape endpoint (#71) 2023-02-16 17:18:53 +01:00
OlivierDehaene
5437d49beb
feat(router): add max_total_tokens and empty_input validation (#68)
closes #65
2023-02-15 21:56:59 +01:00
OlivierDehaene
9af454142a
feat: add distributed tracing (#62) 2023-02-13 13:02:45 +01:00
Yannic Kilcher
e520d5b349
fixed SSE naming (#61)
https://en.wikipedia.org/wiki/Server-sent_events
2023-02-08 22:30:11 +01:00
OlivierDehaene
20c3c5940c
feat(router): refactor API and add openAPI schemas (#53) 2023-02-03 12:43:37 +01:00
OlivierDehaene
b1482d9048
breaking(router): modify /generate API to only return generated text (#50)
@njhill, @yk FYI

generated_text was concatenated to the user prompt for legacy reason. We
want to remove this behaviour as we don't think it is useful and even
detrimonial to usability.

We also remove the unused Vec.
2023-02-02 15:02:04 +01:00
OlivierDehaene
313194f6d7
feat(server): support repetition penalty (#47) 2023-02-01 15:58:42 +01:00
OlivierDehaene
017a2a8c2f
feat: Add token streaming using ServerSideEvents support (#41) 2023-01-31 17:04:00 +01:00
OlivierDehaene
4f9ac67cfa
Revert "feat: Add token streaming using ServerSideEvents support" (#40)
Reverts huggingface/text-generation-inference#36
2023-01-31 14:21:51 +01:00
OlivierDehaene
7fbfbb0dc5
feat: Add token streaming using ServerSideEvents support (#36)
Add token streaming using ServerSideEvents (SSE).

The signature of the SSE events is: 

```rust
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}

struct ErrorResponse {
    error: String,
}
```
2023-01-31 11:49:43 +01:00
OlivierDehaene
cd298bc5e5
feat: Support sampling seeding (#37)
Co-authored-by: Yannic Kilcher <yk@users.noreply.github.com>
2023-01-30 15:36:16 +01:00
OlivierDehaene
5c01e2544c
fix(router): fix api-inference deployment (#31) 2023-01-23 17:42:14 +01:00
OlivierDehaene
f9d0ec376a
feat(docker): Make the image compatible with api-inference (#29) 2023-01-23 17:11:27 +01:00
OlivierDehaene
32a253063d
feat: Return logprobs (#8) 2022-12-15 17:03:56 +01:00
OlivierDehaene
718096f695
feat: Support stop sequences (#7) 2022-12-12 18:25:22 +01:00
Nick Hill
31d76e238d
fix(batching): Avoid theoretical hang in batcher loop (#5)
- Avoid theoretical hang in batcher loop
- Avoid a couple of clones in the router generate method
- Keep attention mask tensors as integers
- Remove num_heads attribute

Co-authored-by: OlivierDehaene <Olivier.dehaene@gmail.com>
2022-12-05 10:10:59 +01:00
OlivierDehaene
c5665f5c8b feat(server): Support generic AutoModelForCausalLM 2022-11-04 14:22:47 +01:00
OlivierDehaene
3cf6368c77 feat(server): Support all AutoModelForCausalLM on a best effort basis 2022-10-28 19:24:00 +02:00
OlivierDehaene
09674e6df9 feat(server): Support bitsandbytes 2022-10-27 14:25:29 +02:00
OlivierDehaene
c837893370 feat(router): Add max_waiting_tokens 2022-10-21 16:40:05 +02:00
Olivier Dehaene
f16f2f5ae1 v0.1.0 2022-10-20 19:14:44 +02:00
Olivier Dehaene
92c1ecd008 feat: Add arguments to CLI 2022-10-17 18:27:33 +02:00
Olivier Dehaene
5e5d8766a2 feat: Improve error handling 2022-10-17 14:59:00 +02:00
Olivier Dehaene
bcb53903b8 feat: Add AML deployment 2022-10-15 20:21:50 +02:00
Olivier Dehaene
bf99afe916 feat: Docker image 2022-10-14 15:56:21 +02:00
Olivier Dehaene
39df4d9975 Use axum 2022-10-11 18:14:39 +02:00
Olivier Dehaene
e86ecbac63 ValidationError was not correctly handled 2022-10-11 16:53:40 +02:00
Olivier Dehaene
4c693e6524 Refactored gRPC interface
Added validation logic
2022-10-11 16:50:54 +02:00
Olivier Dehaene
fa9a088467 Add load testing 2022-10-11 10:36:51 +02:00