OlivierDehaene
4f9ac67cfa
Revert "feat: Add token streaming using ServerSideEvents support" (#40)
Reverts huggingface/text-generation-inference#36
2023-01-31 14:21:51 +01:00
OlivierDehaene
7fbfbb0dc5
feat: Add token streaming using ServerSideEvents support (#36)
Add token streaming using Server-Sent Events (SSE).
The SSE event payloads have the following shape:
```rust
struct Details {
finish_reason: String,
generated_tokens: u32,
seed: Option<u64>,
}
struct StreamResponse {
token: Token,
generated_text: Option<String>,
details: Option<Details>,
}
struct ErrorResponse {
error: String,
}
```
2023-01-31 11:49:43 +01:00
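To illustrate how these payloads might be put on the wire, here is a minimal Rust sketch. It assumes a hypothetical `Token { id, text }` (the commit does not show `Token`'s fields), assumes that only the final event carries `generated_text` and `details`, and leaves JSON serialization (for example with serde_json) out. `sse_frame` and `stream_events` are illustrative names, not the router's API. The framing follows the SSE format: each payload line is prefixed with `data: ` and an event ends with a blank line.

```rust
/// Hypothetical stand-in for the `Token` type referenced above;
/// its real fields are not shown in the commit message.
#[derive(Debug, Clone, PartialEq)]
struct Token {
    id: u32,
    text: String,
}

#[derive(Debug, Clone, PartialEq)]
struct Details {
    finish_reason: String,
    generated_tokens: u32,
    seed: Option<u64>,
}

#[derive(Debug, Clone, PartialEq)]
struct StreamResponse {
    token: Token,
    generated_text: Option<String>,
    details: Option<Details>,
}

/// Frame one (already serialized) payload as an SSE event:
/// every payload line gets a `data: ` prefix and a blank line ends the event.
fn sse_frame(payload: &str) -> String {
    let mut out = String::new();
    for line in payload.split('\n') {
        out.push_str("data: ");
        out.push_str(line);
        out.push('\n');
    }
    out.push('\n');
    out
}

/// Build one event per token. Assumption: only the last event
/// carries the full `generated_text` and the `details`.
fn stream_events(tokens: &[Token], finish_reason: &str, seed: Option<u64>) -> Vec<StreamResponse> {
    let n = tokens.len();
    tokens
        .iter()
        .enumerate()
        .map(|(i, tok)| {
            let last = i + 1 == n;
            StreamResponse {
                token: tok.clone(),
                generated_text: if last {
                    Some(tokens.iter().map(|t| t.text.as_str()).collect::<String>())
                } else {
                    None
                },
                details: if last {
                    Some(Details {
                        finish_reason: finish_reason.to_string(),
                        generated_tokens: n as u32,
                        seed,
                    })
                } else {
                    None
                },
            }
        })
        .collect()
}
```

Sending the full text only once keeps intermediate events small; a client that needs the running text can concatenate `token.text` itself.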
OlivierDehaene
cd298bc5e5
feat: Support sampling seeding (#37)
Co-authored-by: Yannic Kilcher <yk@users.noreply.github.com>
2023-01-30 15:36:16 +01:00
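Seeding makes sampling reproducible: the same seed and the same token distribution yield the same draws, which is presumably why `Details` reports a `seed`. The Rust sketch below is only an illustration. It uses SplitMix64 as a stand-in PRNG and inverse-CDF sampling over a probability vector; `SplitMix64`, `sample`, and `sample_n` are hypothetical names, not the server's implementation.

```rust
/// SplitMix64: a small deterministic PRNG, used here only as a stand-in
/// for whatever seeded generator the server actually uses.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform float in [0, 1) built from the top 53 bits.
    fn next_f64(&mut self) -> f64 {
        (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Draw a token index from `probs` (assumed to sum to about 1)
/// by inverse-CDF sampling.
fn sample(probs: &[f64], rng: &mut SplitMix64) -> usize {
    let r = rng.next_f64();
    let mut acc = 0.0;
    for (i, p) in probs.iter().enumerate() {
        acc += p;
        if r < acc {
            return i;
        }
    }
    // Guard against rounding when the probabilities sum to slightly below 1.
    probs.len() - 1
}

/// Draw `n` tokens with a generator seeded from `seed`.
fn sample_n(probs: &[f64], seed: u64, n: usize) -> Vec<usize> {
    let mut rng = SplitMix64::new(seed);
    (0..n).map(|_| sample(probs, &mut rng)).collect()
}
```

Calling `sample_n` twice with the same seed returns identical index sequences; reporting the seed back to the client is what lets a request be replayed.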
OlivierDehaene
1539d3cbbe
feat(router): Remove second lock from batcher hot path (#27)
@njhill
2023-01-26 16:29:13 +01:00
OlivierDehaene
15511edc01
feat(server): Support SantaCoder (#26)
2023-01-20 12:24:39 +01:00
Nick Hill
f7ac394935
fix(router): Obey max batch size (#23)
2023-01-17 09:11:21 +01:00
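A minimal sketch of what obeying a maximum batch size can look like when new requests are pulled from a queue. The `Request` type, the `next_batch` helper, and the idea of subtracting the size of the currently running batch are assumptions for illustration, not the router's actual code.

```rust
use std::collections::VecDeque;

/// Hypothetical request type; only an id is needed for the illustration.
#[derive(Debug, Clone, PartialEq)]
struct Request {
    id: u64,
}

/// Take at most `max_batch_size - current_size` requests from the front
/// of the queue, so the running batch never exceeds the configured limit.
fn next_batch(queue: &mut VecDeque<Request>, current_size: usize, max_batch_size: usize) -> Vec<Request> {
    // saturating_sub avoids underflow if the batch is already at or over the limit.
    let room = max_batch_size.saturating_sub(current_size);
    let take = room.min(queue.len());
    queue.drain(..take).collect()
}
```

Requests that do not fit stay at the front of the queue in arrival order and are picked up on a later call.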
Nick Hill
e6d3eb5d5d
fix(server): Minor refactorization using new_zeros (#24)
- Fix some type hints, in particular for the base tokenizer class
- Make use of the `tensor.new_zeros`/`tensor.new_empty` methods
- Simplify env var string parsing in launcher
2023-01-17 09:10:22 +01:00
OlivierDehaene
32a253063d
feat: Return logprobs (#8)
2022-12-15 17:03:56 +01:00
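Returned log-probabilities are typically computed with a log-softmax over the model's logits. The sketch below shows the numerically stable form, log p_i = x_i - (m + ln Σ_j exp(x_j - m)) with m = max_j x_j. The server works with PyTorch tensors, so `log_softmax` and `token_logprob` here are hypothetical Rust helpers that only illustrate the arithmetic.

```rust
/// Numerically stable log-softmax over a non-empty slice of logits:
/// subtracting the max before exponentiating avoids overflow.
fn log_softmax(logits: &[f32]) -> Vec<f32> {
    let m = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let log_sum_exp = m + logits.iter().map(|&x| (x - m).exp()).sum::<f32>().ln();
    logits.iter().map(|&x| x - log_sum_exp).collect()
}

/// Log-probability of the chosen token id under the given logits.
fn token_logprob(logits: &[f32], token_id: usize) -> f32 {
    log_softmax(logits)[token_id]
}
```

Subtracting the maximum keeps every exponent at or below zero, so logits in the thousands still produce finite log-probabilities.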
OlivierDehaene
718096f695
feat: Support stop sequences (#7)
2022-12-12 18:25:22 +01:00
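One way to support stop sequences is to check, after each generated token, whether the accumulated text ends with any of them. This Rust sketch assumes that approach; `matched_stop`, `generate_until_stop`, and the `FinishReason` variants are hypothetical names, not the server's API.

```rust
/// Hypothetical finish reasons for the illustration.
#[derive(Debug, PartialEq)]
enum FinishReason {
    StopSequence,
    Length,
    EndOfInput,
}

/// Return the first (non-empty) stop sequence the generated text now ends with.
fn matched_stop<'a>(generated: &str, stop_sequences: &'a [String]) -> Option<&'a str> {
    stop_sequences
        .iter()
        .map(|s| s.as_str())
        .find(|s| !s.is_empty() && generated.ends_with(*s))
}

/// Append tokens one at a time, stopping when a stop sequence appears at the
/// end of the text or when `max_new_tokens` tokens have been produced.
fn generate_until_stop(tokens: &[&str], stop_sequences: &[String], max_new_tokens: usize) -> (String, FinishReason) {
    let mut text = String::new();
    for (i, tok) in tokens.iter().enumerate() {
        text.push_str(tok);
        if matched_stop(&text, stop_sequences).is_some() {
            return (text, FinishReason::StopSequence);
        }
        if i + 1 >= max_new_tokens {
            return (text, FinishReason::Length);
        }
    }
    (text, FinishReason::EndOfInput)
}
```

Checking the accumulated text rather than each token on its own catches stop sequences split across tokens. A token that contains a stop sequence followed by extra text would not be caught by a suffix check, which a real implementation may need to handle.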
OlivierDehaene
a2985036aa
feat(server): Add model tests (#6)
2022-12-08 18:49:33 +01:00
Nick Hill
31d76e238d
fix(batching): Avoid theoretical hang in batcher loop (#5)
- Avoid theoretical hang in batcher loop
- Avoid a couple of clones in the router generate method
- Keep attention mask tensors as integers
- Remove num_heads attribute
Co-authored-by: OlivierDehaene <Olivier.dehaene@gmail.com>
2022-12-05 10:10:59 +01:00
OlivierDehaene
91f5f86280
fix(router): Fix HTTP status codes
2022-11-14 14:34:15 +01:00
OlivierDehaene
c5665f5c8b
feat(server): Support generic AutoModelForCausalLM
2022-11-04 14:22:47 +01:00
OlivierDehaene
3cf6368c77
feat(server): Support all AutoModelForCausalLM on a best effort basis
2022-10-28 19:24:00 +02:00
OlivierDehaene
09674e6df9
feat(server): Support bitsandbytes
2022-10-27 14:25:29 +02:00
OlivierDehaene
beb552127a
feat(client): Simplify sharded logic
2022-10-22 23:40:05 +02:00
OlivierDehaene
c837893370
feat(router): Add max_waiting_tokens
2022-10-21 16:40:05 +02:00
Olivier Dehaene
f16f2f5ae1
v0.1.0
2022-10-20 19:14:44 +02:00
Olivier Dehaene
92c1ecd008
feat: Add arguments to CLI
2022-10-17 18:27:33 +02:00
Olivier Dehaene
5e5d8766a2
feat: Improve error handling
2022-10-17 14:59:00 +02:00
Olivier Dehaene
bf99afe916
feat: Docker image
2022-10-14 15:56:21 +02:00
Olivier Dehaene
4c693e6524
Refactored gRPC interface
Added validation logic
2022-10-11 16:50:54 +02:00
Olivier Dehaene
fa9a088467
Add load testing
2022-10-11 10:36:51 +02:00