OlivierDehaene
532146338b
feat(router): add max_batch_size ( #1542 )
...
Some hardware require a maximum batch size.
2024-02-09 12:38:41 +01:00
drbh
becd09978c
chore: bump rust version and annotate/fix all clippy warnings ( #1455 )
...
This PR just bumps the latest rust version and makes clippy happy
```bash
cargo clippy --all -- -D warnings
# Finished dev [unoptimized + debuginfo] target(s) in 0.10s
```
2024-01-22 15:22:54 +01:00
OlivierDehaene
50b495f3d8
feat: add more latency metrics in forward ( #1346 )
2023-12-14 15:59:38 +01:00
OlivierDehaene
5e28f44a83
#1049 CI ( #1178 )
...
See #1049
---------
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
2023-10-20 10:28:45 +02:00
OlivierDehaene
73a4d65d26
feat: add cuda memory fraction ( #659 )
...
Close #673
2023-07-24 11:43:58 +02:00
OlivierDehaene
fe80f5360c
feat(server): auto max_batch_total_tokens for flash att models ( #630 )
2023-07-19 09:31:25 +02:00
OlivierDehaene
e74bd41e0f
feat(server): add paged attention to flash models ( #516 )
...
Closes #478
2023-06-30 19:09:59 +02:00
OlivierDehaene
218c9adaa5
feat: decrease IPC proto size ( #367 )
...
Closes #307 #308
2023-05-24 19:19:57 +02:00
OlivierDehaene
68e9d6ab33
feat(server): shard token decode ( #303 )
2023-05-10 15:48:21 +02:00
Ehsan M. Kermani
f092ba9b22
feat(server): add watermarking tests ( #248 )
2023-04-27 19:16:35 +02:00
Nicolas Patry
db2b4e0754
feat(router): new healthcheck that skips the queue ( #244 )
...
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2023-04-26 20:23:54 +02:00
OlivierDehaene
ebc74d5666
feat(router): use number of tokens in batch as input for dynamic batching ( #226 )
...
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-04-24 17:59:00 +02:00
OlivierDehaene
343437c7b5
feat(router): add device and dtype info ( #215 )
2023-04-21 15:36:29 +02:00
OlivierDehaene
5cddc055e6
fix(rust-client): use join_all instead of select_all to hopefully fix nccl issues ( #162 )
2023-04-09 20:07:02 +02:00
OlivierDehaene
f000068944
feat(server): clear cache on error ( #143 )
2023-03-28 11:29:35 +02:00
OlivierDehaene
9af454142a
feat: add distributed tracing ( #62 )
2023-02-13 13:02:45 +01:00
OlivierDehaene
017a2a8c2f
feat: Add token streaming using ServerSideEvents support ( #41 )
2023-01-31 17:04:00 +01:00
OlivierDehaene
4f9ac67cfa
Revert "feat: Add token streaming using ServerSideEvents support" ( #40 )
...
Reverts huggingface/text-generation-inference#36
2023-01-31 14:21:51 +01:00
OlivierDehaene
7fbfbb0dc5
feat: Add token streaming using ServerSideEvents support ( #36 )
...
Add token streaming using ServerSideEvents (SSE).
The signature of the SSE events is:
```rust
struct Details {
finish_reason: String,
generated_tokens: u32,
seed: Option<u64>,
}
struct StreamResponse {
token: Token,
generated_text: Option<String>,
details: Option<Details>,
}
struct ErrorResponse {
error: String,
}
```
2023-01-31 11:49:43 +01:00
OlivierDehaene
09674e6df9
feat(server): Support bitsandbytes
2022-10-27 14:25:29 +02:00
OlivierDehaene
beb552127a
feat(client): Simplify sharded logic
2022-10-22 23:40:05 +02:00
Olivier Dehaene
f16f2f5ae1
v0.1.0
2022-10-20 19:14:44 +02:00
Olivier Dehaene
5e5d8766a2
feat: Improve error handling
2022-10-17 14:59:00 +02:00
Olivier Dehaene
4c693e6524
Refactored gRPC interface
...
Added validation logic
2022-10-11 16:50:54 +02:00
Olivier Dehaene
295831a481
Init
2022-10-08 12:30:12 +02:00