OlivierDehaene
b66b190403
feat(router): ngrok edge (#642)
2023-07-19 11:59:58 +02:00
OlivierDehaene
fe80f5360c
feat(server): auto max_batch_total_tokens for flash att models (#630)
2023-07-19 09:31:25 +02:00
OlivierDehaene
982ce3227b
feat(router): explicit warning if revision is not set (#608)
2023-07-13 18:59:38 +02:00
OlivierDehaene
b7327205a6
feat(launcher): add arg validation and drop subprocess (#595)
2023-07-13 14:22:37 +02:00
OlivierDehaene
b4024edd45
feat: better errors for warmup and TP (#575)
Close #571
2023-07-10 14:47:15 +02:00
OlivierDehaene
6f42942772
feat(router): add argument for hostname in router (#545) (#550)
# What does this PR do?
In title. Adds argument `--hostname` in router to support something like
`--hostname ::`. Tested with
```commandline
cargo run -- --port 8080 --hostname ::
curl -I -X GET 'http://[::1]:8080/health' # failed before this commit
```
Trigger CI
---------
Co-authored-by: Phil Chen <philchen2000@gmail.com>
2023-07-05 18:28:45 +02:00
OlivierDehaene
3b0c979efc
feat(router): arg validation (#519)
2023-06-30 20:07:49 +02:00
OlivierDehaene
e74bd41e0f
feat(server): add paged attention to flash models (#516)
Closes #478
2023-06-30 19:09:59 +02:00
OlivierDehaene
f59fb8b630
feat(router): add ngrok integration (#453)
2023-06-16 16:25:11 +02:00
OlivierDehaene
5a58226130
fix(server): fix decode token (#334)
Fixes #333
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2023-05-16 23:23:27 +02:00
OlivierDehaene
e250282213
feat(docker): add benchmarking tool to docker image (#298)
2023-05-09 13:19:31 +02:00
Nicolas Patry
b4fe248b17
fix(launcher): handle hub branches (#278)
2023-05-04 15:14:28 +02:00
OlivierDehaene
ebc74d5666
feat(router): use number of tokens in batch as input for dynamic batching (#226)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-04-24 17:59:00 +02:00
OlivierDehaene
343437c7b5
feat(router): add device and dtype info (#215)
2023-04-21 15:36:29 +02:00
OlivierDehaene
252f42c1e6
fix(router): add auth token to get model info (#207)
2023-04-19 20:06:06 +02:00
OlivierDehaene
2475aede61
feat(router): add info route (#196)
close #125
2023-04-18 16:16:06 +02:00
OlivierDehaene
9987960062
feat(router): make router input validation optional (#164)
2023-04-09 20:22:27 +02:00
OlivierDehaene
fef1a1c381
v0.4.3 (#152)
2023-03-30 17:28:14 +02:00
OlivierDehaene
610bb1f978
feat(benchmark): tui based benchmarking tool (#149)
2023-03-30 15:26:27 +02:00
OlivierDehaene
f000068944
feat(server): clear cache on error (#143)
2023-03-28 11:29:35 +02:00
OlivierDehaene
55bd4fed7d
feat(router): add best_of parameter (#117)
2023-03-09 15:30:54 +01:00
OlivierDehaene
cd5961b5da
feat: allow local models (#101)
closes #99
2023-03-06 14:39:36 +01:00
OlivierDehaene
4e685d907e
feat(router): ask hf.co for pipelinetag to decide on compat_return_full_text (#89)
2023-02-28 10:19:32 +01:00
OlivierDehaene
6796d38c6d
feat(router): add cors allow origin options (#73)
2023-02-17 18:22:00 +01:00
OlivierDehaene
5437d49beb
feat(router): add max_total_tokens and empty_input validation (#68)
closes #65
2023-02-15 21:56:59 +01:00
OlivierDehaene
9af454142a
feat: add distributed tracing (#62)
2023-02-13 13:02:45 +01:00
OlivierDehaene
b3b7ea0d74
feat: Use json formatter by default in docker image
2022-11-02 17:29:56 +01:00
OlivierDehaene
3cf6368c77
feat(server): Support all AutoModelForCausalLM on a best effort basis
2022-10-28 19:24:00 +02:00
OlivierDehaene
beb552127a
feat(client): Simplify sharded logic
2022-10-22 23:40:05 +02:00
OlivierDehaene
c837893370
feat(router): Add max_waiting_tokens
2022-10-21 16:40:05 +02:00
Olivier Dehaene
f16f2f5ae1
v0.1.0
2022-10-20 19:14:44 +02:00
Olivier Dehaene
92c1ecd008
feat: Add arguments to CLI
2022-10-17 18:27:33 +02:00
Olivier Dehaene
5e5d8766a2
feat: Improve error handling
2022-10-17 14:59:00 +02:00
Olivier Dehaene
bf99afe916
feat: Docker image
2022-10-14 15:56:21 +02:00
Olivier Dehaene
39df4d9975
Use axum
2022-10-11 18:14:39 +02:00
Olivier Dehaene
4c693e6524
Refactored gRPC interface
Added validation logic
2022-10-11 16:50:54 +02:00
Olivier Dehaene
fa9a088467
Add load testing
2022-10-11 10:36:51 +02:00
Olivier Dehaene
295831a481
Init
2022-10-08 12:30:12 +02:00