Commit Graph

89 Commits

Author SHA1 Message Date
OlivierDehaene
3b0c979efc
feat(router): arg validation (#519) 2023-06-30 20:07:49 +02:00
OlivierDehaene
e74bd41e0f
feat(server): add paged attention to flash models (#516)
Closes #478
2023-06-30 19:09:59 +02:00
Robert Kimball
70f485bf9f
feat(router): add header option to disable buffering for the generate_stream response (#498)
This PR adds an HTTP header option to disable buffering for the
generate_stream endpoint response stream.

Problem: If a model runs behind a proxy server such as nginx with
buffering enabled, the response stream from generate_stream is
aggregated into a single response, which effectively disables streaming.
Instead of a chunked response in which each token arrives over time,
the client receives everything at once.

Solution: This change adds the `X-Accel-Buffering` HTTP header, which
disables buffering for the generate_stream response, allowing the
response to stream properly.
2023-06-28 11:50:12 +02:00
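The header mechanism described in #498 can be illustrated with a minimal sketch. This is not the router's code: the helper name `sse_headers` and its `disable_proxy_buffering` flag are hypothetical, and whether the router sets the header unconditionally or behind an option is not stated in the commit message. The value `no` is the one nginx recognizes for turning off buffering on a single response.

```python
def sse_headers(disable_proxy_buffering: bool = True) -> dict[str, str]:
    """Build response headers for a server-sent events stream.

    Hypothetical helper for illustration only; not part of the project.
    """
    headers = {
        "Content-Type": "text/event-stream",
        "Cache-Control": "no-cache",
    }
    if disable_proxy_buffering:
        # nginx checks this response header and, when it is "no",
        # forwards chunks as they arrive instead of buffering them.
        headers["X-Accel-Buffering"] = "no"
    return headers


if __name__ == "__main__":
    for name, value in sse_headers().items():
        print(f"{name}: {value}")
```

Sending the header with the response keeps the fix local to the streaming endpoint, so the proxy configuration does not need to change.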
OlivierDehaene
bd3a9d8e85
fix(router): add timeout on flume sends (#488) 2023-06-23 14:58:28 +02:00
OlivierDehaene
f59fb8b630
feat(router): add ngrok integration (#453) 2023-06-16 16:25:11 +02:00
OlivierDehaene
19c41824cb
chore: update openapi schema 2023-06-05 18:16:08 +02:00
OlivierDehaene
895c5f1562
feat(server): only compute prefill logprobs when asked (#406)
Close #288
2023-06-02 17:12:30 +02:00
OlivierDehaene
218c9adaa5
feat: decrease IPC proto size (#367)
Closes #307 #308
2023-05-24 19:19:57 +02:00
OlivierDehaene
942005386a
feat(router): log input/output at debug level (#364)
@njhill FYI
2023-05-23 20:47:37 +02:00
OlivierDehaene
5a58226130
fix(server): fix decode token (#334)
Fixes #333

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2023-05-16 23:23:27 +02:00
OlivierDehaene
e250282213
feat(docker): add benchmarking tool to docker image (#298) 2023-05-09 13:19:31 +02:00
Sai Vinay G
926fd9a010
feat(router): Adding response schema for compat_generate (#292) 2023-05-09 12:38:09 +02:00
Nicolas Patry
b4fe248b17
fix(launcher): handle hub branches (#278) 2023-05-04 15:14:28 +02:00
Nicolas Patry
411b0d4e1f
chore(github): add templates (#264) 2023-05-02 15:43:19 +02:00
Ehsan M. Kermani
f092ba9b22
feat(server): add watermarking tests (#248) 2023-04-27 19:16:35 +02:00
Nicolas Patry
db2b4e0754
feat(router): new healthcheck that skips the queue (#244)
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2023-04-26 20:23:54 +02:00
Nicolas Patry
c4fb09f2ae
feat(router): add tests to validation (#237) 2023-04-26 16:14:40 +02:00
Nicolas Patry
45344244cf
Starting some routing tests. (#233) 2023-04-25 14:13:14 +02:00
OlivierDehaene
8b182eb986
feat(router): add endpoint info to /info route (#228) 2023-04-25 13:11:18 +02:00
OlivierDehaene
ebc74d5666
feat(router): use number of tokens in batch as input for dynamic batching (#226)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-04-24 17:59:00 +02:00
OlivierDehaene
343437c7b5
feat(router): add device and dtype info (#215) 2023-04-21 15:36:29 +02:00
OlivierDehaene
709d8936f6
feat(router): drop requests when client closes the channel (#202) 2023-04-20 11:07:40 +02:00
OlivierDehaene
252f42c1e6
fix(router): add auth token to get model info (#207) 2023-04-19 20:06:06 +02:00
OlivierDehaene
2475aede61
feat(router): add info route (#196)
close #125
2023-04-18 16:16:06 +02:00
OlivierDehaene
c13b9d87c9
fix(router): fix truncation (#190)
closes #189
2023-04-17 16:51:53 +02:00
OlivierDehaene
9987960062
feat(router): make router input validation optional (#164) 2023-04-09 20:22:27 +02:00
OlivierDehaene
7dec65a244
fix(router): use buckets for metrics histograms (#163) 2023-04-09 20:13:28 +02:00
OlivierDehaene
fef1a1c381
v0.4.3 (#152) 2023-03-30 17:28:14 +02:00
OlivierDehaene
610bb1f978
feat(benchmark): tui based benchmarking tool (#149) 2023-03-30 15:26:27 +02:00
OlivierDehaene
d503e8f09d
feat: aws sagemaker compatible image (#147)
The only difference is that now it pushes to
registry.internal.huggingface.tech/api-inference/community/text-generation-inference/sagemaker:...
instead of
registry.internal.huggingface.tech/api-inference/community/text-generation-inference:sagemaker-...

---------

Co-authored-by: Philipp Schmid <32632186+philschmid@users.noreply.github.com>
2023-03-29 21:38:30 +02:00
OlivierDehaene
f000068944
feat(server): clear cache on error (#143) 2023-03-28 11:29:35 +02:00
OlivierDehaene
b49dbf2d88
fix(server): use server tokenizer as gt (#128) 2023-03-16 12:12:26 +01:00
OlivierDehaene
cbd36aa4d1
fix(server): revert gpt-neox optims (#123) 2023-03-13 22:57:08 +01:00
OlivierDehaene
55bd4fed7d
feat(router): add best_of parameter (#117) 2023-03-09 15:30:54 +01:00
OlivierDehaene
e8bfe199ba
feat(router): support left truncation (#115)
closes #111
2023-03-09 13:10:30 +01:00
OlivierDehaene
1a2d68250a
feat: support typical sampling (#114)
closes #112
2023-03-09 11:33:57 +01:00
OlivierDehaene
3fef90d50f
feat(clients): Python client (#103) 2023-03-07 18:52:22 +01:00
OlivierDehaene
cd5961b5da
feat: allow local models (#101)
closes #99
2023-03-06 14:39:36 +01:00
OlivierDehaene
9b8ea6a6c7
feat(server): add logits watermark (#90) 2023-03-02 12:30:41 +01:00
OlivierDehaene
f874c47831
feat(router): add api-inference headers (#91) 2023-03-02 11:41:51 +01:00
OlivierDehaene
4e685d907e
feat(router): ask hf.co for pipelinetag to decide on compat_return_full_text (#89) 2023-02-28 10:19:32 +01:00
OlivierDehaene
21340f24ba
feat(router): add legacy route for api-inference support (#88) 2023-02-27 14:56:58 +01:00
OlivierDehaene
0ac184ce77
feat(server): add special token bool (#85) 2023-02-24 15:55:57 +01:00
OlivierDehaene
6796d38c6d
feat(router): add cors allow origin options (#73) 2023-02-17 18:22:00 +01:00
OlivierDehaene
439fcaf810
feat(router): add prometheus metrics scrape endpoint (#71) 2023-02-16 17:18:53 +01:00
OlivierDehaene
5437d49beb
feat(router): add max_total_tokens and empty_input validation (#68)
closes #65
2023-02-15 21:56:59 +01:00
OlivierDehaene
9af454142a
feat: add distributed tracing (#62) 2023-02-13 13:02:45 +01:00
Yannic Kilcher
e520d5b349
fixed SSE naming (#61)
https://en.wikipedia.org/wiki/Server-sent_events
2023-02-08 22:30:11 +01:00
OlivierDehaene
20c3c5940c
feat(router): refactor API and add openAPI schemas (#53) 2023-02-03 12:43:37 +01:00
OlivierDehaene
b1482d9048
breaking(router): modify /generate API to only return generated text (#50)
@njhill, @yk FYI

generated_text was concatenated to the user prompt for legacy reasons. We
want to remove this behaviour as we don't think it is useful, and it is
even detrimental to usability.

We also remove the unused Vec.
2023-02-02 15:02:04 +01:00