Commit Graph

100 Commits

Author SHA1 Message Date
OlivierDehaene
83b84486ad
feat(launcher): parse oom signal (#404) 2023-06-02 14:17:27 +02:00
OlivierDehaene
95d3546976
feat(server): load santacoder/starcoder models with safetensors (#393)
Fix #366
2023-06-01 12:10:35 +02:00
OlivierDehaene
49a6c8c1b2 fix(launcher): parse num cuda devices from CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES 2023-05-30 13:27:48 +02:00
OlivierDehaene
146e72c3be fix(launcher): parse num cuda devices from CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES 2023-05-30 12:52:18 +02:00
OlivierDehaene
e3e487dc71
feat(server): support trust_remote_code (#363) 2023-05-23 20:40:39 +02:00
Nicolas Patry
76a48cd365
feat(server): GPTQ quantization (step1) (#277)
Changes only the type from `bool` to `Option<Enum>` pretty much
everywhere.
- Use `Optional[str]` in Python (easier to manage than importing the type
everywhere), except in the CLI, to get proper validation.
- Updated all models to handle new values gracefully (error out on an
unknown value, or on gptq since it is not implemented).

2023-05-12 14:46:41 +02:00
Nicolas Patry
e68509add7
feat(launcher): Improve error message when download process fails. (#276) 2023-05-04 15:29:29 +02:00
OlivierDehaene
b67908e0cf
fix(launcher): pass weights cache override to the download process (#274)
closes #273
2023-05-03 23:39:35 +02:00
OlivierDehaene
85aa7e2e7b
feat(server): support hf endpoint weight layout (#266) 2023-05-03 11:36:24 +02:00
Nicolas Patry
411b0d4e1f
chore(github): add templates (#264) 2023-05-02 15:43:19 +02:00
Nicolas Patry
b0b97fd9a7
doc(launcher): add more docs to the launcher itself and link in the README (#257) 2023-04-29 11:53:42 +02:00
Nicolas Patry
db2b4e0754
feat(router): new healthcheck that skips the queue (#244)
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2023-04-26 20:23:54 +02:00
Nicolas Patry
77758f603b
chore(launcher): refactor logic (#242)
Hopefully it's cleaner
2023-04-26 14:43:36 +02:00
OlivierDehaene
ebc74d5666
feat(router): use number of tokens in batch as input for dynamic batching (#226)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-04-24 17:59:00 +02:00
OlivierDehaene
252f42c1e6
fix(router): add auth token to get model info (#207) 2023-04-19 20:06:06 +02:00
OlivierDehaene
2475aede61
feat(router): add info route (#196)
close #125
2023-04-18 16:16:06 +02:00
OlivierDehaene
7a1ba58557
fix(docker): fix docker image dependencies (#187) 2023-04-17 00:26:47 +02:00
OlivierDehaene
e3a63b6fbc
fix(launcher): revert change on shard errors (#173) 2023-04-13 11:07:11 +02:00
OlivierDehaene
f26dfd0dc1
feat(server): support OPT models (#55)
OPT models do not all have a `tokenizer.json` file on the hub at the
moment. Can't merge for now.
2023-04-11 19:16:41 +02:00
OlivierDehaene
e63a21eb4d
feat(launcher): allow disabling hf_transfer (#161) 2023-04-09 20:00:05 +02:00
OlivierDehaene
55bd4fed7d
feat(router): add best_of parameter (#117) 2023-03-09 15:30:54 +01:00
OlivierDehaene
5fd2dcb513
feat(launcher): default num_shard to CUDA_VISIBLE_DEVICES if possible (#108) 2023-03-08 13:53:41 +01:00
OlivierDehaene
0ac38d336a
feat(launcher): allow parsing num_shard from CUDA_VISIBLE_DEVICES (#107) 2023-03-08 11:06:59 +01:00
OlivierDehaene
cd5961b5da
feat: allow local models (#101)
closes #99
2023-03-06 14:39:36 +01:00
OlivierDehaene
240c4187fd
fix(launcher): add router parameters to launcher (#95) 2023-03-03 16:01:25 +01:00
OlivierDehaene
9b8ea6a6c7
feat(server): add logits watermark (#90) 2023-03-02 12:30:41 +01:00
OlivierDehaene
17bc841b1b
feat(server): enable hf-transfer (#76) 2023-02-18 14:04:11 +01:00
OlivierDehaene
6796d38c6d
feat(router): add cors allow origin options (#73) 2023-02-17 18:22:00 +01:00
OlivierDehaene
7b3d460d21
fix(launcher): copy current env vars to subprocesses (#70)
closes #69
2023-02-16 11:20:23 +01:00
OlivierDehaene
68455353f5
feat(launcher): add disable_custom_kernels arg (#67) 2023-02-15 16:23:45 +01:00
OlivierDehaene
c5a4a1faf3
feat(server): improve download logging (#66) 2023-02-15 16:11:32 +01:00
OlivierDehaene
0fbc691946
feat: add safetensors conversion (#63) 2023-02-14 13:02:16 +01:00
OlivierDehaene
9af454142a
feat: add distributed tracing (#62) 2023-02-13 13:02:45 +01:00
OlivierDehaene
1ad3250b89
fix(docker): increase shm size (#60) 2023-02-08 17:53:33 +01:00
OlivierDehaene
4acc42a605
fix(server): better handling of inference mode (#57) 2023-02-07 15:38:22 +01:00
OlivierDehaene
20c3c5940c
feat(router): refactor API and add openAPI schemas (#53) 2023-02-03 12:43:37 +01:00
OlivierDehaene
7b870e1e18
feat(router): use background task to manage request queue (#52)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-02-02 14:59:27 +01:00
OlivierDehaene
775115e3a5
feat(server): allow the server to use a local weight cache (#49) 2023-02-01 16:22:10 +01:00
OlivierDehaene
f830706b21
feat(server): Support GPT-Neox (#39) 2023-01-31 18:53:56 +01:00
OlivierDehaene
15511edc01
feat(server): Support SantaCoder (#26) 2023-01-20 12:24:39 +01:00
Nick Hill
e6d3eb5d5d
fix(server): Minor refactorization using new_zeros (#24)
- Fix some type hints, in particular base tokenizer class
- Make use of `tensor.new_zeros/new_empty` methods
- Simplify env var string parsing in launcher
2023-01-17 09:10:22 +01:00
OlivierDehaene
fcc2c5fcbf
feat(launcher): Log server stdout (#19)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-01-05 12:01:23 +01:00
OlivierDehaene
4236e41b0d feat(server): Improved doc 2022-11-07 12:53:56 +01:00
OlivierDehaene
cea6051eff feat(launcher): Pass CUDA_VISIBLE_DEVICES to the shard 2022-11-04 18:31:08 +01:00
OlivierDehaene
b3b7ea0d74 feat: Use json formatter by default in docker image 2022-11-02 17:29:56 +01:00
OlivierDehaene
3cf6368c77 feat(server): Support all AutoModelForCausalLM on a best effort basis 2022-10-28 19:24:00 +02:00
OlivierDehaene
09674e6df9 feat(server): Support bitsandbytes 2022-10-27 14:25:29 +02:00
Nicolas Patry
c8ce9b2515
feat(server): Use safetensors
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
2022-10-22 20:00:15 +02:00
OlivierDehaene
c837893370 feat(router): Add max_waiting_tokens 2022-10-21 16:40:05 +02:00
Olivier Dehaene
f16f2f5ae1 v0.1.0 2022-10-20 19:14:44 +02:00