Commit Graph

346 Commits

Author SHA1 Message Date
OlivierDehaene
5f67923cac
feat: add nightly load testing (#358) 2023-05-23 17:42:19 +02:00
oOraph
0a6494785c
fix(ci): fix security group (#359)
# What does this PR do?
Switch security group used for ci
(open outbound rules)

Signed-off-by: Raphael <oOraph@users.noreply.github.com>
Co-authored-by: Raphael <oOraph@users.noreply.github.com>
2023-05-23 16:49:11 +02:00
OlivierDehaene
4f4c9c1665
fix(server): t5 cannot run in f16 (#356)
Fix #349
2023-05-23 12:15:54 +02:00
OlivierDehaene
91d9beec90
fix(server): fix init for flash causal lm (#352)
Fixes #347
2023-05-22 15:05:32 +02:00
OlivierDehaene
e649bf9a55
feat(server): Support BLOOMChat-176B (#348) (#351)
@njhill, 
temporary workaround to be able to run our CI as secrets are not
available to runners run by external contributors. I will ask around to
see if there is a better way.

Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-05-22 13:36:00 +02:00
Xin Yang
1ba78207e6
fix: set MODEL_ID in sagemaker-entrypoint script (#343) 2023-05-22 10:10:21 +02:00
OlivierDehaene
5a58226130
fix(server): fix decode token (#334)
Fixes #333

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2023-05-16 23:23:27 +02:00
OlivierDehaene
dbdc587ddd
feat(integration-tests): improve comparison and health checks (#336) 2023-05-16 20:22:11 +02:00
OlivierDehaene
e71471bec9
feat: add snapshot testing (#282) 2023-05-15 23:36:30 +02:00
Nicolas Patry
f58f0a0364
Single place for TP layers + Dropout Layer Norm + FastLinear (#329)
# What does this PR do?

<!--
Congratulations! You've made it this far! You're not quite done yet
though.

Once merged, your PR is going to appear in the release notes with the
title you set, so make sure it's a great title that fully reflects the
extent of your awesome contribution.

Then, please replace this with a description of the change and which
issue is fixed (if applicable). Please also include relevant motivation
and context. List any dependencies (if any) that are required for this
change.

Once you're done, someone will review your PR shortly (see the section
"Who can review?" below to tag some potential reviewers). They may
suggest changes to make the code even better. If no one reviewed your PR
after a week has passed, don't hesitate to post a new comment
@-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the
right person to tag with @


@OlivierDehaene OR @Narsil

 -->
2023-05-15 17:30:47 +02:00
OlivierDehaene
66b277321d
feat(ci): custom gpu runners (#328) 2023-05-15 15:53:08 +02:00
Nicolas Patry
d7a97aa0b6
Removing dead variables. (#327)
# What does this PR do?

<!--
Congratulations! You've made it this far! You're not quite done yet
though.

Once merged, your PR is going to appear in the release notes with the
title you set, so make sure it's a great title that fully reflects the
extent of your awesome contribution.

Then, please replace this with a description of the change and which
issue is fixed (if applicable). Please also include relevant motivation
and context. List any dependencies (if any) that are required for this
change.

Once you're done, someone will review your PR shortly (see the section
"Who can review?" below to tag some potential reviewers). They may
suggest changes to make the code even better. If no one reviewed your PR
after a week has passed, don't hesitate to post a new comment
@-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the
right person to tag with @


@OlivierDehaene OR @Narsil

 -->
2023-05-15 15:14:17 +02:00
Nicolas Patry
91e674bb85
Lifting check_unitialized. (#325)
# What does this PR do?

Lifting check_unitialized.

<!--
Congratulations! You've made it this far! You're not quite done yet
though.

Once merged, your PR is going to appear in the release notes with the
title you set, so make sure it's a great title that fully reflects the
extent of your awesome contribution.

Then, please replace this with a description of the change and which
issue is fixed (if applicable). Please also include relevant motivation
and context. List any dependencies (if any) that are required for this
change.

Once you're done, someone will review your PR shortly (see the section
"Who can review?" below to tag some potential reviewers). They may
suggest changes to make the code even better. If no one reviewed your PR
after a week has passed, don't hesitate to post a new comment
@-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the
right person to tag with @


@OlivierDehaene OR @Narsil

 -->
2023-05-15 11:32:25 +02:00
Nicolas Patry
73d84c6ee5
Hotfixes for santacoder/bigcode. (#294)
# What does this PR do?

Hotfixes:

- Uses `model_type`=`gpt_bigcode` for more general usage.
- Hotfixes linked lm_head vs wte_embedding (safetensors file do not
contain the key, correctly when the file is sharded, where as pytorch
copies the tensor)


<!--
Congratulations! You've made it this far! You're not quite done yet
though.

Once merged, your PR is going to appear in the release notes with the
title you set, so make sure it's a great title that fully reflects the
extent of your awesome contribution.

Then, please replace this with a description of the change and which
issue is fixed (if applicable). Please also include relevant motivation
and context. List any dependencies (if any) that are required for this
change.

Once you're done, someone will review your PR shortly (see the section
"Who can review?" below to tag some potential reviewers). They may
suggest changes to make the code even better. If no one reviewed your PR
after a week has passed, don't hesitate to post a new comment
@-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the
right person to tag with @


@OlivierDehaene OR @Narsil

 -->

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-41-161.ec2.internal>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2023-05-15 10:35:20 +02:00
OlivierDehaene
22c4fd07ab fix(docker): use ubuntu20.04 2023-05-12 18:38:59 +02:00
OlivierDehaene
119f7e0687 fix(docker): remove quantize default 2023-05-12 17:56:32 +02:00
OlivierDehaene
8a8f43410d
chore(docker): use nvidia base image (#318) 2023-05-12 17:32:40 +02:00
Nicolas Patry
76a48cd365
feat(server): GPTQ quantization (step1) (#277)
Changes only the type from `bool` to `Option<Enum>` pretty much
everywhere.
- Use `Optional[str]` in Python (easier to manage than importing type
everywhere). Except for the cli to get proper validation
- Updated all models to handle gracefully new values. (Error out if
unknown value, or gptq since not implemented).

<!--
Congratulations! You've made it this far! You're not quite done yet
though.

Once merged, your PR is going to appear in the release notes with the
title you set, so make sure it's a great title that fully reflects the
extent of your awesome contribution.

Then, please replace this with a description of the change and which
issue is fixed (if applicable). Please also include relevant motivation
and context. List any dependencies (if any) that are required for this
change.

Once you're done, someone will review your PR shortly (see the section
"Who can review?" below to tag some potential reviewers). They may
suggest changes to make the code even better. If no one reviewed your PR
after a week has passed, don't hesitate to post a new comment
@-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the
right person to tag with @


@OlivierDehaene OR @Narsil

 -->
2023-05-12 14:46:41 +02:00
OlivierDehaene
4f6d038c0b fix(server): fix multinomial implem in Sampling 2023-05-11 13:30:38 +02:00
OlivierDehaene
a6c18c39bb
feat(server): use cuda graph in logits warping (#302) 2023-05-10 19:08:54 +02:00
OlivierDehaene
35ab6cfcf1 fix(docker): remove CUDA_VERSION 2023-05-10 16:16:06 +02:00
OlivierDehaene
745f596c88
feat(server): use float16 (#304) 2023-05-10 15:51:10 +02:00
OlivierDehaene
68e9d6ab33
feat(server): shard token decode (#303) 2023-05-10 15:48:21 +02:00
OlivierDehaene
1585404464
fix(docker): remove nvidia require cuda env (#310) 2023-05-10 15:29:21 +02:00
OlivierDehaene
49cffad1bc
fix(docker): fix nvidia env vars (#305) 2023-05-09 19:02:52 +02:00
OlivierDehaene
ad66f6ef9a
feat(server): optim flash causal lm decode_token (#285) 2023-05-09 18:26:19 +02:00
OlivierDehaene
bc5c07231e
fix(docker): fix docker build (#299) 2023-05-09 14:39:59 +02:00
OlivierDehaene
e250282213
feat(docker): add benchmarking tool to docker image (#298) 2023-05-09 13:19:31 +02:00
Sai Vinay G
926fd9a010
feat(router): Adding response schema for compat_generate (#292) 2023-05-09 12:38:09 +02:00
OlivierDehaene
e9b01b3433
fix(dockerfile): fix nvidia env vars (#297)
Fixes #291
2023-05-09 12:36:02 +02:00
Nicolas Patry
b4aa87db58
fea(server): decrease convert RAM requirements (#286) 2023-05-05 17:57:02 +02:00
Nicolas Patry
3314a46d36
chore: add flash-attention to docker ignore (#287)
included when building docker locally.
(Where the local dirs might have the flash-attention folder.)


# What does this PR do?

<!--
Congratulations! You've made it this far! You're not quite done yet
though.

Once merged, your PR is going to appear in the release notes with the
title you set, so make sure it's a great title that fully reflects the
extent of your awesome contribution.

Then, please replace this with a description of the change and which
issue is fixed (if applicable). Please also include relevant motivation
and context. List any dependencies (if any) that are required for this
change.

Once you're done, someone will review your PR shortly (see the section
"Who can review?" below to tag some potential reviewers). They may
suggest changes to make the code even better. If no one reviewed your PR
after a week has passed, don't hesitate to post a new comment
@-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the
right person to tag with @


@OlivierDehaene OR @Narsil

 -->
2023-05-05 17:52:09 +02:00
Nicolas Patry
690fc31757
fix(server): fix convert (#284) 2023-05-05 15:28:08 +02:00
Nicolas Patry
e68509add7
feat(launcher): Improve error message when download process fails. (#276) 2023-05-04 15:29:29 +02:00
Nicolas Patry
f08343d44d
fix(server): Removes the parallelism in file convertion (during download) (#275) 2023-05-04 15:22:54 +02:00
Nicolas Patry
b4fe248b17
fix(launcher): handle hub branches (#278) 2023-05-04 15:14:28 +02:00
OlivierDehaene
b67908e0cf
fix(launcher): pass weights cache override to the download process (#274)
closes #273
2023-05-03 23:39:35 +02:00
OlivierDehaene
85aa7e2e7b
feat(server): support hf endpoint weight layout (#266) 2023-05-03 11:36:24 +02:00
OlivierDehaene
4096000e34
fix(server): fix typo in tokenizers decode (#269)
closes #268
2023-05-03 10:10:34 +02:00
Nicolas Patry
411b0d4e1f
chore(github): add templates (#264) 2023-05-02 15:43:19 +02:00
Nicolas Patry
e86cca9723
Adding docs on how dynamic batching works. (#258)
This PR starts the minimal possible amount of explanation I could think
of. It tries to explain how dynamic batching occurs, the interactions
with past key values and ignores the padding problem.

Maybe some drawings could help too but I kept it to text for now.
2023-05-01 14:16:50 +02:00
OlivierDehaene
0e9d249b79
feat(benchmark): add support for private tokenizers (#262) 2023-04-29 12:17:30 +02:00
Nicolas Patry
b0b97fd9a7
doc(launcher): add more docs to the launcher itself and link in the README (#257) 2023-04-29 11:53:42 +02:00
OlivierDehaene
593a563414
feat(docker): add nvidia env vars (#255) 2023-04-27 19:18:33 +02:00
Ehsan M. Kermani
f092ba9b22
feat(server): add watermarking tests (#248) 2023-04-27 19:16:35 +02:00
OlivierDehaene
b9ae7e5da1
chore(server): update transformers (#250) 2023-04-27 09:57:41 +02:00
Nick Hill
34bca0b8d3
fix(server): Small tidy of code from recent changes (#251)
remaining_decode_tokens was calculated twice in Seq2SeqLMBatch.filter()
2023-04-27 09:57:28 +02:00
Nick Hill
b4cf832c40
fix(server): fix reshaping of bloom past_key_values in concatenate() (#252)
Introduced in #214 

Fixes #249
2023-04-27 09:51:27 +02:00
Nicolas Patry
db2b4e0754
feat(router): new healthcheck that skips the queue (#244)
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2023-04-26 20:23:54 +02:00
Nicolas Patry
c4fb09f2ae
feat(router): add tests to validation (#237) 2023-04-26 16:14:40 +02:00