Commit Graph

507 Commits

Author SHA1 Message Date
regisss
2060bb58bf
Fix trust remote code (#55) 2024-02-19 07:53:24 +01:00
Karol Damaszke
2a7a967de3
Revert prefill optimization and fix accuracy issue in shift operation (#29)
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
Co-authored-by: madamczykhabana <110973826+madamczykhabana@users.noreply.github.com>
Co-authored-by: jkaniecki <153085639+jkaniecki@users.noreply.github.com>
2024-01-23 15:19:07 +01:00
jkaniecki
ac3bc0e95e
Removed kv_cache from HPU graph output (#19) 2024-01-19 15:34:13 +01:00
mrs303
da0f874d49
Prefer prefill instead of decode when max_waiting_tokens==0 (#18) 2024-01-19 15:25:40 +01:00
Karol Damaszke
60f63262db
Prefill optimization by allocating space only for the first token (#17) 2024-01-19 15:18:35 +01:00
Adam Stachowicz
0b96da89aa
Make tokenizer optional (#12) 2024-01-19 15:12:04 +01:00
madamczykhabana
381ec38cad
Batch bucketing improvements (#15) 2024-01-17 10:09:27 +01:00
mrs303
8523f7ef64
Deepspeed terminate (#11) 2024-01-17 09:57:03 +01:00
Krzysztof Laskowski
c459c86f88
High-level server profiler (#13) 2024-01-16 09:57:29 +01:00
madamczykhabana
41c4f4fa41
Debugging utils (#14) 2024-01-15 21:05:27 +01:00
Karol Damaszke
a8c5b69e2c
Set default value of LIMIT_HPU_GRAPH to True (#7) 2024-01-11 14:51:49 +01:00
Harish Subramony
532e4b8d41
Readme updates with review comments (#8) 2024-01-11 10:12:43 +01:00
Harish Subramony
cb8b7610c0
Update README for proper usage of LIMIT_HPU_GRAPH (#3)
* Update README for proper usage of LIMIT_HPU_GRAPH
2024-01-09 14:49:15 -08:00
Karol Damaszke
252ccde104
Control prefill and decode batch size separately (#6) 2024-01-02 18:21:01 +01:00
Karol Damaszke
1be2d9a8ec
Batch size bucketing (#5) 2023-12-22 21:53:01 +01:00
jkaniecki
e3dcd7f2c2
Disable tensor caching in HPU Graph execution (#4) 2023-12-22 13:51:16 +01:00
Karol Damaszke
b1897acfd6
Calculate token budget with padding to max_input_length (#2) 2023-12-11 09:24:27 +01:00
Karol Damaszke
6436ae86a1
Fix for continuous batching (#1) 2023-12-11 09:24:09 +01:00
regisss
e5f124b077 Merge tag 'v1.2.0' into v1.2-release 2023-12-06 18:46:16 +01:00
regisss
c09066aeb1 Merge tag 'v1.1.1' into v1.1-release 2023-12-06 09:50:58 +01:00
regisss
cc744ba426 Add changes from Optimum Habana's TGI folder 2023-12-05 11:12:16 +01:00
OlivierDehaene
ccd5725a0c v1.2.0 2023-11-30 15:18:15 +01:00
Nicolas Patry
624800c4de
Make GPTQ test less flaky (#1295)
# What does this PR do?

<!--
Congratulations! You've made it this far! You're not quite done yet
though.

Once merged, your PR is going to appear in the release notes with the
title you set, so make sure it's a great title that fully reflects the
extent of your awesome contribution.

Then, please replace this with a description of the change and which
issue is fixed (if applicable). Please also include relevant motivation
and context. List any dependencies (if any) that are required for this
change.

Once you're done, someone will review your PR shortly (see the section
"Who can review?" below to tag some potential reviewers). They may
suggest changes to make the code even better. If no one reviewed your PR
after a week has passed, don't hesitate to post a new comment
@-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the
right person to tag with @


@OlivierDehaene OR @Narsil

 -->
2023-11-28 21:22:35 +01:00
Nicolas Patry
ba552e1a82
Let each model resolve their own default dtype. (#1287)
# What does this PR do?

<!--
Congratulations! You've made it this far! You're not quite done yet
though.

Once merged, your PR is going to appear in the release notes with the
title you set, so make sure it's a great title that fully reflects the
extent of your awesome contribution.

Then, please replace this with a description of the change and which
issue is fixed (if applicable). Please also include relevant motivation
and context. List any dependencies (if any) that are required for this
change.

Once you're done, someone will review your PR shortly (see the section
"Who can review?" below to tag some potential reviewers). They may
suggest changes to make the code even better. If no one reviewed your PR
after a week has passed, don't hesitate to post a new comment
@-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the
right person to tag with @


@OlivierDehaene OR @Narsil

 -->
2023-11-28 17:54:26 +01:00
Nicolas Patry
3c71c656c7
make install-flash-attn-v2-cuda should work like make install-flash-attn-v2 used to work. (#1294)
# What does this PR do?

<!--
Congratulations! You've made it this far! You're not quite done yet
though.

Once merged, your PR is going to appear in the release notes with the
title you set, so make sure it's a great title that fully reflects the
extent of your awesome contribution.

Then, please replace this with a description of the change and which
issue is fixed (if applicable). Please also include relevant motivation
and context. List any dependencies (if any) that are required for this
change.

Once you're done, someone will review your PR shortly (see the section
"Who can review?" below to tag some potential reviewers). They may
suggest changes to make the code even better. If no one reviewed your PR
after a week has passed, don't hesitate to post a new comment
@-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the
right person to tag with @


@OlivierDehaene OR @Narsil

 -->
2023-11-28 16:28:40 +01:00
fxmarty
b2b5df0e94
Add RoCm support (#1243)
This PR adds support for AMD Instinct MI210 & MI250 GPUs, with paged
attention and FAv2 support.

Remaining items to discuss, on top of possible others:
* Should we have a
`ghcr.io/huggingface/text-generation-inference:1.1.0+rocm` hosted image,
or is it too early?
* Should we set up a CI on MI210/MI250? I don't have access to the
runners of TGI though.
* Are we comfortable with those changes being directly in TGI, or do we
need a fork?

---------

Co-authored-by: Felix Marty <felix@hf.co>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
Co-authored-by: Your Name <you@example.com>
2023-11-27 14:08:12 +01:00
Nicolas Patry
ed2a3f617e
Exllama v2 (#1211)
# What does this PR do?

See #1165

<!--
Congratulations! You've made it this far! You're not quite done yet
though.

Once merged, your PR is going to appear in the release notes with the
title you set, so make sure it's a great title that fully reflects the
extent of your awesome contribution.

Then, please replace this with a description of the change and which
issue is fixed (if applicable). Please also include relevant motivation
and context. List any dependencies (if any) that are required for this
change.

Once you're done, someone will review your PR shortly (see the section
"Who can review?" below to tag some potential reviewers). They may
suggest changes to make the code even better. If no one reviewed your PR
after a week has passed, don't hesitate to post a new comment
@-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the
right person to tag with @


@OlivierDehaene OR @Narsil

 -->

---------

Co-authored-by: Florian Zimmermeister <flozi00.fz@gmail.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-24-153.ec2.internal>
2023-11-25 22:38:38 +01:00
Nicolas Patry
3c02262f29 Reduce race condition on file system for test 2023-11-23 15:42:48 +00:00
Vince Jankovics
c6bb76703f
Fix IDEFICS dtype (#1214)
# What does this PR do?

<!--
Congratulations! You've made it this far! You're not quite done yet
though.

Once merged, your PR is going to appear in the release notes with the
title you set, so make sure it's a great title that fully reflects the
extent of your awesome contribution.

Then, please replace this with a description of the change and which
issue is fixed (if applicable). Please also include relevant motivation
and context. List any dependencies (if any) that are required for this
change.

Once you're done, someone will review your PR shortly (see the section
"Who can review?" below to tag some potential reviewers). They may
suggest changes to make the code even better. If no one reviewed your PR
after a week has passed, don't hesitate to post a new comment
@-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

This forces the use of `bfloat16` for IDEFICS. The issue is that with
`float16` the 80b model gives garbage output. Let me know if this
solution is not appropriate and I'll adjust accordingly. For the details
see below.

The current behaviour:
```sh
$ curl 127.0.0.1:8080/generate -X POST -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' -H 'Content-Type: application/json'
{"generated_text":""}
```

On closer inspection with:
```python
import requests

headers = { "Content-Type": "application/json"}

query = "What is Deep Learning?"
data = {
    "inputs": query,
    "parameters": {
        "max_new_tokens": 10,
        "return_full_text": True,
        "decoder_input_details": True,
        "do_sample": False,
    },
}

api_url = "http://127.0.0.1:8080"
response = requests.post(api_url + "/generate", headers=headers, json=data).json()

for i in ['prefill', 'tokens']:
    print(f'### {i}')
    print(repr(''.join([t['text'] for t in response['details'][i]])))
```

Prints:
```
### prefill
'<s>WhatisDeepLearning?'
### tokens
'<unk><unk><unk><unk><unk><unk><unk><unk><unk><unk>'
########
```

With the change in this PR it prints:
```
### prefill
'<s>WhatisDeepLearning?'
### tokens
'\n\nDeep Learning is a subset of machine'
```

Note, using the Transformers implementation (with
`IdeficsForVisionText2Text.from_pretrained`) produces the latter
(correct) output as well.
This only happens with the 80b model, the 9b model is not as sensitive
to the dtype (as also mentioned in the code).

The reason for "forcing" this in the IDEFICS init method, is because if
quantization is used, then the dtype cannot be set explicitly. And since
it's left as `None`, it's set to `float16` by default
[here](96a982ad8f/server/text_generation_server/models/__init__.py (L90)).
I.e. there's no other way to manually change the dtype if someone is
using quantization:
```sh
$ docker run .... ghcr.io/huggingface/text-generation-inference:latest --model-id HuggingFaceM4/idefics-80b-instruct --dtype bfloat16 --quantize bitsandbytes-nf4
.....
2023-10-31T12:42:26.710401Z  INFO shard-manager: text_generation_launcher: Starting shard rank=0
2023-10-31T12:42:30.315734Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

Traceback (most recent call last):

  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())

  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 80, in serve
    raise RuntimeError(

RuntimeError: Only 1 can be set between `dtype` and `quantize`, as they both decide how goes the final model.
 rank=0
Error: ShardCannotStart
2023-10-31T12:42:30.414010Z ERROR text_generation_launcher: Shard 0 failed to start
2023-10-31T12:42:30.414044Z  INFO text_generation_launcher: Shutting down shards
```

## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

@Narsil what do you think?

<!-- Your PR will be replied to more quickly if you can figure out the
right person to tag with @


@OlivierDehaene OR @Narsil

 -->

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2023-11-23 15:00:09 +01:00
OlivierDehaene
35509ff5de
chore: update to torch 2.1.0 (#1182)
Close #1142
2023-11-23 13:38:50 +01:00
Traun Leyden
e12c34bd25
Load PEFT weights from local directory (#1260)
# What does this PR do?

Enables PEFT weights to be loaded from a local directory, as opposed to
a hf hub repository. It is a continuation of the work in PR
https://github.com/huggingface/text-generation-inference/pull/762

<!--
Congratulations! You've made it this far! You're not quite done yet
though.

Once merged, your PR is going to appear in the release notes with the
title you set, so make sure it's a great title that fully reflects the
extent of your awesome contribution.

Then, please replace this with a description of the change and which
issue is fixed (if applicable). Please also include relevant motivation
and context. List any dependencies (if any) that are required for this
change.

Once you're done, someone will review your PR shortly (see the section
"Who can review?" below to tag some potential reviewers). They may
suggest changes to make the code even better. If no one reviewed your PR
after a week has passed, don't hesitate to post a new comment
@-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes #1259 


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
Pull Request section? **Yes but I don't know how to run the tests for
this repo, and it doesn't look like this code is covered anyway**
- [x] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
to it if that's the case. **Yes, @Narsil asked for a PR in [this
comment](https://github.com/huggingface/text-generation-inference/pull/762#issuecomment-1728089505)**
- [x] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
**I didn't see any documentation added to the [original
PR](https://github.com/huggingface/text-generation-inference/pull/762),
and am not sure where this belongs. Let me know and I can add some**
- [x] Did you write any new necessary tests? **I didn't see any existing
test coverage for this python module**


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

@Narsil 

<!-- Your PR will be replied to more quickly if you can figure out the
right person to tag with @


@Narsil

 -->

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2023-11-23 12:56:17 +01:00
Diwank Singh Tomer
91111a0dc2
Fix missing trust_remote_code flag for AutoTokenizer in utils.peft (#1270)
Peft loading function was missing the
`trust_remote_code=trust_remote_code` argument causing the custom
tokenizer code to be not found.


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [x] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

@Narsil
2023-11-23 12:41:05 +01:00
Nicolas Patry
b226e469c9
Update README.md (#1272) 2023-11-21 10:39:18 +01:00
OlivierDehaene
3dbc649b11
fix: do not leak inputs on error (#1228)
Close #1225
2023-11-20 10:33:44 +01:00
OlivierDehaene
8acdc1fae7 hotfix 1.1.1 2023-11-16 18:35:09 +01:00
OlivierDehaene
457e72c386 v1.1.1 2023-11-16 13:56:04 +01:00
Omar Sanseviero
a5def7c222
Fix link in quantization guide (#1246) 2023-11-08 17:34:38 +01:00
Nicolas Patry
7b5c167487
Update README.md (#1242)
# What does this PR do?

<!--
Congratulations! You've made it this far! You're not quite done yet
though.

Once merged, your PR is going to appear in the release notes with the
title you set, so make sure it's a great title that fully reflects the
extent of your awesome contribution.

Then, please replace this with a description of the change and which
issue is fixed (if applicable). Please also include relevant motivation
and context. List any dependencies (if any) that are required for this
change.

Once you're done, someone will review your PR shortly (see the section
"Who can review?" below to tag some potential reviewers). They may
suggest changes to make the code even better. If no one reviewed your PR
after a week has passed, don't hesitate to post a new comment
@-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the
right person to tag with @


@OlivierDehaene OR @Narsil

 -->
2023-11-07 10:24:53 +01:00
Nicolas Patry
b9184093d9
Narsil patch 1 (#1241)
# What does this PR do?

<!--
Congratulations! You've made it this far! You're not quite done yet
though.

Once merged, your PR is going to appear in the release notes with the
title you set, so make sure it's a great title that fully reflects the
extent of your awesome contribution.

Then, please replace this with a description of the change and which
issue is fixed (if applicable). Please also include relevant motivation
and context. List any dependencies (if any) that are required for this
change.

Once you're done, someone will review your PR shortly (see the section
"Who can review?" below to tag some potential reviewers). They may
suggest changes to make the code even better. If no one reviewed your PR
after a week has passed, don't hesitate to post a new comment
@-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the
right person to tag with @


@OlivierDehaene OR @Narsil

 -->
2023-11-07 10:13:09 +01:00
Nicolas Patry
414a911b34
Adding the video -> moving the architecture picture lower (#1239)
# What does this PR do?

<!--
Congratulations! You've made it this far! You're not quite done yet
though.

Once merged, your PR is going to appear in the release notes with the
title you set, so make sure it's a great title that fully reflects the
extent of your awesome contribution.

Then, please replace this with a description of the change and which
issue is fixed (if applicable). Please also include relevant motivation
and context. List any dependencies (if any) that are required for this
change.

Once you're done, someone will review your PR shortly (see the section
"Who can review?" below to tag some potential reviewers). They may
suggest changes to make the code even better. If no one reviewed your PR
after a week has passed, don't hesitate to post a new comment
@-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the
right person to tag with @


@OlivierDehaene OR @Narsil

 -->
2023-11-07 01:01:40 +01:00
OlivierDehaene
96a982ad8f fix: better warmup error 2023-10-25 10:18:58 +02:00
OlivierDehaene
f9910d13e2
feat: remove flume (#1184) 2023-10-23 15:51:12 +02:00
OlivierDehaene
12590fdcce
feat: paged attention v2 (#1183) 2023-10-23 12:29:25 +02:00
Aastha Varma
63fa534612
Fix link to quantization page in preparing_model.md (#1187) 2023-10-23 12:12:21 +02:00
OlivierDehaene
5e28f44a83
#1049 CI (#1178)
See #1049

---------

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
2023-10-20 10:28:45 +02:00
Remy
72b8f88be8
fix: remove useless token (#1179)
This token is not used by your action.
Secret is removed from the repository.
2023-10-19 14:04:44 +02:00
star
648ea06430
fix: EETQLinear with bias in layers.py (#1176) 2023-10-19 12:15:05 +02:00
Mario928
9179605e1e
Fix: Replace view() with reshape() in neox_modeling.py to resolve RuntimeError (#1155) 2023-10-19 11:54:26 +02:00
momonga
7402a355dc
Fix calling cuda() on load_in_8bit (#1153)
This PR addresses an issue where calling `model = model.cuda()` would
throw a ValueError when `quantize` is set to "bitsandbytes".

```
> File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 147, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 295, in get_model
    return CausalLM(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/causal_lm.py", line 515, in __init__
    model = model.cuda()
  File "/opt/conda/lib/python3.9/site-packages/transformers/modeling_utils.py", line 1998, in cuda
    raise ValueError(
ValueError: Calling `cuda()` is not supported for `4-bit` or `8-bit` quantized models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
```

Co-authored-by: mmnga <mmnga1mmnga@gmail.com>
2023-10-19 10:42:03 +02:00
Mishig
3af1a11401
Fix link in preparing_model.md (#1140)
Fixes a link in doc
2023-10-13 09:48:35 +02:00