This fixes flash attention v1 which was always
NotImplementedError("window_size_left is only available with flash attn
v2").
Currently flash_llama_modeling.py doesn't override the default value of
window_size_left when calling attention(..) (line 282). This means that
window_size_left will always be the default of -1, but flash attention
v1 throws an exception if `window_size_left != 0`.
To fix this, we should be checking `window_size_left != -1` before
throwing the NotImplementedError.
Fixes#1084
## Before submitting
- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?
## Who can review?
Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.
@OlivierDehaene OR @Narsil