Commit Graph

108 Commits

Author SHA1 Message Date
regisss
0deebe7012 Update README with Docker image v2.0.5 2024-09-07 17:56:52 +00:00
Thanaji Rao Thakkalapelli
a4f39a1cae
Update README.md with changes related to LLava-next multi card support (#221) 2024-09-07 17:46:21 +02:00
Thanaji Rao Thakkalapelli
fde061ccf8
Updated docker image version to 2.0.4 (#212)
Co-authored-by: Thanaji Thakkalapelli <tthakkalapelli@tthakkalapelli-vm-u22.habana-labs.com>
2024-08-27 10:14:27 +02:00
yuanwu2017
2985503900
llava-next Fp8 (#209)
Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: Thanaji Rao Thakkalapelli <tthakkalapelli@habana.ai>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
2024-08-26 16:53:08 +02:00
Thanaji Rao Thakkalapelli
e33db1877c
Updated Readme to use flash attention for llama (#200) 2024-08-26 11:01:11 +02:00
Thanaji Rao Thakkalapelli
0c3239e710
Enable quantization with INC (#203) 2024-08-26 10:55:37 +02:00
yuanwu2017
a8cead1f92
Upgrade SynapseAI version to 1.17.0 (#208)
Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: Thanaji Rao Thakkalapelli <tthakkalapelli@habana.ai>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
2024-08-26 10:49:29 +02:00
Vidya Galli
4dc67e4ef3
Version check, doc fixes (#182) 2024-08-07 22:09:51 +02:00
Karol Damaszke
1b4d80c03e
Update docker image path in README (#181) 2024-07-05 10:29:02 +02:00
Jacek Czaja
c64b5b75e2
[TORCH COMPILE] Ignore HPU GRAPHS env var when eager mode is used (#165)
Co-authored-by: Jacek Czaja <jczaja@habana.ai>
2024-07-03 15:17:27 +02:00
Karol Damaszke
ecd1cf180d
Add full commands for supported configs (#150)
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
2024-06-06 11:09:45 +02:00
Jimin Ha
1023de8048
Add flash_attention argument options for Mistral (#145)
Co-authored-by: Karol Damaszke <karol.damaszke@intel.com>
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
2024-05-27 20:00:42 +02:00
regisss
16f9ff8965
Update README with new Docker image (#143) 2024-05-14 15:31:52 +02:00
Karol Damaszke
4169ff8e6f
Add info about FP8 support (#137)
Co-authored-by: jkaniecki <153085639+jkaniecki@users.noreply.github.com>
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
2024-05-06 11:03:14 +02:00
Karol Damaszke
0bbec634f9 Update README example commands 2024-05-06 09:26:01 +03:00
Karol Damaszke
9796b0e10d
Add simple continuous batching benchmark (#108)
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>
2024-03-26 09:17:55 +01:00
regisss
7f58680999
Add docker pull command in README (#110) 2024-03-25 15:44:54 +01:00
jkaniecki
8504f9c41c
Improve README clarity (#106) 2024-03-18 15:15:07 +01:00
Karol Damaszke
365f277900
Clean-up README (#96)
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
2024-03-10 22:02:15 +01:00
Karol Damaszke
80ae9ead28
Set MAX_TOTAL_TOKENS automatically (#91)
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
2024-03-01 11:25:15 +01:00
Karol Damaszke
8f6564ce0e
Heap based router queue (#63) (#88)
Co-authored-by: mrs303 <54661797+mrs303@users.noreply.github.com>
2024-02-29 10:56:26 +01:00
Karol Damaszke
2122acc60f
Add warmup for all possible shapes for prefill #49 (#81) 2024-02-28 10:40:13 +01:00
Karol Damaszke
31bed905d4
Update habana profiler (#50) (#80)
Co-authored-by: mswiniarsk <156412439+mswiniarsk@users.noreply.github.com>
2024-02-28 09:57:40 +01:00
jkaniecki
a490847702
Sequence bucketing for prefill (#39) (#67)
Co-authored-by: mswiniarsk <156412439+mswiniarsk@users.noreply.github.com>
2024-02-23 01:52:14 +01:00
jkaniecki
9ad6086250
Improve habana profile dev experience (#36) (#65)
Co-authored-by: Michal Szutenberg <37601244+szutenberg@users.noreply.github.com>
2024-02-22 13:57:45 +01:00
jkaniecki
80303b469c
Do not limit hpu graphs by default (#32) (#61)
Co-authored-by: mswiniarsk <156412439+mswiniarsk@users.noreply.github.com>
2024-02-21 15:38:00 +01:00
Adam Stachowicz
0b96da89aa
Make tokenizer optional (#12) 2024-01-19 15:12:04 +01:00
Krzysztof Laskowski
c459c86f88
High-level server profiler (#13) 2024-01-16 09:57:29 +01:00
Karol Damaszke
a8c5b69e2c
Set default value of LIMIT_HPU_GRAPH to True (#7) 2024-01-11 14:51:49 +01:00
Harish Subramony
532e4b8d41
Readme updates with review comments (#8) 2024-01-11 10:12:43 +01:00
Harish Subramony
cb8b7610c0
Update README for proper usage of LIMIT_HPU_GRAPH (#3)
2024-01-09 14:49:15 -08:00
Karol Damaszke
252ccde104
Control prefill and decode batch size separately (#6) 2024-01-02 18:21:01 +01:00
Karol Damaszke
1be2d9a8ec
Batch size bucketing (#5) 2023-12-22 21:53:01 +01:00
regisss
e5f124b077 Merge tag 'v1.2.0' into v1.2-release 2023-12-06 18:46:16 +01:00
regisss
cc744ba426 Add changes from Optimum Habana's TGI folder 2023-12-05 11:12:16 +01:00
OlivierDehaene
ccd5725a0c v1.2.0 2023-11-30 15:18:15 +01:00
fxmarty
b2b5df0e94
Add RoCm support (#1243)
This PR adds support for AMD Instinct MI210 & MI250 GPUs, with paged
attention and FAv2 support.

Remaining items to discuss, on top of possible others:
* Should we have a
`ghcr.io/huggingface/text-generation-inference:1.1.0+rocm` hosted image,
or is it too early?
* Should we set up a CI on MI210/MI250? I don't have access to the
runners of TGI though.
* Are we comfortable with those changes being directly in TGI, or do we
need a fork?

---------

Co-authored-by: Felix Marty <felix@hf.co>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
Co-authored-by: Your Name <you@example.com>
2023-11-27 14:08:12 +01:00
Nicolas Patry
b226e469c9
Update README.md (#1272) 2023-11-21 10:39:18 +01:00
OlivierDehaene
457e72c386 v1.1.1 2023-11-16 13:56:04 +01:00
Nicolas Patry
7b5c167487
Update README.md (#1242)
# What does this PR do?

<!--
Congratulations! You've made it this far! You're not quite done yet
though.

Once merged, your PR is going to appear in the release notes with the
title you set, so make sure it's a great title that fully reflects the
extent of your awesome contribution.

Then, please replace this with a description of the change and which
issue is fixed (if applicable). Please also include relevant motivation
and context. List any dependencies (if any) that are required for this
change.

Once you're done, someone will review your PR shortly (see the section
"Who can review?" below to tag some potential reviewers). They may
suggest changes to make the code even better. If no one reviewed your PR
after a week has passed, don't hesitate to post a new comment
@-mentioning the same persons---sometimes notifications get lost.
-->

<!-- Remove if not applicable -->

Fixes # (issue)


## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Did you read the [contributor
guideline](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md#start-contributing-pull-requests),
      Pull Request section?
- [ ] Was this discussed/approved via a Github issue or the
[forum](https://discuss.huggingface.co/)? Please add a link
      to it if that's the case.
- [ ] Did you make sure to update the documentation with your changes?
Here are the
[documentation
guidelines](https://github.com/huggingface/transformers/tree/main/docs),
and
[here are tips on formatting
docstrings](https://github.com/huggingface/transformers/tree/main/docs#writing-source-documentation).
- [ ] Did you write any new necessary tests?


## Who can review?

Anyone in the community is free to review the PR once the tests have
passed. Feel free to tag
members/contributors who may be interested in your PR.

<!-- Your PR will be replied to more quickly if you can figure out the
right person to tag with @


@OlivierDehaene OR @Narsil

 -->
2023-11-07 10:24:53 +01:00
Nicolas Patry
b9184093d9
Narsil patch 1 (#1241)
2023-11-07 10:13:09 +01:00
Nicolas Patry
414a911b34
Adding the video -> moving the architecture picture lower (#1239)
2023-11-07 01:01:40 +01:00
Omar Sanseviero
dd304cf14c
Remove some content from the README in favour of the documentation (#958) 2023-10-09 11:59:06 +02:00
OlivierDehaene
7a6fad6aac update readme 2023-09-28 10:18:18 +02:00
OlivierDehaene
3b56d7669b
feat: add mistral model (#1071) 2023-09-28 09:55:47 +02:00
Nicolas Patry
a049864270
Preping 1.1.0 (#1066)
# What does this PR do?

Upgrade all relevant versions and dependencies.

2023-09-27 10:40:18 +02:00
Omar Sanseviero
7d8e5fb284
Update version in docs (#957) 2023-08-31 20:00:12 +02:00
Nicolas Patry
888c029114
Upgrade version number in docs. (#910)
2023-08-23 13:45:28 +02:00
Adarsh Shirawalmath
737d5781e4
Update README.md (#848)
@Narsil

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2023-08-15 19:13:52 +02:00
Pasquale Minervini
d8f1337e7e
README edit -- running the service with no GPU or CUDA support (#773)
One-line addition to the README to show how to run the service on a
machine without GPUs or CUDA support (e.g., for local prototyping)

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2023-08-14 15:41:13 +02:00