text-generation-inference

mirror of https://github.com/huggingface/text-generation-inference.git synced 2025-10-09 23:15:23 +00:00

Author	SHA1	Message	Date
yuanwu2017	8d84ffabf2	Upgrade to SynapseAI 1.18 (#227 ) Signed-off-by: yuanwu <yuan.wu@intel.com> Co-authored-by: Thanaji Rao Thakkalapelli <tthakkalapelli@habana.ai>	2024-10-31 20:14:44 +01:00
Thanaji Rao Thakkalapelli	7fb4af9a87	updated supported models list table in readme (#241 ) * updated supported models list table in readme * updated read me * updated read me	2024-10-29 23:28:45 -07:00
Thanaji Rao Thakkalapelli	b126bf4785	Revert pr 235 as flash attention is not really enabled for gemma (#239 )	2024-10-23 10:58:57 +02:00
Thanaji Rao Thakkalapelli	c5e3881051	Enables Flash Attention in TGI for gemma models (#235 )	2024-10-18 09:20:42 -07:00
Alessandro de Oliveira Faria (A.K.A.CABELO)	9ae5ad5057	requirements name - cabelo@opensuse.org (#237 )	2024-10-18 09:20:05 -07:00
Thanaji Rao Thakkalapelli	46b14e6b28	Remove all references to habana_quantization_toolkit for 1.18 (#229 )	2024-10-18 10:59:59 +02:00
Thanaji Rao Thakkalapelli	21c13ff3a6	Remove References to torch compile mode in readme (#236 )	2024-10-17 14:07:51 -07:00
Sun Choi	8ae5d4c7d6	Ignore EOS for benchmark by using TEXT_GENERATION_SERVER_IGNORE_EOS_TOKEN (#234 )	2024-10-16 11:57:36 +02:00
Mandy Li	d07e7f4f62	Merge pull request #233 from huggingface/fix_sysntax Fix sysntax error in PR 232	2024-10-15 14:33:21 -07:00
Thanaji Rao Thakkalapelli	87a1cee32c	Fix sysntax error in PR 232	2024-10-15 13:23:48 -07:00
Thanaji Rao Thakkalapelli	e06320f64e	Enabling Flash Attention support for falcon model (#232 )	2024-10-15 19:50:17 +02:00
Sun Choi	0578bd917d	Fix gpt_bigcode/starcoderbase-3b accuracy issue (#228 ) Co-authored-by: Thanaji Rao Thakkalapelli <tthakkalapelli@habana.ai>	2024-10-14 10:01:55 +02:00
Mohit Deopujari	fe8a373831	Enhancements to README (#226 )	2024-10-02 12:22:33 +02:00
yuanwu2017	e424752fa3	Enable the AutoGPTQ (#217 ) Signed-off-by: yuanwu <yuan.wu@intel.com>	2024-09-25 18:55:02 +02:00
regisss	0deebe7012	Update README with Docker image v2.0.5	2024-09-07 17:56:52 +00:00
regisss	bf9865e956	Upgrade to Optimum Habana v1.13.2 (#222 )	2024-09-07 19:52:59 +02:00
Thanaji Rao Thakkalapelli	a4f39a1cae	Update README.md with changes related to LLava-next multi card support (#221 )	2024-09-07 17:46:21 +02:00
Thanaji Rao Thakkalapelli	ad7c620f0f	Llava-next: Added flash_attention_recompute option (#220 )	2024-09-06 22:20:07 +02:00
yuanwu2017	2299b739fe	Only Apply the TP in language_model (#219 ) Signed-off-by: yuanwu <yuan.wu@intel.com>	2024-09-06 22:19:24 +02:00
Thanaji Rao Thakkalapelli	73d93bdd93	Downgrade sympy to match synapaseAI 1.18 base image (#215 )	2024-08-28 17:45:44 +02:00
Thanaji Rao Thakkalapelli	fde061ccf8	Updated docker image version to 2.0.4 (#212 ) Co-authored-by: Thanaji Thakkalapelli <tthakkalapelli@tthakkalapelli-vm-u22.habana-labs.com>	2024-08-27 10:14:27 +02:00
yuanwu2017	2985503900	llava-next Fp8 (#209 ) Signed-off-by: yuanwu <yuan.wu@intel.com> Co-authored-by: Thanaji Rao Thakkalapelli <tthakkalapelli@habana.ai> Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>	2024-08-26 16:53:08 +02:00
Wang, Chang	55d60a103c	Add qwen2 fp8 support (#210 ) Signed-off-by: changwang <changwang@habana.ai> Co-authored-by: changwang <changwang@habana.ai>	2024-08-26 11:02:58 +02:00
Thanaji Rao Thakkalapelli	e33db1877c	Updated Readme to use flash attention for llama (#200 )	2024-08-26 11:01:11 +02:00
Vidya Galli	c925bd2872	Undo disable of hpu graphs for starcoder (#201 )	2024-08-26 10:58:01 +02:00
Thanaji Rao Thakkalapelli	0c3239e710	Enable quantization with INC (#203 )	2024-08-26 10:55:37 +02:00
Sun Choi	ea48ae169a	Make prefill time of static benchmark correct (#214 )	2024-08-26 10:51:28 +02:00
yuanwu2017	a8cead1f92	Upgrade SynapseAI version to 1.17.0 (#208 ) Signed-off-by: yuanwu <yuan.wu@intel.com> Co-authored-by: Thanaji Rao Thakkalapelli <tthakkalapelli@habana.ai> Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>	2024-08-26 10:49:29 +02:00
yuanwu2017	369e499a66	Simplify the warmup process (#173 ) Signed-off-by: yuanwu <yuan.wu@intel.com>	2024-08-15 12:04:14 +02:00
Sun Choi	e3f0f85b70	Pad token handling for Llama3.1 (#199 )	2024-08-13 00:00:41 +02:00
regisss	c09f5bc930	Merge pull request #187 from yuanwu2017/v2.0.4	2024-08-12 23:59:03 +02:00
Abhilash Majumder	d403575c43	Make bf16 default for hpu, fix script (#205 )	2024-08-11 10:48:35 +02:00
Sun Choi	cf2ff5a1dd	Revert PR#178 (#191 )	2024-08-11 09:29:30 +02:00
regisss	a41e974c3b	Merge branch 'habana-main' into v2.0.4	2024-08-10 12:54:00 +02:00
geoffrey papilion	e36a9c57f0	Code expects newer huggingface_hub versions, tested and this resolves issues with streaming response format (#190 ) Co-authored-by: Geoffrey Papilion <gpapilion@ebay.com>	2024-08-08 13:07:27 +02:00
Jacek Czaja	256a97231b	Removed redundant and crash causing regions to be a subject to Torch compile (#194 ) Co-authored-by: Jacek Czaja <jczaja@habana.ai>	2024-08-08 13:06:20 +02:00
Vidya Galli	4dc67e4ef3	Version check, doc fixes (#182 )	2024-08-07 22:09:51 +02:00
Abhilash Majumder	9b71343328	Integrate flash attention for starcoder2 tgi through habana and some fixes, enabling (#198 )	2024-08-07 22:06:05 +02:00
yuanwu	d34ffc4fe9	Refile the hpu warmup Signed-off-by: yuanwu <yuan.wu@intel.com>	2024-08-02 04:36:59 +00:00
yuanwu	05c13c89de	Remove useless modification Signed-off-by: yuanwu <yuan.wu@intel.com>	2024-07-30 10:05:38 +00:00
yuanwu	3f0f0e0825	Add the habana profiler Signed-off-by: yuanwu <yuan.wu@intel.com>	2024-07-30 03:53:46 +00:00
yuanwu	db0b6567e1	Remove log Signed-off-by: yuanwu <yuan.wu@intel.com>	2024-07-29 22:02:42 +00:00
yuanwu	588a014551	Enable llava-next Signed-off-by: yuanwu <yuan.wu@intel.com>	2024-07-29 21:55:31 +00:00
yuanwu2017	d3155d6f41	Merge branch 'habana-main' into v2.0.4	2024-07-17 13:45:15 +08:00
yuanwu	b34edc2ee9	Upgrade to 2.0.4 Signed-off-by: yuanwu <yuan.wu@intel.com>	2024-07-17 05:36:58 +00:00
Nicolas Patry	179336888e	Modifing the version number.	2024-07-17 05:36:58 +00:00
Nicolas Patry	42b0847a80	Fixing codellama loads by using purely `AutoTokenizer`. (#1947 ) - The need for the slow tokenizer default stems from back when llama 1 was introduced and all the flags where not supported in `tokenizers`. - Fixes #1891	2024-07-17 05:36:58 +00:00
Nicolas Patry	075092315e	Improving the logging system. (#1938 ) - Added a debug log for speculated ids (helps seeing in logs quality of a speculator). - Remove newlines from child process logs when re-emitting in non JSON mode. - Made standard level be closer to what's expected (only our binaries level). - Propagate that level correctly to the shard (was forced into INFO).	2024-07-17 05:36:58 +00:00
Thomas Schillaci	4239e4d327	Add completion route to client and add stop parameter where it's missing (#1869 ) - Add the stop parameter to the completion route - Add the completion method to the python client - Add the stop parameter to the python client's chat method Co-authored-by: Thomas SCHILLACI <tschilla@px101.prod.exalead.com> Co-authored-by: Thomas Schillaci <thomas.schillaci@3ds.com>	2024-07-17 05:36:58 +00:00
Nicolas Patry	7cf21294d1	Fixing some legacy behavior (big swapout of serverless on legacy stuff). (#1937 ) Co-authored-by: Daniël de Kok <me@github.danieldk.eu>	2024-07-17 05:36:58 +00:00

872 Commits