Commit Graph

801 Commits

Author SHA1 Message Date
drbh
1104885f00
Merge branch 'main' into lora-internal 2024-06-14 10:06:15 -04:00
drbh
0e1c28cafd fix: merge 'main' into lora-internal to resolve conflicts 2024-06-14 14:02:33 +00:00
drbh
06c3254cc5 fix: avoid dockerfile conflict 2024-06-14 13:58:38 +00:00
Alvaro Moran
445f313504
Adding architecture document (#2044)
* doc: adding architecture document

* doc: add architecture to toctree

* fix: avoid cargo lock changes

* fix: avoid cargo lock tweak

---------

Co-authored-by: drbh <david.richard.holtz@gmail.com>
2024-06-14 09:28:34 -04:00
Tiezhen WANG
96b7b40ca3
Update the link for qwen2 (#2068)
* Update the link for qwen2

* Fix Qwen2 model URL in model table

* Fix too eager staging

---------

Co-authored-by: Daniël de Kok <me@danieldk.eu>
2024-06-14 11:59:33 +02:00
Daniël de Kok
093a27c528
Add support for GPTQ Marlin (#2052)
Add support for GPTQ Marlin kernels

GPTQ Marlin extends the Marlin kernels to support common GPTQ
configurations:

- bits: 4 or 8
- groupsize: -1, 32, 64, or 128
- desc_act: true/false

Using the GPTQ Marlin kernels requires repacking the parameters in the
Marlin quantizer format.

The kernels were contributed by Neural Magic to vLLM. We vendor them
here for convenience.
2024-06-14 09:45:42 +02:00
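The supported GPTQ Marlin configurations listed above can be expressed as a small validation check. This is a hedged sketch; the function name and structure are illustrative, not TGI's actual code:

```python
# Hypothetical sketch: check a GPTQ config against the GPTQ Marlin
# support matrix described in the commit above.
SUPPORTED_BITS = {4, 8}
SUPPORTED_GROUPSIZES = {-1, 32, 64, 128}

def is_marlin_compatible(bits: int, groupsize: int, desc_act: bool) -> bool:
    """Return True if this GPTQ configuration can use the Marlin kernels.

    desc_act is accepted as both True and False, so it does not affect
    compatibility here.
    """
    return bits in SUPPORTED_BITS and groupsize in SUPPORTED_GROUPSIZES

print(is_marlin_compatible(4, 128, True))   # → True (a common GPTQ config)
print(is_marlin_compatible(3, 128, False))  # → False (3-bit is not supported)
```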
drbh
aa88c4fd3a fix: add lora kernel to dockerfile, support running without kernels and refactors 2024-06-14 00:35:07 +00:00
drbh
f433f1f770
implement Open Inference Protocol endpoints (#1942)
* feat: add kserve feature and basic routes

* feat: implement infer endpoint wrapper around generate

* fix: refactor and improve types

* fix: improve infer and simplify

* fix: cleanup and improve api docs

* fix: refactor and encapsulate kserve feat in file

* fix: remove typos after rebase
2024-06-13 12:51:51 -04:00
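The Open Inference Protocol (KServe v2) infer endpoint added above accepts requests shaped roughly like the following. This is a sketch of the public v2 protocol's request body; the exact TGI mapping and input names are assumptions, not taken from the commit:

```python
import json

# Hypothetical sketch of an Open Inference Protocol (KServe v2) infer
# request body. Field names follow the public v2 protocol; the input
# name "text_input" is illustrative.
request = {
    "inputs": [
        {
            "name": "text_input",
            "shape": [1],
            "datatype": "BYTES",
            "data": ["What is deep learning?"],
        }
    ]
}
print(json.dumps(request, indent=2))
```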
drbh
42aa8ee1bb
PR #2049 CI run (#2054)
* Use minijinja's pycompat mode for python methods

* fix: cargo fmt lint for pre commit

---------

Co-authored-by: Armin Ronacher <armin.ronacher@active-4.com>
2024-06-13 11:53:49 -04:00
OlivierDehaene
90184df79c
fix(layers): fix SuRotaryEmbedding (#2060)
* fix(layers): fix SuRotaryEmbedding

* change arange

* remove logs
2024-06-12 18:24:47 +02:00
OlivierDehaene
521de6cacd
fix(server): fix OPT implementation (#2061) 2024-06-12 18:22:20 +02:00
drbh
376a0b7ada
Support chat response format (#2046)
* feat: support response_format in chat

* fix: adjust typos

* fix: add trufflehog lint
2024-06-11 10:44:56 -04:00
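A chat request carrying the new `response_format` field might look like the sketch below. The exact schema the server accepts may differ; this only illustrates the request shape implied by the commit:

```python
import json

# Hypothetical sketch of a chat completion payload with response_format,
# as added by the commit above. Field values are illustrative.
payload = {
    "model": "tgi",
    "messages": [{"role": "user", "content": "List three colors as JSON."}],
    "response_format": {"type": "json_object"},
}
print(json.dumps(payload, indent=2))
```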
fxmarty
a6e4d63c86
Update LLMM1 bound (#2050)
update commit
2024-06-11 19:30:29 +08:00
Luc Georges
dfca1dfc5e
fix(ci): remove unnecessary permissions (#2045) 2024-06-10 12:16:53 -04:00
Luc Georges
4e74ec09a8
feat(ci): add trufflehog secrets detection (#2038) 2024-06-10 11:54:13 -04:00
Derek
d6cf63ca53 Update lora.md
Fixing spam image
2024-06-10 10:24:21 -04:00
Derek
1be1ebc438 Update lora.md
Fixed a typo
2024-06-10 10:24:21 -04:00
drbh
ce40ad26fd fix: add model_id to IdeficsCausalLM 2024-06-10 10:24:21 -04:00
drbh
101b95adc4 fix: update all models forwards to include adapter_data 2024-06-10 10:24:21 -04:00
drbh
1deb372564 fix: add adapter_data param to phi and neox 2024-06-10 10:24:21 -04:00
drbh
b1169273fd fix: add adapter_data param and avoid missing layers 2024-06-10 10:24:21 -04:00
drbh
91f407226d feat: support of vlm models 2024-06-10 10:24:21 -04:00
drbh
a563a93113 fix: rename doc to retry ci build 2024-06-10 10:24:21 -04:00
drbh
611225f017 feat: support base model generation and refactors 2024-06-10 10:24:21 -04:00
drbh
43ec9dfe32 feat: bump launcher and add new lora docs 2024-06-10 10:24:21 -04:00
drbh
81707bfbfa fix: include rust code for adapter id 2024-06-10 10:23:52 -04:00
drbh
68399c1ae3 feat: prefer model id in request 2024-06-10 10:23:52 -04:00
drbh
de56a81c5c feat: add lora support to mistral and refactors 2024-06-10 10:23:52 -04:00
drbh
9c45d34983 fix: add model_id to model test 2024-06-10 10:23:52 -04:00
drbh
dc0f76553c fix: pass model_id for all causal and seq2seq lms 2024-06-10 10:23:52 -04:00
drbh
88bd5c2c92 fix: pass model_id for all flash causal lms 2024-06-10 10:23:52 -04:00
drbh
73eb2ae255 fix: refactor and move changes to v3 proto 2024-06-10 10:23:52 -04:00
drbh
c927376725 fix: adjust adapter_segments logic when in batch 2024-06-10 10:23:52 -04:00
drbh
ad088d51fa fix: adjust batch for bgmv 2024-06-10 10:23:52 -04:00
drbh
8984ce6c69 feat: prefer lorax's custom punica kernels and add mlp loras 2024-06-10 10:23:52 -04:00
drbh
d5f21d57d1 fix: prefer adapter_data and refactors 2024-06-10 10:23:52 -04:00
drbh
8b50f4b779 feat: prefer lorax implementation and port loading logic 2024-06-10 10:23:52 -04:00
drbh
c661631225 feat: baseline impl single request multi lora support 2024-06-10 10:23:52 -04:00
drbh
a046c303f7 fix: refactor and reduce lora math 2024-06-10 10:23:52 -04:00
drbh
0a6ea7fb57 feat: load weights within layer and refactor lora pass 2024-06-10 10:23:52 -04:00
drbh
db3d8e6518 feat: first draft load multiple lora 2024-06-10 10:23:52 -04:00
Daniël de Kok
85dfc39222
Add Phi-3 medium support (#2039)
Add support for Phi-3-medium

The main difference between the medium and mini models is that medium
uses grouped query attention with a packed QKV matrix. This change adds
support for GQA with packed matrices to `Weights.get_weights_col_packed`
and uses it for Phi-3. This also allows us to remove the custom
implementation of GQA from dbrx attention loading.
2024-06-10 09:22:29 +02:00
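Splitting a packed QKV matrix under grouped query attention means cutting at uneven boundaries, since K and V have fewer heads than Q. The sketch below illustrates the idea with Phi-3-medium-like shapes; names and structure are illustrative, not the actual `Weights.get_weights_col_packed` implementation:

```python
import numpy as np

# Hypothetical sketch: split a packed QKV projection for GQA, where
# num_kv_heads < num_heads, so K and V are smaller slices than Q.
def split_packed_qkv(packed, num_heads, num_kv_heads, head_dim):
    q_size = num_heads * head_dim
    kv_size = num_kv_heads * head_dim
    q = packed[:q_size]
    k = packed[q_size:q_size + kv_size]
    v = packed[q_size + kv_size:]
    return q, k, v

# Assumed shapes: 40 query heads, 10 KV heads, head_dim 128, hidden 5120
packed = np.zeros((40 * 128 + 2 * 10 * 128, 5120))
q, k, v = split_packed_qkv(packed, 40, 10, 128)
print(q.shape, k.shape, v.shape)  # → (5120, 5120) (1280, 5120) (1280, 5120)
```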
fxmarty
9b3674d903
ROCm and sliding windows fixes (#2033)
* update vllm commit & fix models using sliding window

* update

* update commit

* fix bug where tunableop is bound to cuda graph even when cuda graphs are disabled

* enable tunableop by default

* fix sliding window

* address review

* dead code

* precise comment

* is it flaky?
2024-06-10 15:09:50 +08:00
Daniël de Kok
bf3c813782 server: use chunked inputs
The router will now send the input as chunks in addition to a single
string. This change modifies the server to process chunked input
rather than strings. It also allows us to remove the image
extraction code from the server.
2024-06-07 08:09:04 +02:00
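Server-side handling of chunked input can be sketched as a loop over typed chunks rather than one string. Chunk kinds and field layout here are assumptions for illustration, not the actual protobuf schema:

```python
# Hypothetical sketch: process a sequence of typed input chunks as
# described in the commit above. Text chunks are concatenated; image
# chunks are collected for a separate image pipeline.
def process_chunks(chunks):
    text_parts = []
    images = []
    for kind, payload in chunks:
        if kind == "text":
            text_parts.append(payload)
        elif kind == "image":
            images.append(payload)  # decoded elsewhere, not in this loop
    return "".join(text_parts), images

text, images = process_chunks(
    [("text", "Describe the image "), ("image", b"\x89PNG"), ("text", "briefly.")]
)
print(text)  # → Describe the image briefly.
```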
Wang, Yi
4dabddb7ea
Xpu gqa (#2013)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2024-06-06 19:12:57 +02:00
Nicolas Patry
9765658212 Revert "Enabling CI for AMD with new runner.."
This reverts commit 101ac9a760.
2024-06-06 19:08:16 +02:00
Nicolas Patry
101ac9a760 Enabling CI for AMD with new runner.. 2024-06-06 19:07:48 +02:00
Nicolas Patry
ed1cfde0d8
Internal runner ? (#2023)
2024-06-06 18:51:42 +02:00
Daniël de Kok
51621439a4 marlin: improve build 2024-06-06 17:19:46 +02:00
Daniël de Kok
0d96468ebb marlin: support tp>1 when group_size==-1 2024-06-06 17:19:28 +02:00