yuanwu
92a1e0fbae
Align the source code with main branch 2.0.4
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-09-24 03:06:55 +00:00
regisss
c09f5bc930
Merge pull request #187 from yuanwu2017/v2.0.4
2024-08-12 23:59:03 +02:00
Sun Choi
cf2ff5a1dd
Revert PR#178 ( #191 )
2024-08-11 09:29:30 +02:00
yuanwu
588a014551
Enable llava-next
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-07-29 21:55:31 +00:00
Sun Choi
fff1d4f86f
Add bucket for input seq len exactly same as --max-input-length ( #178 )
2024-07-05 10:30:26 +02:00
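Bucketing pads each input length up to one of a fixed set of warmed-up shapes so the device avoids recompiling for every new length; adding a bucket exactly at `--max-input-length` keeps a full-length input from spilling past the largest warmed shape. A toy sketch of the idea (the bucket sizes here are illustrative, not TGI's actual values):

```python
def pick_bucket(seq_len, buckets):
    """Return the smallest warmed-up bucket that fits seq_len."""
    for b in sorted(buckets):
        if seq_len <= b:
            return b
    raise ValueError(f"seq_len {seq_len} exceeds largest bucket {max(buckets)}")

# With a bucket placed exactly at max-input-length (1024 here),
# a full-length input maps onto a warmed shape instead of failing.
buckets = [128, 256, 512, 1024]
print(pick_bucket(1000, buckets))  # -> 1024
print(pick_bucket(1024, buckets))  # -> 1024
```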
Karol Damaszke
535a35db17
Set unique request id during warmup ( #170 )
2024-07-03 10:58:20 +02:00
Karol Damaszke
4fe871ffaa
Adjust max_new_tokens in warmup ( #160 )
2024-06-20 19:48:37 +02:00
Karol Damaszke
0e8f8726db
Warmup all decode buckets ( #152 )
...
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
2024-05-29 22:46:55 +02:00
Karol Damaszke
bad7fe720a
Fix warmup shapes for corner cases ( #136 )
...
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
2024-05-06 11:35:27 +02:00
OlivierDehaene
31b5e37f49
chore: add pre-commit ( #1569 )
2024-04-24 15:32:02 +03:00
drbh
55acb86f42
Outlines guided generation ( #1539 )
...
This WIP PR starts to add grammar support via outlines. It currently
supports very simple regex grammars and does not optimize for
precompiling or caching grammar FSMs.
todo:
- [X] add simple outlines guidance to `NextTokenChooser`
- [X] update protos for grammar
- [X] update generation params API
- [X] constrain simple grammar
- [ ] support parsing more complex grammars into FSMs
- [ ] support all grammar types that outlines supports
- [ ] explore optimizations to avoid recompiling grammars
guided request
```bash
curl -s 'http://localhost:3000/generate' \
--header 'Content-Type: application/json' \
--data-raw '{
"inputs": "make an email for david: \n",
"parameters": {
"max_new_tokens": 6,
"grammar": "[\\w-]+@([\\w-]+\\.)+[\\w-]+"
}
}' | jq
```
response
```json
{
"generated_text": "david@example.com"
}
```
unguided request
```bash
curl -s 'http://localhost:3000/generate' \
--header 'Content-Type: application/json' \
--data '{
"inputs": "make an email for david: \n",
"parameters": {
"max_new_tokens": 6
}
}' | jq
```
response
```json
{
"generated_text": " email = 'david"
}
```
2024-04-24 14:57:37 +03:00
OlivierDehaene
518d30dec4
feat(router): add max_batch_size ( #1542 )
...
Some hardware requires a maximum batch size.
2024-04-24 09:21:57 +00:00
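A hard cap simply means the batcher never groups more than N requests, regardless of the remaining token budget. A minimal illustration of the idea (not the router's actual implementation; the function name and parameters are made up for this sketch):

```python
def take_next_batch(queue, max_batch_size, max_batch_total_tokens):
    """Greedily pull requests until the size cap or token budget is hit.

    queue is a list of per-request token counts, consumed front-to-back.
    """
    batch, tokens = [], 0
    while queue and len(batch) < max_batch_size:
        req_tokens = queue[0]
        if tokens + req_tokens > max_batch_total_tokens:
            break
        batch.append(queue.pop(0))
        tokens += req_tokens
    return batch

# The token budget would allow all four requests, but the size cap stops at 2.
print(take_next_batch([10, 10, 10, 10], 2, 1000))  # -> [10, 10]
```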
OlivierDehaene
f1d8da3ba6
feat(server): add frequency penalty ( #1541 )
2024-04-24 08:43:50 +00:00
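A frequency penalty (in the OpenAI style) subtracts from each token's logit an amount proportional to how many times that token has already been generated. A minimal sketch of the idea, not the server's actual implementation:

```python
from collections import Counter

def apply_frequency_penalty(logits, generated_ids, penalty):
    """Subtract penalty * count from the logit of each already-generated token."""
    counts = Counter(generated_ids)
    out = list(logits)
    for token_id, count in counts.items():
        out[token_id] -= penalty * count
    return out

# Token 2 was generated twice, so its logit drops by 2 * 0.5.
print(apply_frequency_penalty([1.0, 1.0, 1.0], [2, 2], 0.5))  # -> [1.0, 1.0, 0.0]
```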
drbh
935ee00749
chore: bump rust version and annotate/fix all clippy warnings ( #1455 )
...
This PR bumps to the latest Rust version and makes Clippy happy:
```bash
cargo clippy --all -- -D warnings
# Finished dev [unoptimized + debuginfo] target(s) in 0.10s
```
2024-04-22 11:53:28 +03:00
OlivierDehaene
b7299e1b7f
fix: fix gpt-q with groupsize = -1 ( #1358 )
2024-04-19 15:05:50 +03:00
OlivierDehaene
5c9ef069ed
feat: add more latency metrics in forward ( #1346 )
2024-04-19 13:41:34 +03:00
Nicolas Patry
a7f52f3812
Speculative ( #1308 )
2024-04-18 12:39:39 +00:00
Karol Damaszke
d957e32601
Add Habana copyright header ( #122 )
...
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
2024-04-08 18:06:21 +02:00
yuanwu2017
3e28d7aa42
Align the default value with server's ( #111 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-04-01 12:44:20 +02:00
Karol Damaszke
bf5263b88b
Disable watermark with FP8 quantization ( #114 )
...
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
2024-03-27 13:32:20 +01:00
jkaniecki
56f00a552b
Adjust warmup to all possible bucket sizes and decode batch size = 1 ( #113 )
2024-03-27 11:59:51 +01:00
jkaniecki
2b1581edac
Warmup greedy search in next token chooser ( #109 )
2024-03-22 23:43:20 +01:00
Karol Damaszke
b45f648483
Add warmup for logits processors ( #107 )
...
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
2024-03-18 15:17:47 +01:00
Wang, Yi
3d81a80577
Fix incorrect setting of max_new_tokens in warmup ( #104 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2024-03-13 16:19:40 +01:00
Karol Damaszke
2122acc60f
Add warmup for all possible shapes for prefill #49 ( #81 )
2024-02-28 10:40:13 +01:00
OlivierDehaene
f9910d13e2
feat: remove flume ( #1184 )
2023-10-23 15:51:12 +02:00
OlivierDehaene
5e28f44a83
#1049 CI ( #1178 )
...
See #1049
---------
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Wang, Yi <yi.a.wang@intel.com>
2023-10-20 10:28:45 +02:00
Nicolas Patry
211b54ac41
Rebased #617 ( #868 )
...
---------
Co-authored-by: Vincent Brouwers <vincent.brouwers@ing.com>
2023-08-28 11:43:47 +02:00
OlivierDehaene
73a4d65d26
feat: add cuda memory fraction ( #659 )
...
Close #673
2023-07-24 11:43:58 +02:00
OlivierDehaene
fe80f5360c
feat(server): auto max_batch_total_tokens for flash att models ( #630 )
2023-07-19 09:31:25 +02:00
OlivierDehaene
b7327205a6
feat(launcher): add arg validation and drop subprocess ( #595 )
2023-07-13 14:22:37 +02:00
OlivierDehaene
e74bd41e0f
feat(server): add paged attention to flash models ( #516 )
...
Closes #478
2023-06-30 19:09:59 +02:00
OlivierDehaene
218c9adaa5
feat: decrease IPC proto size ( #367 )
...
Closes #307 #308
2023-05-24 19:19:57 +02:00
OlivierDehaene
68e9d6ab33
feat(server): shard token decode ( #303 )
2023-05-10 15:48:21 +02:00
Ehsan M. Kermani
f092ba9b22
feat(server): add watermarking tests ( #248 )
2023-04-27 19:16:35 +02:00
Nicolas Patry
db2b4e0754
feat(router): new healthcheck that skips the queue ( #244 )
...
Co-authored-by: OlivierDehaene <23298448+OlivierDehaene@users.noreply.github.com>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2023-04-26 20:23:54 +02:00
OlivierDehaene
ebc74d5666
feat(router): use number of tokens in batch as input for dynamic batching ( #226 )
...
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2023-04-24 17:59:00 +02:00
OlivierDehaene
343437c7b5
feat(router): add device and dtype info ( #215 )
2023-04-21 15:36:29 +02:00
OlivierDehaene
5cddc055e6
fix(rust-client): use join_all instead of select_all to hopefully fix nccl issues ( #162 )
2023-04-09 20:07:02 +02:00
OlivierDehaene
f000068944
feat(server): clear cache on error ( #143 )
2023-03-28 11:29:35 +02:00
OlivierDehaene
9af454142a
feat: add distributed tracing ( #62 )
2023-02-13 13:02:45 +01:00
OlivierDehaene
20c3c5940c
feat(router): refactor API and add openAPI schemas ( #53 )
2023-02-03 12:43:37 +01:00
OlivierDehaene
017a2a8c2f
feat: Add token streaming using ServerSideEvents support ( #41 )
2023-01-31 17:04:00 +01:00
OlivierDehaene
4f9ac67cfa
Revert "feat: Add token streaming using ServerSideEvents support" ( #40 )
...
Reverts huggingface/text-generation-inference#36
2023-01-31 14:21:51 +01:00
OlivierDehaene
7fbfbb0dc5
feat: Add token streaming using ServerSideEvents support ( #36 )
...
Add token streaming using Server-Sent Events (SSE).
The signature of the SSE events is:
```rust
struct Details {
finish_reason: String,
generated_tokens: u32,
seed: Option<u64>,
}
struct StreamResponse {
token: Token,
generated_text: Option<String>,
details: Option<Details>,
}
struct ErrorResponse {
error: String,
}
```
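As a rough client-side illustration (not part of the PR), each SSE `data:` line carries a JSON payload matching the `StreamResponse` shape above; `generated_text` and `details` are only populated on the final event. The parsing below is a sketch under that assumption:

```python
import json

def parse_sse_line(line: str):
    """Parse one SSE 'data:' line into the StreamResponse shape above."""
    if not line.startswith("data:"):
        return None  # skip comments and keep-alive lines
    payload = json.loads(line[len("data:"):].strip())
    return {
        "token": payload.get("token"),
        "generated_text": payload.get("generated_text"),
        "details": payload.get("details"),
    }

# Hypothetical final event of a stream:
event = ('data: {"token": {"id": 1, "text": "hi"}, "generated_text": "hi", '
         '"details": {"finish_reason": "length", "generated_tokens": 1, "seed": null}}')
print(parse_sse_line(event)["generated_text"])  # -> hi
```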
2023-01-31 11:49:43 +01:00
OlivierDehaene
32a253063d
feat: Return logprobs ( #8 )
2022-12-15 17:03:56 +01:00
OlivierDehaene
718096f695
feat: Support stop sequences ( #7 )
2022-12-12 18:25:22 +01:00
OlivierDehaene
09674e6df9
feat(server): Support bitsandbytes
2022-10-27 14:25:29 +02:00
OlivierDehaene
beb552127a
feat(client): Simplify sharded logic
2022-10-22 23:40:05 +02:00
Olivier Dehaene
f16f2f5ae1
v0.1.0
2022-10-20 19:14:44 +02:00