| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| OlivierDehaene | 2f88d8dfb3 | fix: default max_new_tokens to 100 | 2024-04-19 12:09:05 +03:00 |
| OlivierDehaene | 05f8c85a8b | v1.3.2 | 2024-04-18 16:33:05 +03:00 |
| OlivierDehaene | f9b58ac7a1 | feat: add quant to mixtral (#1337) | 2024-04-18 16:32:50 +03:00 |
| OlivierDehaene | 09c556dbd7 | v1.3.1 | 2024-04-18 16:32:07 +03:00 |
| OlivierDehaene | db5053fc86 | v1.3.0 | 2024-04-18 16:31:53 +03:00 |
| OlivierDehaene | 79f268f95a | chore: formatting | 2024-04-18 16:26:00 +03:00 |
| OlivierDehaene | 9aef902982 | feat: mixtral (#1328) | 2024-04-18 12:39:52 +00:00 |
| Nicolas Patry | a7f52f3812 | Speculative (#1308) | 2024-04-18 12:39:39 +00:00 |
| Nicolas Patry | a41c1a6bc7 | Add a stale bot. (#1313) | 2024-04-18 10:10:02 +03:00 |
| fxmarty | ab34c16610 | Fix AMD documentation (#1307) — As per title | 2024-04-18 10:09:36 +03:00 |
| Jacek Czaja | ae6215fcea | Enable server UT: test_causal_lm.py::test_batch_from_pb (#121) (Co-authored-by: Jacek Czaja <jczaja@habana.ai>) | 2024-04-10 16:33:56 +02:00 |
| Karol Damaszke | 30cc78773e | Skip server tests of not enabled models (#125) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-04-09 14:15:41 +02:00 |
| Karol Damaszke | c6739526c6 | Fix test_watermark (#124) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-04-09 11:29:21 +02:00 |
| Sylwester Fraczek | 757c12dbac | Fix test_pass_through_tokenizer (#117) (Co-authored-by: Sylwester Fraczek <sfraczek@habana.ai>) | 2024-04-09 09:30:47 +02:00 |
| Karol Damaszke | d957e32601 | Add Habana copyright header (#122) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-04-08 18:06:21 +02:00 |
| Karol Damaszke | 06227f7b5e | Fix router tests (#119) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-04-04 11:10:11 +02:00 |
| Karol Damaszke | e210e15e27 | Update Cargo.lock file (#118) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-04-03 17:55:54 +02:00 |
| Karol Damaszke | b0de25a285 | Don't set rope_scaling for unsupported models (#115) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-04-02 12:12:02 +02:00 |
| yuanwu2017 | 3e28d7aa42 | Align the default value with server's (#111) (Signed-off-by: yuanwu <yuan.wu@intel.com>) | 2024-04-01 12:44:20 +02:00 |
| Karol Damaszke | 7342baa2eb | Add support for rope_scaling and remove is_optimized_for_gaudi (#112) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-03-29 15:07:32 +01:00 |
| Karol Damaszke | bf5263b88b | Disable watermark with FP8 quantization (#114) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-03-27 13:32:20 +01:00 |
| jkaniecki | 56f00a552b | Adjust warmup to all possible bucket sizes and decode batch size = 1 (#113) | 2024-03-27 11:59:51 +01:00 |
| Karol Damaszke | 9796b0e10d | Add simple continuous batching benchmark (#108) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>; Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>) | 2024-03-26 09:17:55 +01:00 |
| regisss | 7f58680999 | Add docker pull command in README (#110) | 2024-03-25 15:44:54 +01:00 |
| jkaniecki | 2b1581edac | Warmup greedy search in next token chooser (#109) | 2024-03-22 23:43:20 +01:00 |
| Wang, Yi | d752317b5f | Correct input_length since habana extend input_length to max_input_length (#103) (Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>) | 2024-03-18 15:23:13 +01:00 |
| Karol Damaszke | b45f648483 | Add warmup for logits processors (#107) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-03-18 15:17:47 +01:00 |
| jkaniecki | 8504f9c41c | Improve README clarity (#106) | 2024-03-18 15:15:07 +01:00 |
| yuanwu2017 | a4d5c3f40f | Fix the generate_stream crash in concurrent query (#105) (Signed-off-by: yuanwu <yuan.wu@intel.com>) | 2024-03-15 10:54:56 +01:00 |
| Wang, Yi | 3d81a80577 | Fix incorrect setting of max_new_tokens in warmup (#104) (Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>) | 2024-03-13 16:19:40 +01:00 |
| Yao Matrix | 7149ac30e6 | Fix the issue of out of range (#98) (Signed-off-by: yuanwu <yuan.wu@intel.com>; Co-authored-by: yuanwu <yuan.wu@intel.com>) | 2024-03-13 10:09:53 +01:00 |
| jkaniecki | 602a920ec5 | Update nix version (#102) | 2024-03-11 16:21:04 +01:00 |
| Karol Damaszke | 365f277900 | Clean-up README (#96) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-03-10 22:02:15 +01:00 |
| Karol Damaszke | 8e14780bf4 | Wait 2sec once shard is ready to improve stability (#92) (#94) (Co-authored-by: mswiniarsk <156412439+mswiniarsk@users.noreply.github.com>) | 2024-03-04 12:17:24 +01:00 |
| Karol Damaszke | 80ae9ead28 | Set MAX_TOTAL_TOKENS automatically (#91) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-03-01 11:25:15 +01:00 |
| Karol Damaszke | a5c788cfe4 | Remove redundant fill op (#83) (#90) (Co-authored-by: mswiniarsk <156412439+mswiniarsk@users.noreply.github.com>) | 2024-03-01 01:32:02 +01:00 |
| Karol Damaszke | 03c2123244 | Use batched index_copy (#73) (#89) (Co-authored-by: madamczykhabana <110973826+madamczykhabana@users.noreply.github.com>) | 2024-02-29 15:45:16 +01:00 |
| Karol Damaszke | 8f6564ce0e | Heap based router queue (#63) (#88) (Co-authored-by: mrs303 <54661797+mrs303@users.noreply.github.com>) | 2024-02-29 10:56:26 +01:00 |
| Karol Damaszke | 7dbf4bf7a4 | Improve tensor slicing performance (#66) (#87) (Co-authored-by: mswiniarsk <156412439+mswiniarsk@users.noreply.github.com>) | 2024-02-29 10:48:54 +01:00 |
| Karol Damaszke | 3831f1bed5 | Add warmup for shift operation (#59) (#86) | 2024-02-29 09:19:28 +01:00 |
| Karol Damaszke | 022ce1eaaf | Overhead reduction (#58) (#85) (Co-authored-by: mrs303 <54661797+mrs303@users.noreply.github.com>) | 2024-02-29 09:17:45 +01:00 |
| Karol Damaszke | 212136dff8 | Log exceptions to debug.log (#52) (#84) (Co-authored-by: madamczykhabana <110973826+madamczykhabana@users.noreply.github.com>) | 2024-02-29 09:14:42 +01:00 |
| Karol Damaszke | c7ccfb87ff | Grouped pad/shift/move operations (#57) (#82) (Co-authored-by: madamczykhabana <110973826+madamczykhabana@users.noreply.github.com>) | 2024-02-29 04:16:44 +01:00 |
| Karol Damaszke | 2122acc60f | Add warmup for all possible shapes for prefill #49 (#81) | 2024-02-28 10:40:13 +01:00 |
| Karol Damaszke | 31bed905d4 | Update habana profiler (#50) (#80) (Co-authored-by: mswiniarsk <156412439+mswiniarsk@users.noreply.github.com>) | 2024-02-28 09:57:40 +01:00 |
| Karol Damaszke | d31fb62576 | Add more info to high-level profiler events (#46) (#79) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-02-28 09:55:50 +01:00 |
| Karol Damaszke | 941d36f3fd | Enable deferred token generation (#44) (#75) (Co-authored-by: Krzysztof Laskowski <klaskowski@habana.ai>) | 2024-02-27 15:46:40 +01:00 |
| Karol Damaszke | 6248c5610e | Revert "Prefer prefill instead of decode when max_waiting_tokens==0 (#18)" (#45) (#76) (Co-authored-by: mswiniarsk <156412439+mswiniarsk@users.noreply.github.com>) | 2024-02-27 11:56:45 +01:00 |
| jkaniecki | 83b059bd27 | Bulk shifting (#40) (#70) (Co-authored-by: madamczykhabana <110973826+madamczykhabana@users.noreply.github.com>) | 2024-02-26 17:29:56 +01:00 |
| regisss | 8f4aba6ad3 | Update dependencies (#69) | 2024-02-25 13:07:47 +01:00 |