| Author | Commit | Message | Date |
| --- | --- | --- | --- |
| OlivierDehaene | 2f88d8dfb3 | fix: default max_new_tokens to 100 | 2024-04-19 12:09:05 +03:00 |
| OlivierDehaene | 05f8c85a8b | v1.3.2 | 2024-04-18 16:33:05 +03:00 |
| OlivierDehaene | f9b58ac7a1 | feat: add quant to mixtral (#1337) | 2024-04-18 16:32:50 +03:00 |
| OlivierDehaene | 09c556dbd7 | v1.3.1 | 2024-04-18 16:32:07 +03:00 |
| OlivierDehaene | db5053fc86 | v1.3.0 | 2024-04-18 16:31:53 +03:00 |
| OlivierDehaene | 79f268f95a | chore: formatting | 2024-04-18 16:26:00 +03:00 |
| OlivierDehaene | 9aef902982 | feat: mixtral (#1328) | 2024-04-18 12:39:52 +00:00 |
| Nicolas Patry | a7f52f3812 | Speculative (#1308) | 2024-04-18 12:39:39 +00:00 |
| Nicolas Patry | a41c1a6bc7 | Add a stale bot. (#1313) | 2024-04-18 10:10:02 +03:00 |
| fxmarty | ab34c16610 | Fix AMD documentation (#1307) — As per title | 2024-04-18 10:09:36 +03:00 |
| Jacek Czaja | ae6215fcea | Enable server UT: test_causal_lm.py::test_batch_from_pb (#121) (Co-authored-by: Jacek Czaja <jczaja@habana.ai>) | 2024-04-10 16:33:56 +02:00 |
| Karol Damaszke | 30cc78773e | Skip server tests of not enabled models (#125) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-04-09 14:15:41 +02:00 |
| Karol Damaszke | c6739526c6 | Fix test_watermark (#124) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-04-09 11:29:21 +02:00 |
| Sylwester Fraczek | 757c12dbac | Fix test_pass_through_tokenizer (#117) (Co-authored-by: Sylwester Fraczek <sfraczek@habana.ai>) | 2024-04-09 09:30:47 +02:00 |
| Karol Damaszke | d957e32601 | Add Habana copyright header (#122) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-04-08 18:06:21 +02:00 |
| Karol Damaszke | 06227f7b5e | Fix router tests (#119) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-04-04 11:10:11 +02:00 |
| Karol Damaszke | e210e15e27 | Update Cargo.lock file (#118) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-04-03 17:55:54 +02:00 |
| Karol Damaszke | b0de25a285 | Don't set rope_scaling for unsupported models (#115) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-04-02 12:12:02 +02:00 |
| yuanwu2017 | 3e28d7aa42 | Align the default value with server's (#111) (Signed-off-by: yuanwu <yuan.wu@intel.com>) | 2024-04-01 12:44:20 +02:00 |
| Karol Damaszke | 7342baa2eb | Add support for rope_scaling and remove is_optimized_for_gaudi (#112) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-03-29 15:07:32 +01:00 |
| Karol Damaszke | bf5263b88b | Disable watermark with FP8 quantization (#114) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-03-27 13:32:20 +01:00 |
| jkaniecki | 56f00a552b | Adjust warmup to all possible bucket sizes and decode batch size = 1 (#113) | 2024-03-27 11:59:51 +01:00 |
| Karol Damaszke | 9796b0e10d | Add simple continuous batching benchmark (#108) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>; Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>) | 2024-03-26 09:17:55 +01:00 |
| regisss | 7f58680999 | Add docker pull command in README (#110) | 2024-03-25 15:44:54 +01:00 |
| jkaniecki | 2b1581edac | Warmup greedy search in next token chooser (#109) | 2024-03-22 23:43:20 +01:00 |
| Wang, Yi | d752317b5f | Correct input_length since habana extend input_length to max_input_length (#103) (Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>) | 2024-03-18 15:23:13 +01:00 |
| Karol Damaszke | b45f648483 | Add warmup for logits processors (#107) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-03-18 15:17:47 +01:00 |
| jkaniecki | 8504f9c41c | Improve README clarity (#106) | 2024-03-18 15:15:07 +01:00 |
| yuanwu2017 | a4d5c3f40f | Fix the generate_stream crash in concurrent query (#105) (Signed-off-by: yuanwu <yuan.wu@intel.com>) | 2024-03-15 10:54:56 +01:00 |
| Wang, Yi | 3d81a80577 | Fix incorrect setting of max_new_tokens in warmup (#104) (Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>) | 2024-03-13 16:19:40 +01:00 |
| Yao Matrix | 7149ac30e6 | Fix the issue of out of range (#98) (Signed-off-by: yuanwu <yuan.wu@intel.com>; Co-authored-by: yuanwu <yuan.wu@intel.com>) | 2024-03-13 10:09:53 +01:00 |
| jkaniecki | 602a920ec5 | Update nix version (#102) | 2024-03-11 16:21:04 +01:00 |
| Karol Damaszke | 365f277900 | Clean-up README (#96) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-03-10 22:02:15 +01:00 |
| Karol Damaszke | 8e14780bf4 | Wait 2sec once shard is ready to improve stability (#92) (#94) (Co-authored-by: mswiniarsk <156412439+mswiniarsk@users.noreply.github.com>) | 2024-03-04 12:17:24 +01:00 |
| Karol Damaszke | 80ae9ead28 | Set MAX_TOTAL_TOKENS automatically (#91) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-03-01 11:25:15 +01:00 |
| Karol Damaszke | a5c788cfe4 | Remove redundant fill op (#83) (#90) (Co-authored-by: mswiniarsk <156412439+mswiniarsk@users.noreply.github.com>) | 2024-03-01 01:32:02 +01:00 |
| Karol Damaszke | 03c2123244 | Use batched index_copy (#73) (#89) (Co-authored-by: madamczykhabana <110973826+madamczykhabana@users.noreply.github.com>) | 2024-02-29 15:45:16 +01:00 |
| Karol Damaszke | 8f6564ce0e | Heap based router queue (#63) (#88) (Co-authored-by: mrs303 <54661797+mrs303@users.noreply.github.com>) | 2024-02-29 10:56:26 +01:00 |
| Karol Damaszke | 7dbf4bf7a4 | Improve tensor slicing performance (#66) (#87) (Co-authored-by: mswiniarsk <156412439+mswiniarsk@users.noreply.github.com>) | 2024-02-29 10:48:54 +01:00 |
| Karol Damaszke | 3831f1bed5 | Add warmup for shift operation (#59) (#86) | 2024-02-29 09:19:28 +01:00 |
| Karol Damaszke | 022ce1eaaf | Overhead reduction (#58) (#85) (Co-authored-by: mrs303 <54661797+mrs303@users.noreply.github.com>) | 2024-02-29 09:17:45 +01:00 |
| Karol Damaszke | 212136dff8 | Log exceptions to debug.log (#52) (#84) (Co-authored-by: madamczykhabana <110973826+madamczykhabana@users.noreply.github.com>) | 2024-02-29 09:14:42 +01:00 |
| Karol Damaszke | c7ccfb87ff | Grouped pad/shift/move operations (#57) (#82) (Co-authored-by: madamczykhabana <110973826+madamczykhabana@users.noreply.github.com>) | 2024-02-29 04:16:44 +01:00 |
| Karol Damaszke | 2122acc60f | Add warmup for all possible shapes for prefill #49 (#81) | 2024-02-28 10:40:13 +01:00 |
| Karol Damaszke | 31bed905d4 | Update habana profiler (#50) (#80) (Co-authored-by: mswiniarsk <156412439+mswiniarsk@users.noreply.github.com>) | 2024-02-28 09:57:40 +01:00 |
| Karol Damaszke | d31fb62576 | Add more info to high-level profiler events (#46) (#79) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-02-28 09:55:50 +01:00 |
| Karol Damaszke | 941d36f3fd | Enable deferred token generation (#44) (#75) (Co-authored-by: Krzysztof Laskowski <klaskowski@habana.ai>) | 2024-02-27 15:46:40 +01:00 |
| Karol Damaszke | 6248c5610e | Revert "Prefer prefill instead of decode when max_waiting_tokens==0 (#18)" (#45) (#76) (Co-authored-by: mswiniarsk <156412439+mswiniarsk@users.noreply.github.com>) | 2024-02-27 11:56:45 +01:00 |
| jkaniecki | 83b059bd27 | Bulk shifting (#40) (#70) (Co-authored-by: madamczykhabana <110973826+madamczykhabana@users.noreply.github.com>) | 2024-02-26 17:29:56 +01:00 |
| regisss | 8f4aba6ad3 | Update dependencies (#69) | 2024-02-25 13:07:47 +01:00 |