Author | Commit | Message | Date
OlivierDehaene | 5ff9e81952 | fix: fix offline (#1341) (#1347) (@oOraph; Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>; Co-authored-by: Raphael Glon <oOraph@users.noreply.github.com>) | 2024-04-19 14:56:25 +03:00
OlivierDehaene | ecb0db45af | fix: fix logic if sliding window key is not present in config (#1352) | 2024-04-19 14:56:10 +03:00
OlivierDehaene | a95e6d603d | feat: relax mistral requirements (#1351) (Close #1253, Close #1279) | 2024-04-19 14:50:24 +03:00
OlivierDehaene | 3600fc9dbe | v1.3.3 | 2024-04-19 14:18:39 +03:00
OlivierDehaene | bb6200503c | fix: max_past default value must be -1, not 0 (#1348) | 2024-04-19 14:18:05 +03:00
OlivierDehaene | 214ec0eb49 | fix: only keep stop sequence buffer if we have some | 2024-04-19 14:18:00 +03:00
OlivierDehaene | 04dbf7a506 | fix: slice stopping criteria buffer | 2024-04-19 14:17:52 +03:00
OlivierDehaene | b3c2d7291e | fix: fix quant linear autotune | 2024-04-19 14:17:39 +03:00
OlivierDehaene | 28fcdcca6d | fix: fix triton OutOfResources import | 2024-04-19 14:17:32 +03:00
OlivierDehaene | 5c9ef069ed | feat: add more latency metrics in forward (#1346) | 2024-04-19 13:41:34 +03:00
OlivierDehaene | c974437ba7 | fix: fix gpt-q params loading | 2024-04-19 12:12:50 +03:00
OlivierDehaene | 2f88d8dfb3 | fix: default max_new_tokens to 100 | 2024-04-19 12:09:05 +03:00
OlivierDehaene | 05f8c85a8b | v1.3.2 | 2024-04-18 16:33:05 +03:00
OlivierDehaene | f9b58ac7a1 | feat: add quant to mixtral (#1337) | 2024-04-18 16:32:50 +03:00
OlivierDehaene | 09c556dbd7 | v1.3.1 | 2024-04-18 16:32:07 +03:00
OlivierDehaene | db5053fc86 | v1.3.0 | 2024-04-18 16:31:53 +03:00
OlivierDehaene | 79f268f95a | chore: formatting | 2024-04-18 16:26:00 +03:00
OlivierDehaene | 9aef902982 | feat: mixtral (#1328) | 2024-04-18 12:39:52 +00:00
Nicolas Patry | a7f52f3812 | Speculative (#1308) | 2024-04-18 12:39:39 +00:00
Nicolas Patry | a41c1a6bc7 | Add a stale bot. (#1313) | 2024-04-18 10:10:02 +03:00
fxmarty | ab34c16610 | Fix AMD documentation (#1307) (As per title) | 2024-04-18 10:09:36 +03:00
Jacek Czaja | ae6215fcea | Enable server UT: test_causal_lm.py::test_batch_from_pb (#121) (Co-authored-by: Jacek Czaja <jczaja@habana.ai>) | 2024-04-10 16:33:56 +02:00
Karol Damaszke | 30cc78773e | Skip server tests of not enabled models (#125) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-04-09 14:15:41 +02:00
Karol Damaszke | c6739526c6 | Fix test_watermark (#124) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-04-09 11:29:21 +02:00
Sylwester Fraczek | 757c12dbac | Fix test_pass_through_tokenizer (#117) (Co-authored-by: Sylwester Fraczek <sfraczek@habana.ai>) | 2024-04-09 09:30:47 +02:00
Karol Damaszke | d957e32601 | Add Habana copyright header (#122) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-04-08 18:06:21 +02:00
Karol Damaszke | 06227f7b5e | Fix router tests (#119) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-04-04 11:10:11 +02:00
Karol Damaszke | e210e15e27 | Update Cargo.lock file (#118) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-04-03 17:55:54 +02:00
Karol Damaszke | b0de25a285 | Don't set rope_scaling for unsupported models (#115) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-04-02 12:12:02 +02:00
yuanwu2017 | 3e28d7aa42 | Align the default value with server's (#111) (Signed-off-by: yuanwu <yuan.wu@intel.com>) | 2024-04-01 12:44:20 +02:00
Karol Damaszke | 7342baa2eb | Add support for rope_scaling and remove is_optimized_for_gaudi (#112) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-03-29 15:07:32 +01:00
Karol Damaszke | bf5263b88b | Disable watermark with FP8 quantization (#114) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-03-27 13:32:20 +01:00
jkaniecki | 56f00a552b | Adjust warmup to all possible bucket sizes and decode batch size = 1 (#113) | 2024-03-27 11:59:51 +01:00
Karol Damaszke | 9796b0e10d | Add simple continuous batching benchmark (#108) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>; Co-authored-by: regisss <15324346+regisss@users.noreply.github.com>) | 2024-03-26 09:17:55 +01:00
regisss | 7f58680999 | Add docker pull command in README (#110) | 2024-03-25 15:44:54 +01:00
jkaniecki | 2b1581edac | Warmup greedy search in next token chooser (#109) | 2024-03-22 23:43:20 +01:00
Wang, Yi | d752317b5f | Correct input_length since habana extend input_length to max_input_length (#103) (Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>) | 2024-03-18 15:23:13 +01:00
Karol Damaszke | b45f648483 | Add warmup for logits processors (#107) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-03-18 15:17:47 +01:00
jkaniecki | 8504f9c41c | Improve README clarity (#106) | 2024-03-18 15:15:07 +01:00
yuanwu2017 | a4d5c3f40f | Fix the generate_stream crash in concurrent query (#105) (Signed-off-by: yuanwu <yuan.wu@intel.com>) | 2024-03-15 10:54:56 +01:00
Wang, Yi | 3d81a80577 | Fix incorrect setting of max_new_tokens in warmup (#104) (Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>) | 2024-03-13 16:19:40 +01:00
Yao Matrix | 7149ac30e6 | Fix the issue of out of range (#98) (Signed-off-by: yuanwu <yuan.wu@intel.com>; Co-authored-by: yuanwu <yuan.wu@intel.com>) | 2024-03-13 10:09:53 +01:00
jkaniecki | 602a920ec5 | Update nix version (#102) | 2024-03-11 16:21:04 +01:00
Karol Damaszke | 365f277900 | Clean-up README (#96) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-03-10 22:02:15 +01:00
Karol Damaszke | 8e14780bf4 | Wait 2sec once shard is ready to improve stability (#92) (#94) (Co-authored-by: mswiniarsk <156412439+mswiniarsk@users.noreply.github.com>) | 2024-03-04 12:17:24 +01:00
Karol Damaszke | 80ae9ead28 | Set MAX_TOTAL_TOKENS automatically (#91) (Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>) | 2024-03-01 11:25:15 +01:00
Karol Damaszke | a5c788cfe4 | Remove redundant fill op (#83) (#90) (Co-authored-by: mswiniarsk <156412439+mswiniarsk@users.noreply.github.com>) | 2024-03-01 01:32:02 +01:00
Karol Damaszke | 03c2123244 | Use batched index_copy (#73) (#89) (Co-authored-by: madamczykhabana <110973826+madamczykhabana@users.noreply.github.com>) | 2024-02-29 15:45:16 +01:00
Karol Damaszke | 8f6564ce0e | Heap based router queue (#63) (#88) (Co-authored-by: mrs303 <54661797+mrs303@users.noreply.github.com>) | 2024-02-29 10:56:26 +01:00
Karol Damaszke | 7dbf4bf7a4 | Improve tensor slicing performance (#66) (#87) (Co-authored-by: mswiniarsk <156412439+mswiniarsk@users.noreply.github.com>) | 2024-02-29 10:48:54 +01:00