baptiste
e2dba5c0ad
feat(upstream): add deprecation message for the tgi-gaudi fork due to Gaudi upstreaming
2025-03-10 10:43:09 +00:00
Yuan Wu
aba419a0cc
Fix crash issue of llava-next fp8 ( #286 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-03-07 10:31:58 +01:00
Yuan Wu
cd57fea11b
Fix Llava next crash issue ( #285 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-03-06 10:12:21 +01:00
Yuan Wu
20ea73c6d4
Fix mistralai/Mistral-7B-Instruct failure ( #284 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-03-05 17:01:23 +01:00
Yuan Wu
c35810d6f0
Fix the loading issue of 90B ( #283 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-02-28 11:20:55 +01:00
Yuan Wu
1d3a4ab851
Enable mllama ( #272 )
...
Signed-off-by: Yuan Wu <yuan.wu@intel.com>
2025-02-27 16:12:15 +01:00
Tomasz Thaddey
17f0d57581
Unpin rustc version and set it to 'stable' ( #269 )
2025-02-13 10:49:09 +01:00
kaixuanliu
b52164d38a
Complete padding of CausalLMBatch when batch bucketing is used ( #261 )
...
Signed-off-by: kaixuanliu <kaixuan.liu@intel.com>
2025-01-30 10:19:13 +01:00
Yuan Wu
fe7594e369
Fix the warmup issue of prefill batch_size ( #268 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-01-23 17:26:17 +01:00
Yuan Wu
63c64bb307
Use the default value in globals.py ( #265 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-01-21 10:10:23 +01:00
Karol Damaszke
8de110ae9f
Fix warmup with SKIP_TOKENIZER_IN_TGI=true ( #266 )
2025-01-21 10:09:49 +01:00
Yuan Wu
7d106477d6
Fix router input validation for SKIP_TOKENIZER_IN_TGI=true ( #267 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-01-21 10:08:53 +01:00
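The two commits above both touch the SKIP_TOKENIZER_IN_TGI switch; a minimal sketch of enabling it before launching the server (only the variable name comes from the commits, the surrounding usage is illustrative):

```shell
# Sketch: opt into the code path that skips tokenization inside TGI.
# SKIP_TOKENIZER_IN_TGI is taken from the commit subjects above;
# set it in the environment the launcher inherits.
export SKIP_TOKENIZER_IN_TGI=true
echo "SKIP_TOKENIZER_IN_TGI=${SKIP_TOKENIZER_IN_TGI}"
```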
Yuan Wu
6d6acca5eb
Update the README for 2.3.1 ( #260 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2025-01-03 10:55:14 +01:00
Yuan Wu
46b556805b
Upgrade to SynapseAI 1.19 ( #259 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-12-26 17:33:24 +01:00
regisss
5291f652a1
Merge pull request #225 from yuanwu2017/2.3.0
2024-12-19 11:42:59 -06:00
yuanwu
8e2e5d8e15
Fix benchmark build error
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-12-17 05:38:10 +00:00
yuanwu
eaeef6e7a4
Remove the useless modifications
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-12-17 02:08:12 +00:00
yuanwu
15de6c9195
Merge branch 'habana-main' into 2.3.0
2024-12-17 02:06:22 +00:00
Sun Choi
61309b2832
Remove the default max_tokens for /v1/chat/completions ( #251 )
2024-12-16 09:32:57 +01:00
Sun Choi
cc2ca4ac22
HF_TOKEN replaces HUGGING_FACE_HUB_TOKEN as it is deprecated ( #253 )
2024-12-15 09:59:58 +01:00
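The commit above switches to HF_TOKEN because HUGGING_FACE_HUB_TOKEN is deprecated in huggingface_hub; a minimal sketch of the new usage (the token value is a placeholder):

```shell
# Sketch: HF_TOKEN is the current variable; older setups exported
# HUGGING_FACE_HUB_TOKEN instead. The value below is a placeholder, not a real token.
export HF_TOKEN=hf_xxxxxxxx
echo "token set: ${HF_TOKEN:+yes}"
```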
yuanwu
c3b8899f10
Revert "Use optimum-habana v1.15-release branch"
...
This reverts commit c6f023a06b.
2024-12-11 08:17:17 +00:00
yuanwu
c922ef9534
Fix the warmup issue of llama2-7B.
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-12-09 07:20:48 +00:00
yuanwu
c6f023a06b
Use optimum-habana v1.15-release branch
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-12-08 13:02:31 +00:00
yuanwu
1b659788b5
Add the --no-deps flag to pip install
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-12-08 12:14:38 +00:00
yuanwu
73e6e3b871
Remove the error log
...
Subsequent updates will remove this code
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-12-08 11:55:13 +00:00
yuanwu
9f356ce045
Refine the warmup process
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-12-07 09:56:16 +00:00
yuanwu
253a992447
Remove the CI workflows we don't currently support
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-12-02 08:45:36 +00:00
yuanwu
0228bd0260
Don't run the prefill warmup when limit_hpu_graph=true
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-12-01 21:29:41 +00:00
yuanwu
4586325a34
Fix the StarCoder warmup issue
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-12-01 06:14:00 +00:00
Yuan Wu
b83419a769
Merge branch 'habana-main' into 2.3.0
2024-11-28 12:38:36 +08:00
yuanwu
636cdb4c43
Fix StarCoder issue
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-11-26 08:55:42 +00:00
srajabos
d49ce00f40
With this change, bucketing/padding of input is applied to the health check ( #245 )
2024-11-18 22:38:30 +01:00
yuanwu2017
56c3eb4adb
Remove the torch package in requirements.txt ( #246 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-11-07 09:22:24 -08:00
yuanwu2017
c345c734a7
Merge branch 'habana-main' into 2.3.0
2024-11-01 11:24:40 +08:00
yuanwu
fcf2e3a338
Fix the prefill warmup issue
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-11-01 05:08:52 +02:00
Thanaji Rao Thakkalapelli
6ba3d1d6e5
updated release docker image version in readme to 2.0.6 ( #242 )
2024-10-31 15:44:16 -07:00
yuanwu2017
8d84ffabf2
Upgrade to SynapseAI 1.18 ( #227 )
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
Co-authored-by: Thanaji Rao Thakkalapelli <tthakkalapelli@habana.ai>
2024-10-31 20:14:44 +01:00
Thanaji Rao Thakkalapelli
7fb4af9a87
updated supported models list table in readme ( #241 )
...
* updated supported models list table in readme
* updated read me
* updated read me
2024-10-29 23:28:45 -07:00
yuanwu
4c9856f9e5
Add missing package
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-10-28 07:04:56 +00:00
yuanwu2017
c23584f626
Merge branch 'habana-main' into 2.3.0
2024-10-28 04:37:07 +08:00
yuanwu
372e071135
Fix the issues of tgi-gaudi for v2.3.1
...
Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-10-27 20:40:36 +00:00
Nicolas Patry
7e282b4153
V2.3.1
2024-10-27 04:14:35 +00:00
Nicolas Patry
34e98b14ef
New release 2.3.1 ( #2604 )
...
* New release 2.3.1
* Update doc number
2024-10-27 04:14:35 +00:00
drbh
902f526d69
Unroll notify error into generate response ( #2597 )
...
* feat: unroll notify_error if no tool is chosen
* fix: expect simple message when no tool is selected
* fix: improve test to avoid notify_error
* fix: improve docs and indicate change in expected response
* fix: adjust linting in test file
2024-10-27 04:03:57 +00:00
drbh
7664d2e2b3
CI (2592): Allow LoRA adapter revision in server launcher ( #2602 )
...
allow revision for lora adapters from launcher
Co-authored-by: Sida <sida@kulamind.com>
Co-authored-by: teamclouday <teamclouday@gmail.com>
2024-10-27 04:03:57 +00:00
Nicolas Patry
967e67111d
Max token capacity metric ( #2595 )
...
* adding max_token_capacity_metric
* added tgi to name of metric
* Adding max capacity metric.
* Add description for the metrics
---------
Co-authored-by: Edwinhr716 <Edandres249@gmail.com>
2024-10-27 04:03:57 +00:00
Nicolas Patry
51506aa57a
Mllama flash version ( #2585 )
...
* Working loading state.
* Preprocessing.
* Working state ? (Broke idefics1 temporarily).
* Cleaner condition.
* Fix idefics.
* Updating config, removing TODO
* Mllama
* Upgrade transformers 4.45
* Flashing mllama.
* Starting to get there.
* Working state.
* Integration tests for mllama (cutting to 10 tokens because there seems to be instability after, meaning the size of the batch matters).
* Updating model link.
* Earlier assert.
* Fix vlm ?
* remove log.
* Force ignore all images but last.
* Default dtype bfloat16.
* Update integration test after switch to bf16.
* Remove dead code.
* Removed dead code.
* Upgrade the flake to latest transformers/tokenizers
* Move to hf tgi-nix
* Upgrade to 0.5.0
2024-10-27 04:03:57 +00:00
Daniël de Kok
fa964f82d3
nix: experimental support for building a Docker container ( #2470 )
...
* nix: experimental support for building a Docker image
Run using something like:
```
docker run \
--device nvidia.com/gpu=all \
-it --rm -p 8080:80 \
-v $PWD/data:/data \
-v $PWD/tmp:/tmp \
tgi-docker:latest \
--model-id <model_id>
```
* Example of building the Docker image using Nix inside Docker
* Stream to make the builder image smaller
This avoids storing a Docker image tarball in the image. Instead,
stream the layers while doing `docker run`.
* Don't spam journalctl on Linux
* Other dockerfile.
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-10-25 09:12:03 +00:00
Daniël de Kok
775e5f4c64
MoE Marlin: support desc_act for groupsize != -1 ( #2590 )
...
This change uses the updated Marlin MoE kernel from vLLM to support
MoE with activation sorting and groups.
2024-10-25 09:12:03 +00:00
Daniël de Kok
692f8ddb69
Move flake back to tgi-nix main
( #2586 )
2024-10-25 09:12:03 +00:00