From b2fac5d9477d99ee5ee6518f8138f341730d237b Mon Sep 17 00:00:00 2001
From: Nicolas Patry
Date: Tue, 10 Dec 2024 01:27:18 +0530
Subject: [PATCH] Hotfix link2 (#2812)

2nd hotfix ?
---
 docs/source/conceptual/chunking.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/conceptual/chunking.md b/docs/source/conceptual/chunking.md
index 9c4cbcdd..5b76fac7 100644
--- a/docs/source/conceptual/chunking.md
+++ b/docs/source/conceptual/chunking.md
@@ -72,7 +72,7 @@ Long: `MODEL_ID=$MODEL_ID HOST=localhost:8000 k6 run load_tests/long.js`
 
 ### Results
 
-![benchmarks_v3](https://github.com/huggingface/text-generation-inference/blob/042791fbd5742b1644d42c493db6bec669df6537/assets/v3_benchmarks.png)
+![benchmarks_v3](https://raw.githubusercontent.com/huggingface/text-generation-inference/refs/heads/main/assets/v3_benchmarks.png)
 
 Our benchmarking results show significant performance gains, with a 13x speedup over vLLM with prefix caching, and up to 30x speedup without prefix caching. These results are consistent with our production data and demonstrate the effectiveness of our optimized LLM architecture.
 
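Note (not part of the patch): a `github.com/.../blob/...` URL returns GitHub's HTML file-viewer page rather than the image bytes, which is why the embed failed to render; `raw.githubusercontent.com` serves the file directly. A minimal Python sketch to verify the new link resolves to an actual image; only the URL is taken from the patch, the rest is illustrative:

```python
# Minimal sketch: check that the patched raw.githubusercontent.com link
# actually serves image bytes. Only the URL comes from the patch; the
# rest is illustrative.
import urllib.request

URL = (
    "https://raw.githubusercontent.com/huggingface/"
    "text-generation-inference/refs/heads/main/assets/v3_benchmarks.png"
)

with urllib.request.urlopen(URL, timeout=10) as resp:
    content_type = resp.headers.get("Content-Type", "")
    # A github.com/.../blob/... URL answers with text/html (the file
    # viewer page), which is why the old embed did not render.
    assert resp.status == 200, f"unexpected HTTP status {resp.status}"
    assert content_type.startswith("image/"), f"not an image: {content_type}"
    print("OK:", resp.status, content_type)
```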