From 8066b868fe5b2cd227a131851b844df74948d42f Mon Sep 17 00:00:00 2001
From: Nicolas Patry
Date: Tue, 10 Dec 2024 00:37:31 +0530
Subject: [PATCH] Link fixup.

---
 docs/source/conceptual/chunking.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/conceptual/chunking.md b/docs/source/conceptual/chunking.md
index 110d60ab..f6489afd 100644
--- a/docs/source/conceptual/chunking.md
+++ b/docs/source/conceptual/chunking.md
@@ -72,7 +72,7 @@ Long: `MODEL_ID=$MODEL_ID HOST=localhost:8000 k6 run load_tests/long.js`
 
 ### Results
 
-![benchmarks_v3](https://github.com/huggingface/text-generation-inference/blob/main/assets/benchmarks_v3.png)
+![benchmarks_v3](https://github.com/huggingface/text-generation-inference/blob/main/assets/v3_benchmarks.png)
 
 Our benchmarking results show significant performance gains, with a 13x speedup over vLLM with prefix caching, and up to 30x speedup without prefix caching. These results are consistent with our production data and demonstrate the effectiveness of our optimized LLM architecture.