mirror of
https://github.com/huggingface/text-generation-inference.git
synced 2025-09-11 12:24:53 +00:00
Link fixup.
This commit is contained in:
parent
f7f19aa4aa
commit
8066b868fe
@ -72,7 +72,7 @@ Long: `MODEL_ID=$MODEL_ID HOST=localhost:8000 k6 run load_tests/long.js`
|
|||||||
|
|
||||||
### Results
|
### Results
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
Our benchmarking results show significant performance gains, with a 13x speedup over vLLM with prefix caching, and up to 30x speedup without prefix caching. These results are consistent with our production data and demonstrate the effectiveness of our optimized LLM architecture.
|
Our benchmarking results show significant performance gains, with a 13x speedup over vLLM with prefix caching, and up to 30x speedup without prefix caching. These results are consistent with our production data and demonstrate the effectiveness of our optimized LLM architecture.
|
||||||
|
|
||||||
|
Loading…
Reference in New Issue
Block a user