From 099291a061e33a1f093bb492c65fdba4f8e4734d Mon Sep 17 00:00:00 2001
From: Merve Noyan
Date: Thu, 7 Sep 2023 14:53:30 +0200
Subject: [PATCH] Update docs/source/conceptual/tensor_parallelism.md

Co-authored-by: Pedro Cuenca
---
 docs/source/conceptual/tensor_parallelism.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/conceptual/tensor_parallelism.md b/docs/source/conceptual/tensor_parallelism.md
index f428f00f..111f842c 100644
--- a/docs/source/conceptual/tensor_parallelism.md
+++ b/docs/source/conceptual/tensor_parallelism.md
@@ -1,6 +1,6 @@
 # Tensor Parallelism
 
-Tensor parallelism is a technique used to fit a large model in multiple GPUs. For example, when multiplying the input tensors with the first weight tensor, multiplying both tensors is equivalent to splitting the weight tensor column-wise, multiplying each column with input separately, and then concatenating the separate outputs. These outputs are then sent between GPUs and then concatenated together to get the final result, like below 👇
+Tensor parallelism is a technique used to fit a large model in multiple GPUs. For example, when multiplying the input tensors with the first weight tensor, the matrix multiplication is equivalent to splitting the weight tensor column-wise, multiplying each column with the input separately, and then concatenating the separate outputs. These outputs are then transferred from the GPUs and concatenated together to get the final result, like below 👇
 
 ![Image courtesy of Anton Lozkhov](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/TP.png)
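
For context (not part of the patch itself): below is a minimal NumPy sketch of the equivalence the updated paragraph describes — splitting the weight tensor column-wise, multiplying each shard by the same input, and concatenating the partial outputs reproduces the full matrix multiplication. The array shapes and the two-way split are illustrative assumptions, not values from TGI.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # input activations: (batch, hidden)
W = rng.standard_normal((8, 6))   # weight tensor: (hidden, out)

# Full matmul, as it would run on a single device.
full = x @ W

# Tensor-parallel path: split the weight column-wise across two
# hypothetical GPUs, multiply each shard by the same input, then
# concatenate the partial outputs along the output dimension.
W_shards = np.split(W, 2, axis=1)            # two (8, 3) shards
partials = [x @ shard for shard in W_shards]  # each partial is (4, 3)
combined = np.concatenate(partials, axis=1)   # back to (4, 6)

assert np.allclose(full, combined)  # both paths give the same result
```

In a real deployment each shard lives on a different GPU and the concatenation is a cross-device communication step, which is the transfer the patched sentence refers to.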