From d7a0c348b67e4b2c3f41506ca0ff324d1a201db5 Mon Sep 17 00:00:00 2001
From: Merve Noyan
Date: Wed, 23 Aug 2023 16:01:55 +0300
Subject: [PATCH] Addressed Omar's comments

---
 docs/source/conceptual/tensor_parallelism.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/conceptual/tensor_parallelism.md b/docs/source/conceptual/tensor_parallelism.md
index 278c0832..9ffcb913 100644
--- a/docs/source/conceptual/tensor_parallelism.md
+++ b/docs/source/conceptual/tensor_parallelism.md
@@ -1,6 +1,6 @@
 # Tensor Parallelism

-Tensor parallelism is a technique used to fit a large model in multiple GPUs. Intermediate outputs between GPUs are sent and received from one GPU to another in a synchronous or asynchronous manner. For example, when multiplying the input tensors with the first weights tensor, multiplying both tensors is equivalent to splitting the weight tensor column-wise, multiplying each column with input separately, and then concatenating the separate outputs like below 👇
+Tensor parallelism is a technique used to fit a large model in multiple GPUs. For example, when multiplying the input tensors with the first weight tensor, multiplying both tensors is equivalent to splitting the weight tensor column-wise, multiplying each column with input separately, and then concatenating the separate outputs. These outputs are then sent between GPUs and then concatenated together to get the final result, like below 👇

 ![Image courtesy of Anton Lozkhov](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/TP.png)
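
Note on the changed paragraph: the column-wise equivalence it describes can be checked with a small sketch. This is only an illustration under stated assumptions, not code from the patched repository. Plain Python lists stand in for tensors, each shard plays the role of one GPU, and the helper names (`matmul`, `split_columns`, `concat_columns`) are hypothetical.

```python
# Illustrative sketch only: lists stand in for tensors, shards stand in for GPUs.
# All helper names below are hypothetical and not part of any library.

def matmul(a, b):
    """Multiply matrix a (m x k) by matrix b (k x n), both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def split_columns(w, parts):
    """Split w column-wise into `parts` equal shards (one per simulated GPU)."""
    n_cols = len(w[0])
    assert n_cols % parts == 0, "column count must divide evenly in this sketch"
    step = n_cols // parts
    return [[row[i:i + step] for row in w] for i in range(0, n_cols, step)]

def concat_columns(blocks):
    """Concatenate the partial outputs along the column axis."""
    return [sum(rows, []) for rows in zip(*blocks)]

x = [[1, 2, 3], [4, 5, 6]]                          # input, shape 2 x 3
w = [[1, 0, 2, 1], [0, 1, 1, 3], [2, 1, 0, 1]]      # weight, shape 3 x 4

shards = split_columns(w, parts=2)                  # two 3 x 2 weight shards
partials = [matmul(x, shard) for shard in shards]   # each "GPU" computes its slice
sharded_result = concat_columns(partials)           # gather and concatenate
full_result = matmul(x, w)                          # unsharded reference

print(sharded_result == full_result)
```

In a real multi-GPU setup the concatenation step would be a collective communication operation rather than a local list join. The sketch only demonstrates that the arithmetic is equivalent.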