Addressed Omar's comments

This commit is contained in:
Merve Noyan 2023-08-23 16:01:55 +03:00 committed by GitHub
parent 0af0315b78
commit d7a0c348b6
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

View File

@ -1,6 +1,6 @@
# Tensor Parallelism
Tensor parallelism is a technique used to fit a large model in multiple GPUs. Intermediate outputs between GPUs are sent and received from one GPU to another in a synchronous or asynchronous manner. For example, when multiplying the input tensors with the first weights tensor, multiplying both tensors is equivalent to splitting the weight tensor column-wise, multiplying each column with input separately, and then concatenating the separate outputs like below 👇
Tensor parallelism is a technique used to fit a large model in multiple GPUs. For example, when multiplying the input tensors with the first weight tensor, multiplying both tensors is equivalent to splitting the weight tensor column-wise, multiplying each column with input separately, and then concatenating the separate outputs. These outputs are then sent between GPUs and then concatenated together to get the final result, like below 👇
![Image courtesy of Anton Lozkhov](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/TP.png)