mirror of
https://github.com/huggingface/text-generation-inference.git
synced 2025-09-11 04:14:52 +00:00
Addressed Omar's comments
This commit is contained in:
parent
0af0315b78
commit
d7a0c348b6
@ -1,6 +1,6 @@
|
|||||||
# Tensor Parallelism
|
# Tensor Parallelism
|
||||||
|
|
||||||
Tensor parallelism is a technique used to fit a large model in multiple GPUs. Intermediate outputs between GPUs are sent and received from one GPU to another in a synchronous or asynchronous manner. For example, when multiplying the input tensors with the first weights tensor, multiplying both tensors is equivalent to splitting the weight tensor column-wise, multiplying each column with input separately, and then concatenating the separate outputs like below 👇
|
Tensor parallelism is a technique used to fit a large model in multiple GPUs. For example, when multiplying the input tensors with the first weight tensor, multiplying both tensors is equivalent to splitting the weight tensor column-wise, multiplying each column with input separately, and then concatenating the separate outputs. These outputs are then sent between GPUs and then concatenated together to get the final result, like below 👇
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
|
Loading…
Reference in New Issue
Block a user