Update docs/source/conceptual/tensor_parallelism.md

Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Merve Noyan 2023-08-23 15:45:24 +03:00 committed by GitHub
parent 1e828f33c0
commit 0af0315b78
GPG Key ID: 4AEE18F83AFDEB23

@@ -4,7 +4,7 @@ Tensor parallelism is a technique used to fit a large model in multiple GPUs. I
![Image courtesy of Anton Lozkhov](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/TP.png)
-In TGI, tensor parallelism is implemented under the hood by sharding weights and placing them in different ranks. The matrix multiplications then take place in different ranks and are then gathered into a single tensor.
+In TGI, tensor parallelism is implemented under the hood by sharding weights and placing them in different GPUs. The matrix multiplications then take place in different GPUs and are then gathered into a single tensor.
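
The shard, multiply, gather flow described in the changed line can be illustrated with a minimal pure-Python sketch. This is not TGI's implementation: the helper names (`matmul`, `shard_columns`, `tensor_parallel_matmul`) are hypothetical, each "GPU" is simulated as a list entry on one machine, and the weight matrix is assumed to be split column-wise into equal shards.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def shard_columns(weight, num_shards):
    """Split a weight matrix column-wise into equally sized shards."""
    n_cols = len(weight[0])
    if n_cols % num_shards != 0:
        raise ValueError("number of columns must be divisible by num_shards")
    step = n_cols // num_shards
    return [
        [row[s * step:(s + 1) * step] for row in weight]
        for s in range(num_shards)
    ]


def tensor_parallel_matmul(x, weight, num_shards):
    """Simulate column-wise tensor parallelism on a single machine."""
    # Each shard stands in for the slice of weights held by one GPU.
    shards = shard_columns(weight, num_shards)
    # Every simulated GPU multiplies the full input by its own shard.
    partials = [matmul(x, shard) for shard in shards]
    # Gather: concatenate the partial outputs row by row into one tensor.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
result = tensor_parallel_matmul(x, w, num_shards=2)
```

Here `result` equals the unsharded product `matmul(x, w)`, which is the point of the technique: splitting the weights changes where the work happens, not the output.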
<Tip warning={true}>