mirror of
https://github.com/huggingface/text-generation-inference.git
synced 2025-09-10 20:04:52 +00:00
Update docs/source/conceptual/tensor_parallelism.md
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
parent 1e828f33c0
commit 0af0315b78
@@ -4,7 +4,7 @@ Tensor parallelism is a technique used to fit a large model in multiple GPUs. I
-In TGI, tensor parallelism is implemented under the hood by sharding weights and placing them in different ranks. The matrix multiplications then take place in different ranks and are then gathered into a single tensor.
+In TGI, tensor parallelism is implemented under the hood by sharding weights and placing them in different GPUs. The matrix multiplications then take place in different GPUs and are then gathered into a single tensor.
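
The shard, multiply, gather flow described in the changed sentence can be illustrated with a small sketch. This is not TGI's implementation: plain Python lists stand in for GPU tensors, the "GPUs" are simulated by a loop, and the helper names (`shard_columns`, `gather_columns`, `tensor_parallel_matmul`) are hypothetical. It assumes a column-wise split of the weight matrix, which is one common way to shard a linear layer.

```python
# Illustrative only: lists stand in for tensors, a loop stands in for GPUs.

def matmul(x, w):
    """Multiply matrix x (n x k) by matrix w (k x m)."""
    return [
        [sum(row[i] * w[i][j] for i in range(len(w))) for j in range(len(w[0]))]
        for row in x
    ]

def shard_columns(w, num_shards):
    """Split the columns of w into num_shards equal, contiguous blocks."""
    cols = len(w[0])
    assert cols % num_shards == 0, "columns must divide evenly across shards"
    step = cols // num_shards
    return [[row[s * step:(s + 1) * step] for row in w] for s in range(num_shards)]

def gather_columns(partials):
    """Concatenate the per-shard outputs along the column axis."""
    return [sum((p[r] for p in partials), []) for r in range(len(partials[0]))]

def tensor_parallel_matmul(x, w, num_shards):
    shards = shard_columns(w, num_shards)              # one weight shard per simulated GPU
    partials = [matmul(x, shard) for shard in shards]  # each shard multiplies independently
    return gather_columns(partials)                    # gather into a single tensor

x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
# The sharded result matches the unsharded product.
assert tensor_parallel_matmul(x, w, 2) == matmul(x, w)
```

Each shard holds only part of the weight matrix, so no single simulated device needs the full matrix; the gather step restores the full-width output.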
<Tip warning={true}>