mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-09-11 04:14:52 +00:00)
Update tensor_parallelism.md
parent 60e4ee2f11
commit 33d9bae612
@@ -4,11 +4,10 @@ Tensor parallelism is a technique used to fit a large model in multiple GPUs. Fo

 In TGI, tensor parallelism is implemented under the hood by sharding weights and placing them in different GPUs. The matrix multiplications then take place in different GPUs and are then gathered into a single tensor.

 <Tip warning={true}>

-Tensor Parallelism only works for models officially supported, it will not work when falling back on `transformers`.
+Tensor Parallelism only works for models officially supported, it will not work when falling back on `transformers`. You can get more information about unsupported models [here](./basic_tutorials/non_core_models.md).

 </Tip>
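
The context line in this hunk describes sharding weights across GPUs, running the matrix multiplications separately, and gathering the results into a single tensor. A minimal pure-Python sketch of that idea follows. It is not TGI's implementation: the function names are hypothetical, plain lists stand in for GPU tensors, and splitting the weight column-wise is an assumption, since the text does not say along which axis the weights are sharded.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def shard_columns(weight, num_shards):
    """Split a weight matrix column-wise into num_shards equal pieces (assumed layout)."""
    cols = len(weight[0])
    assert cols % num_shards == 0, "column count must divide evenly across shards"
    width = cols // num_shards
    return [[row[i * width:(i + 1) * width] for row in weight] for i in range(num_shards)]


def gather(partials):
    """Concatenate the per-shard outputs along the column axis."""
    return [sum((p[r] for p in partials), []) for r in range(len(partials[0]))]


def tensor_parallel_matmul(x, weight, num_shards):
    """Compute x @ weight by multiplying against each shard, then gathering."""
    shards = shard_columns(weight, num_shards)
    # In a real deployment each of these products would run on a different GPU.
    partials = [matmul(x, shard) for shard in shards]
    return gather(partials)


x = [[1, 2], [3, 4]]
weight = [[1, 0, 2, 1], [0, 1, 1, 3]]
result = tensor_parallel_matmul(x, weight, num_shards=2)
```

Under these assumptions, the gathered result equals the product computed against the unsharded weight, which is what lets each device hold only a fraction of the weights.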