Update tensor_parallelism.md

This commit is contained in:
Merve Noyan 2023-08-24 12:46:27 +03:00 committed by GitHub
parent 60e4ee2f11
commit 33d9bae612
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -4,11 +4,10 @@ Tensor parallelism is a technique used to fit a large model in multiple GPUs. Fo
![Image courtesy of Anton Lozkhov](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/TP.png)
In TGI, tensor parallelism is implemented under the hood by sharding weights and placing the shards on different GPUs. The matrix multiplications take place on different GPUs, and the partial results are then gathered into a single tensor.
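The shard-multiply-gather pattern described above can be sketched in a single process with NumPy. This is only an illustration of the idea, not TGI's actual implementation: each column shard of the weight matrix stands in for one GPU's slice, and the final concatenation stands in for the gather step.

```python
import numpy as np

# Toy illustration of tensor parallelism: the weight matrix is split
# column-wise into shards (one per "GPU"), each shard computes a
# partial matmul, and the partial outputs are gathered (concatenated).
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))          # input activations
w = rng.standard_normal((8, 6))          # full weight matrix

num_gpus = 2
shards = np.split(w, num_gpus, axis=1)   # column-wise sharding

# Each "GPU" multiplies the input against its own shard independently.
partials = [x @ shard for shard in shards]

# Gather: concatenating the partial outputs reconstructs the full result.
y = np.concatenate(partials, axis=1)

assert np.allclose(y, x @ w)
```

Column sharding is used here because the gather is a simple concatenation; row sharding would instead require summing the partial results.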
<Tip warning={true}>
Tensor parallelism only works for officially supported models; it will not work when falling back on `transformers`. You can get more information about unsupported models [here](./basic_tutorials/non_core_models.md).
</Tip>