From 33d9bae61252f10865746d2acbd20840ab964843 Mon Sep 17 00:00:00 2001
From: Merve Noyan
Date: Thu, 24 Aug 2023 12:46:27 +0300
Subject: [PATCH] Update tensor_parallelism.md

---
 docs/source/conceptual/tensor_parallelism.md | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/docs/source/conceptual/tensor_parallelism.md b/docs/source/conceptual/tensor_parallelism.md
index 13dcbd61..f428f00f 100644
--- a/docs/source/conceptual/tensor_parallelism.md
+++ b/docs/source/conceptual/tensor_parallelism.md
@@ -4,11 +4,10 @@ Tensor parallelism is a technique used to fit a large model in multiple GPUs. Fo
 ![Image courtesy of Anton Lozkhov](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/tgi/TP.png)
-In TGI, tensor parallelism is implemented under the hood by sharding weights and placing them in different GPUs. The matrix multiplications then take place in different GPUs and are then gathered into a single tensor.
-Tensor Parallelism only works for models officially supported, it will not work when falling back on `transformers`.
+Tensor Parallelism only works for models officially supported, it will not work when falling back on `transformers`. You can get more information about unsupported models [here](./basic_tutorials/non_core_models.md).