From 7fb4af9a877006495678548cd7662c8c14e29c74 Mon Sep 17 00:00:00 2001
From: Thanaji Rao Thakkalapelli
Date: Tue, 29 Oct 2024 23:28:45 -0700
Subject: [PATCH] updated supported models list table in readme (#241)

* updated supported models list table in readme
* updated read me
* updated read me
---
 README.md | 76 ++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 45 insertions(+), 31 deletions(-)

diff --git a/README.md b/README.md
index 2fa8836b..7f418d17 100644
--- a/README.md
+++ b/README.md
@@ -20,13 +20,42 @@ limitations under the License.
 - [Text Generation Inference on Habana Gaudi](#text-generation-inference-on-habana-gaudi)
   - [Table of contents](#table-of-contents)
+  - [Tested Models and Configurations](#tested-models-and-configurations)
   - [Running TGI on Gaudi](#running-tgi-on-gaudi)
   - [Running TGI with BF16 Precision](#running-tgi-with-bf16-precision)
   - [Running TGI with FP8 Precision](#running-tgi-with-fp8-precision)
+  - [TGI-Gaudi Benchmark](#tgi-gaudi-benchmark)
   - [Adjusting TGI Parameters](#adjusting-tgi-parameters)
-  - [Environment variables](#environment-variables)
+  - [Environment Variables](#environment-variables)
   - [Profiler](#profiler)
+
+## Tested Models and Configurations
+
+The following table contains the models and configurations we have validated on Gaudi2.
+
+| Model                  | BF16 Single Card | BF16 Multi-Card | FP8 Single Card | FP8 Multi-Card |
+| ---------------------- | ---------------- | --------------- | --------------- | -------------- |
+| Llama2-7B              | ✔                | ✔               | ✔               | ✔              |
+| Llama2-70B             |                  | ✔               |                 | ✔              |
+| Llama3-8B              | ✔                | ✔               | ✔               | ✔              |
+| Llama3-70B             |                  | ✔               |                 | ✔              |
+| Llama3.1-8B            | ✔                | ✔               | ✔               | ✔              |
+| Llama3.1-70B           |                  | ✔               |                 | ✔              |
+| CodeLlama-13B          | ✔                | ✔               | ✔               | ✔              |
+| Mixtral-8x7B           | ✔                | ✔               | ✔               | ✔              |
+| Mistral-7B             | ✔                | ✔               | ✔               | ✔              |
+| Falcon-180B            |                  | ✔               |                 | ✔              |
+| Qwen2-72B              |                  | ✔               |                 | ✔              |
+| Starcoder2-3B          | ✔                | ✔               | ✔               |                |
+| Starcoder2-15B         | ✔                | ✔               | ✔               |                |
+| Starcoder              | ✔                | ✔               | ✔               | ✔              |
+| Gemma-7B               | ✔                | ✔               | ✔               | ✔              |
+| Llava-v1.6-Mistral-7B  | ✔                | ✔               | ✔               | ✔              |
+
 ## Running TGI on Gaudi
 
 To use [🤗 text-generation-inference](https://github.com/huggingface/text-generation-inference) on Habana Gaudi/Gaudi2/Gaudi3, follow these steps:
@@ -82,36 +111,6 @@ To use [🤗 text-generation-inference](https://github.com/huggingface/text-gene
    ```
 4. Please note that the model warmup can take several minutes, especially for FP8 inference. To minimize this time in consecutive runs, please refer to [Disk Caching Eviction Policy](https://docs.habana.ai/en/latest/PyTorch/Model_Optimization_PyTorch/Optimization_in_PyTorch_Models.html#disk-caching-eviction-policy).
-### TGI-Gaudi Benchmark
-
-#### Static Batching Benchmark
- To run static batching benchmark, please refer to [TGI's benchmark tool](https://github.com/huggingface/text-generation-inference/tree/main/benchmark).
-
- To run it on the same machine, you can do the following:
- * `docker exec -it <docker name> bash`, pick the docker started from step 2 using docker ps
- * `text-generation-benchmark -t <model-id>`, pass the model-id from docker run command
- * after the completion of tests, hit ctrl+c to see the performance data summary.
-
-#### Continuous Batching Benchmark
- To run continuous batching benchmark, please refer to [README in examples folder](https://github.com/huggingface/tgi-gaudi/blob/habana-main/examples/README.md).
-
-### Tested Models and Configurations
-
-The following table contains models and configurations we have validated on Gaudi2.
-
-| Model | BF16 | FP8 | Single Card | Multi-Cards |
-|-----------------------|------|-----|-------------|-------------|
-| Llama2-7B | ✔ | ✔ | ✔ | ✔ |
-| Llama2-70B | ✔ | ✔ | | ✔ |
-| Llama3-8B | ✔ | ✔ | ✔ | ✔ |
-| Llama3-70B | ✔ | ✔ | | ✔ |
-| Llama3.1-8B | ✔ | ✔ | ✔ | ✔ |
-| Llama3.1-70B | ✔ | ✔ | | ✔ |
-| CodeLlama-13B | ✔ | ✔ | ✔ | |
-| Mixtral-8x7B | ✔ | ✔ | ✔ | ✔ |
-| Mistral-7B | ✔ | ✔ | ✔ | ✔ |
-| Llava-v1.6-Mistral-7B | ✔ | ✔ | ✔ | ✔ |
-
 ## Running TGI with BF16 Precision
@@ -497,6 +496,21 @@ docker run -p 8080:80 \
   --max-total-tokens 8192 --max-batch-total-tokens 32768
 ```
 
+## TGI-Gaudi Benchmark
+
+### Static Batching Benchmark
+To run the static batching benchmark, please refer to [TGI's benchmark tool](https://github.com/huggingface/text-generation-inference/tree/main/benchmark).
+
+To run it on the same machine, you can do the following:
+* `docker exec -it <docker name> bash`, picking the container started in step 2 via `docker ps`
+* `text-generation-benchmark -t <model-id>`, passing the model-id from the `docker run` command
+* After the tests complete, press Ctrl+C to see the performance data summary.
+
+> Note: By default, this benchmark runs the model with bs=[1, 2, 4, 8, 16, 32], sequence_length=10, and decode_length=8. To run other configurations, check `text-generation-benchmark -h` and adjust the parameters.
+
+### Continuous Batching Benchmark
+To run the continuous batching benchmark, please refer to the [README in the examples folder](https://github.com/huggingface/tgi-gaudi/blob/habana-main/examples/README.md).
+
 ## Adjusting TGI Parameters
 
 Maximum sequence length is controlled by two arguments:
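For reference, the static batching steps added above chain together as follows. This is a minimal sketch, not part of the patch: the container name `tgi_gaudi` and the model id `meta-llama/Llama-2-7b-hf` are illustrative placeholders standing in for whatever your `docker ps` output and your `docker run --model-id` argument actually were.

```bash
# 1. Identify the TGI container started in step 2, then open a shell in it.
docker ps                          # note the container name or ID
docker exec -it tgi_gaudi bash     # "tgi_gaudi" is an illustrative name

# 2. Inside the container, run the benchmark with the same model id that
#    was passed to `docker run`.
text-generation-benchmark -t meta-llama/Llama-2-7b-hf

# 3. When the runs finish, press Ctrl+C to print the performance summary.
#    Defaults: bs=[1, 2, 4, 8, 16, 32], sequence_length=10, decode_length=8;
#    see `text-generation-benchmark -h` to override them.
```

The two container steps can also be collapsed into a single call, e.g. `docker exec -it tgi_gaudi text-generation-benchmark -t <model-id>`, which avoids the interactive shell.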