diff --git a/README.md b/README.md
index 3b71f602..a1a1d867 100644
--- a/README.md
+++ b/README.md
@@ -7,6 +7,7 @@ TGI is well suited for distributed/ cloud burst/ on-demand workloads, yet HF's f
 ## Goals
 
+- ☑️ Loads Llama 2 in 4-bit on a Pascal GPU (GTX 1080, Llama 2 7B)
 - Support Model loading from wherever you want (HDFS, S3, HTTPS, …)
 - Support Adapters (LORA/PEFT) without merging (possibly huge) Checkpoints and uploading them to 🤗
 - Support last Gen GPUS (back to Pascal hopefully)
@@ -16,6 +17,16 @@ TGI is well suited for distributed/ cloud burst/ on-demand workloads, yet HF's f
 ``
 
+
+# 🦙 Llama 2 in 4-bit
+
+To use Llama 2 7B on a GTX 1080 (Pascal generation, compute capability 6.1):
+1) Install this repository via `make install`.
+2) In the `run-dev` section of `server/Makefile`, change the `/mnt/TOFU/HF_MODELS/` path to a directory where you have downloaded an HF model via `git lfs clone https://huggingface.co/[repo]/[model]`; the model will then be loaded from e.g. `/data/models/Llama-2-7b-chat-hf`.
+3) Open two terminals.
+4) In terminal 1, run `make router-dev` (starts the router, which exposes the model at localhost:8080).
+5) In terminal 2, run `make server-dev` (starts the model server and loads the model onto the GPU).
+6) Test the model by calling it with curl: `curl localhost:8080/generate -X POST -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":90}}' -H 'Content-Type: application/json'`
![image](https://github.com/huggingface/text-generation-inference/assets/3841370/38ba1531-ea0d-4851-b31a-a6d4ddc944b0)
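For scripted use instead of the curl call in step 6, here is a minimal Python client sketch. It assumes the router from step 4 is listening at localhost:8080 and that the `requests` package is installed (it is not a dependency of this repository); the `generate` helper name, prompt, and timeout are illustrative, not part of the repo.

```python
# Minimal client sketch for the /generate endpoint started in step 4.
# Assumes the router is running at localhost:8080; install the client
# dependency with `pip install requests` (not shipped with this repo).
import requests

def generate(prompt: str, max_new_tokens: int = 90) -> str:
    # Same request shape as the curl example in step 6.
    resp = requests.post(
        "http://localhost:8080/generate",
        json={"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}},
        timeout=120,  # generation on a GTX 1080 can take a while
    )
    resp.raise_for_status()
    # TGI's non-streaming /generate returns a JSON object with a
    # "generated_text" field containing the completion.
    return resp.json()["generated_text"]

if __name__ == "__main__":
    print(generate("What is Deep Learning?"))
```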