Commit Graph

  • ccdec05f7e Move piece/position embeddings into FlashGPT2Model Daniël de Kok 2024-05-13 12:26:34 +0000
  • d348d2b28f Granite support? (#1882) Nicolas Patry 2024-05-13 13:46:29 +0200
  • dbc8a65a5d Old config support ? Nicolas Patry 2024-05-13 10:32:29 +0000
  • b7c26f0f46 Upgrade transformers to have mlp_bias in config. Nicolas Patry 2024-05-13 09:21:14 +0000
  • fff4899e57 Granite support? Nicolas Patry 2024-05-13 08:04:14 +0000
  • fd89d9dfae Refactor layers. (#1866) Nicolas Patry 2024-05-13 12:44:30 +0200
  • 1510461d93 Fix sharding (?) Daniël de Kok 2024-05-13 09:51:04 +0000
  • 4ce8b6f0ee Add GPT-2 with flash attention Daniël de Kok 2024-05-10 15:54:18 +0000
  • f4dac978d2 disable _custom_C for debug purpose fxmarty 2024-05-10 18:22:41 +0000
  • cd313364a0 add debug dockerfile fxmarty 2024-05-09 22:14:24 +0000
  • 4607b7e9c4 Readd tool_prompt Thomas SCHILLACI 2024-05-07 19:30:18 +0200
  • 2f644779cb Add stop parameter to completions route Thomas SCHILLACI 2024-05-07 19:27:05 +0200
  • 9fb1cdc8d5 Add completion route to client and stop parameter Thomas SCHILLACI 2024-05-07 18:27:28 +0200
  • 72f5282098 Fixed. Nicolas Patry 2024-05-07 13:19:49 +0000
  • cc07a84604 Fixes. Nicolas Patry 2024-05-07 12:11:43 +0000
  • 649f088519 Remove scaffolding. Nicolas Patry 2024-05-07 10:20:34 +0000
  • ddc0dd57f7 Fixes. Nicolas Patry 2024-05-07 10:08:50 +0000
  • fe4ef95d92 Protect cuda better ? Nicolas Patry 2024-05-06 20:32:08 +0200
  • 3494cb0067 Layernorm import. Nicolas Patry 2024-05-06 20:16:15 +0200
  • 361263afbf Support CPU again. Nicolas Patry 2024-05-06 20:00:56 +0200
  • c56df3b167 Too many removal. Nicolas Patry 2024-05-06 19:55:39 +0200
  • 34b9289c3d Moving to SYSTEM enum. Nicolas Patry 2024-05-06 19:16:04 +0200
  • c84718c8b6 Refactor layers. Nicolas Patry 2024-05-06 18:25:19 +0200
  • 59b3ffea14 update xpu docker image and use public ipex whel (#1860) Wang, Yi 2024-05-06 22:05:43 +0800
  • fe16a465a0 causal_lm server tests rebased (#139) Sylwester Fraczek 2024-05-06 15:55:35 +0200
  • ac7076b64d Upgrading to rust 1.78. (#1851) Nicolas Patry 2024-05-06 13:48:11 +0200
  • bad7fe720a Fix warmup shapes for corner cases (#136) Karol Damaszke 2024-05-06 11:35:27 +0200
  • 4169ff8e6f Add info about FP8 support (#137) Karol Damaszke 2024-05-06 11:03:14 +0200
  • f82da93318 Fix input length validation (#135) Karol Damaszke 2024-05-06 09:55:58 +0200
  • 81182bed76 Merge pull request #134 from kdamaszk/rebase_tgi_2.0 regisss 2024-05-06 09:28:16 +0200
  • db53113a18 update xpu docker image and use public ipex whel Wang, Yi A 2024-05-05 23:34:10 -0700
  • 0bbec634f9 Update README example commands Karol Damaszke 2024-05-06 09:26:01 +0300
  • 3d78027c90 A patch to address HPU Graphs issue with DILL Yaser Afshar 2024-04-23 12:57:39 -0700
  • 96e55f607e feat: add deprecation warning to clients David Holtz 2024-05-03 11:37:38 -0400
  • bb2b2959a2 Add router name to /info endpoint (#1854) Lucain 2024-05-03 16:39:04 +0200
  • b9d8af694b added a bunch of cleanup based on comments in PR; removed conditionals from LayerNormParameterized and renamed to MLPSpeculatorLayerNorm; now using modules for tensor-parallel (this is work in progress - looking into if this is right approach); fixed issue with getting medusa model; fixed for more efficient loading Joshua Rosenkranz 2024-05-03 10:02:11 -0400
  • afb02914e6 Add router name to /info endpoint Wauplin 2024-05-03 12:11:57 +0200
  • bcd193443f Clippy fixes. Nicolas Patry 2024-05-03 11:04:53 +0200
  • da6326ab83 Removed import. Nicolas Patry 2024-05-03 10:42:44 +0200
  • 64e65ba3a1 allow ROCM_USE_FLASH_ATTN_V2_TRITON=1 fxmarty 2024-05-03 07:36:33 +0000
  • ca5ea45181 add LLMM_Silu mistral Mohit Sharma 2024-05-03 03:37:48 +0000
  • cd62e237fe Removed unused struct. Nicolas Patry 2024-05-02 21:52:04 +0200
  • a25737139d Updating Phi3 (long context). (#1849) Nicolas Patry 2024-05-02 19:07:10 +0200
  • caf07decf0 ability to specify tunableop tuned lengths fxmarty 2024-05-02 16:05:55 +0000
  • 6c385626eb more cleaning fxmarty 2024-05-02 15:44:38 +0000
  • c70742654b cleanup dockerfile fxmarty 2024-05-02 15:39:43 +0000
  • 52f593bba7 remove unnecessary code fxmarty 2024-05-02 15:37:38 +0000
  • 1f37d57266 tunableop on 1,...,8 fxmarty 2024-05-02 15:36:19 +0000
  • 51b0c25f37 add model id fxmarty 2024-05-02 15:33:00 +0000
  • 568d6094b5 Upgrading to rust 1.78. Nicolas Patry 2024-05-02 17:07:24 +0200
  • 65539b743e feat: prefer huggingface_hub in docs and show image api (#1844) drbh 2024-05-02 10:56:24 -0400
  • 43a2a0ca5e initial commit of mlp_speculator support (draft) Joshua Rosenkranz 2024-05-02 10:18:42 -0400
  • 7fac2978b3 Updating Phi3 (long context). Nicolas Patry 2024-05-02 14:14:24 +0000
  • d2b4b02c0e Merge branch 'mi300-temp' into mi300-compat fxmarty 2024-05-02 15:34:10 +0200
  • ff5e16b0e2 working tunable mi300-temp fxmarty 2024-05-02 13:29:20 +0000
  • de079d607a Remove misleading warning (not that important nowadays anyway). (#1848) Nicolas Patry 2024-05-02 15:09:46 +0200
  • ddbf038761 Remove misleading warning (not that important nowadays anyway). Nicolas Patry 2024-05-02 15:09:04 +0200
  • 8ec3b1a7a7 Merge branch 'main' into mi300-compat fxmarty 2024-05-02 10:53:18 +0200
  • 2677bf856a wip fix tunableop fxmarty 2024-05-02 08:15:52 +0000
  • c98a6b9948 feat: improve message content chunks handling drbh 2024-05-02 03:46:40 +0000
  • 0038e6020f Adding scripts to prepare load data. (#1841) Nicolas Patry 2024-05-01 21:48:06 +0200
  • 068ff80199 fix: further simplify examples drbh 2024-05-01 19:23:30 +0000
  • f8e31c0243 feat: prefer huggingface_hub in docs and show image api drbh 2024-05-01 19:04:50 +0000
  • 27b3a2c9fc Fix: "Fixing" double BOS for mistral too. (#1843) Nicolas Patry 2024-05-01 18:21:17 +0200
  • d1639a5827 "Fixing" double BOS for mistral too. Nicolas Patry 2024-05-01 18:20:44 +0200
  • affef3276e Adding orca script. Nicolas Patry 2024-05-01 11:42:51 +0200
  • ab156adc0f Adding scripts to prepare load data. Nicolas Patry 2024-05-01 09:11:57 +0000
  • b2721091ae Dummy PR (secrets/credentials ?) dummy Nicolas Patry 2024-05-01 09:06:14 +0200
  • 6073ece4fc fix: split docs and start conceptual page (#1836) v2.0.2 drbh 2024-05-01 03:03:25 -0400
  • 4cdc692023 feat: add sampling gif drbh 2024-04-30 15:24:16 -0400
  • a2e48ec3a2 feat: fix typo and add more diagrams drbh 2024-04-30 14:54:11 -0400
  • d48846351d fix: remove redundant table drbh 2024-04-30 13:33:50 -0400
  • 43a43fdfbc fix: improve image for dark and light mode drbh 2024-04-30 13:27:25 -0400
  • 8a417da317 feat: tweaks and images drbh 2024-04-30 12:52:30 -0400
  • 07fdfca858 fix: split docs and start conceptual page drbh 2024-04-30 12:33:20 -0400
  • a509360619 trying to update to ROCm 6.1 fxmarty 2024-04-30 16:17:37 +0000
  • dccab72549 (chore): torch 2.3.0 (#1833) Nicolas Patry 2024-04-30 18:15:35 +0200
  • 6e14a11f4a Upgrade version of server. Nicolas Patry 2024-04-30 17:56:34 +0200
  • bdd84287a0 Missing deps from vllm (which are now imported). Nicolas Patry 2024-04-30 15:30:34 +0000
  • 6ec7876edb What about /opt ? ci-xpu Morgan Funtowicz 2024-04-30 16:42:01 +0200
  • 853400cedb Let's try /usr/bin for sccache for Intel Morgan Funtowicz 2024-04-30 16:38:31 +0200
  • 1a411e0e22 (chore): torch 2.3.0 Nicolas Patry 2024-04-30 14:36:33 +0000
  • 925cbb0e25 let's see if we really need sudo for intel Morgan Funtowicz 2024-04-30 16:29:55 +0200
  • 658e65e2e3 Upgrade all the actions deps Morgan Funtowicz 2024-04-30 16:28:21 +0200
  • 49c57dc87d Let's try with Python 3.8 instead of 3.9 Morgan Funtowicz 2024-04-30 16:26:58 +0200
  • 85bbe2f9c3 Upgrade Python setup for intel Morgan Funtowicz 2024-04-30 16:25:03 +0200
  • afbfdd98f1 Let's dispatch Intel XPU on the right runner group Morgan Funtowicz 2024-04-30 16:22:33 +0200
  • f46f70f02a OK let's duplicate the job and dispatch on different labels Morgan Funtowicz 2024-04-30 16:19:59 +0200
  • 9940776293 Enable TGI on XPU tests Morgan Funtowicz 2024-04-30 16:04:40 +0200
  • b4ef038837 chore: update torch (#1730) OlivierDehaene 2024-04-30 14:04:28 +0200
  • e81394e165 Move on to 2.3.0 Nicolas Patry 2024-04-30 12:22:54 +0200
  • 772c3774df remvoe unused kernels OlivierDehaene 2024-04-12 14:18:58 +0200
  • ee04a3d3ee fix vllm build OlivierDehaene 2024-04-12 11:27:40 +0200
  • 891adacdb9 chore: update torch OlivierDehaene 2024-04-12 10:18:40 +0200
  • c99ecd77ec Handle images in chat api (#1828) drbh 2024-04-30 06:18:32 -0400
  • 7182d5de83 Update router/src/lib.rs Nicolas Patry 2024-04-30 12:16:24 +0200
  • b2c982750a feat: add vlm docs and simple examples (#1812) drbh 2024-04-30 06:14:39 -0400
  • 9192de57cc Fixing frequency penalty (#1811) Martin Iglesias Goyanes 2024-04-30 12:13:23 +0200
  • 21ec5393ac chore: rebase and fix formatting martini 2024-04-30 09:46:27 +0200
  • fcbd7fcd2e fix: take into account logits frequency so far in a generation stream when apply freq penalty martini 2024-04-25 23:47:20 +0200