Commit Graph

  • 0006fab5ab Apply suggestions from code review Nicolas Patry 2023-12-11 12:06:35 +0100
  • b1897acfd6 Calculate token budget with padding to max_input_length (#2) Karol Damaszke 2023-12-11 09:24:27 +0100
  • 6436ae86a1 Fix for continuous batching (#1) Karol Damaszke 2023-12-11 09:24:09 +0100
  • b6519b5279 Update medusa sampling. Nicolas Patry 2023-12-09 17:08:21 +0000
  • e95a5a897b Removing dead code. Nicolas Patry 2023-12-08 17:33:30 +0000
  • ba16994e8a Fixing medusa off by ones. Nicolas Patry 2023-12-08 16:28:04 +0000
  • abc8d48d96 Old llama test. Nicolas Patry 2023-12-06 19:51:27 +0000
  • e5f124b077 Merge tag 'v1.2.0' into v1.2-release regisss 2023-12-06 18:46:16 +0100
  • 3a79fbc63e Updated. Nicolas Patry 2023-12-06 16:41:04 +0000
  • d2b42f6883 Updating medusa test + Speeding ngram immensely by just making a smple search on device instead of on CPU with bad worst cases O(n) Nicolas Patry 2023-12-06 16:31:35 +0000
  • 3a8b1923db Remove ngram debug code Nicolas Patry 2023-12-06 10:05:11 +0000
  • b3c1492be1 Revert integration tests modifications. Nicolas Patry 2023-12-06 09:49:54 +0000
  • 6350c11df3 Discard all params modifications, we're not running ngram speculation now. Nicolas Patry 2023-12-06 09:46:50 +0000
  • f6958ea6d4 Include a few fixes Nicolas Patry 2023-12-06 09:45:42 +0000
  • c09066aeb1 Merge tag 'v1.1.1' into v1.1-release regisss 2023-12-06 09:50:58 +0100
  • 7b34445457 Improve create_n_gram degradation. Nicolas Patry 2023-12-06 06:31:57 +0000
  • a3cc5a94c6 Cargo fmt. Nicolas Patry 2023-12-05 22:15:59 +0000
  • fdef00c27e Fix no speculation. Nicolas Patry 2023-12-05 22:02:22 +0000
  • 9bf31fe388 Fixing infer iterator. Nicolas Patry 2023-12-05 20:48:05 +0000
  • 09839b05f4 Fixing some simple stuff, adding speculate to budget. Nicolas Patry 2023-12-05 16:38:46 +0000
  • 5aa3a01971 Fmt. Nicolas Patry 2023-12-05 15:45:34 +0000
  • cb8a1680fe Fix. Nicolas Patry 2023-12-05 15:29:34 +0000
  • be481a4799 Address comments. Nicolas Patry 2023-12-05 15:21:42 +0000
  • 3238c49121 Add a stale bot. (#1313) Nicolas Patry 2023-12-05 14:42:55 +0100
  • f2507d04e6 Add a stable bot. Nicolas Patry 2023-12-05 12:57:03 +0100
  • cc744ba426 Add changes from Optimum Habana's TGI folder regisss 2023-12-05 11:12:16 +0100
  • e808222dbf Working around falcon tests. Nicolas Patry 2023-12-05 09:40:29 +0000
  • 4d6efe32de cargo update. Nicolas Patry 2023-12-05 09:13:20 +0000
  • 8efff84609 C'mon falcon. Nicolas Patry 2023-12-04 20:40:57 +0000
  • f5765985ff Revert falcon load modification. Nicolas Patry 2023-12-04 15:49:28 +0000
  • 269792094b Cargo fmt Nicolas Patry 2023-12-04 14:56:38 +0000
  • d99f281050 Remove pdb comments. Nicolas Patry 2023-12-04 14:43:29 +0000
  • 79f9afba90 Needed to regenerate params tests + fix simple tests Nicolas Patry 2023-12-04 14:36:21 +0000
  • 970e57b393 Need to update params since tensor changed. Nicolas Patry 2023-12-04 14:03:51 +0000
  • 1f46bc483d Updating launcher + docs. Nicolas Patry 2023-12-04 13:51:28 +0000
  • bdd9596b6c Propagate speculate Nicolas Patry 2023-12-04 13:50:59 +0000
  • 7ed07bcc05 Speculative decoding + mistral Nicolas Patry 2023-12-04 13:42:28 +0000
  • e7e07342bd Working state except all params ?? Nicolas Patry 2023-12-01 18:49:01 +0000
  • 657ccd8276 Medusa + ngram Nicolas Patry 2023-12-01 17:57:20 +0000
  • b4d97d52cd Non breaking router. Nicolas Patry 2023-11-30 08:27:25 +0000
  • cda627eac0 Modifying the protobuf. Nicolas Patry 2023-11-29 16:20:11 +0000
  • aa442dc650 Speedup 2x. Nicolas Patry 2023-11-29 14:36:17 +0000
  • 243e9c37d3 Speculative medusa (illegal address Paged). Nicolas Patry 2023-11-28 22:23:03 +0000
  • 3fae84dfc0 Tmp. Nicolas Patry 2023-09-18 13:37:59 +0000
  • d7d07d44a4 Tmp work for medusa. Nicolas Patry 2023-09-11 22:12:19 +0000
  • 25b5f81941 Fix AMD documentation (#1307) fxmarty 2023-12-04 14:09:51 +0100
  • 2c446f7bde add readme HuaYZhao 2023-12-04 21:03:11 +0800
  • a1d15f15e1 typo Félix Marty 2023-12-04 13:42:29 +0100
  • f4ff5b9a8c update readme image Félix Marty 2023-12-04 13:41:26 +0100
  • b847cf33b2 add reference to Inferentia2 in the doc Félix Marty 2023-12-04 13:39:01 +0100
  • e6b3a1e0a8 update architecture Félix Marty 2023-12-04 13:32:31 +0100
  • 529d7d3676 use ROCm instead of RoCm Félix Marty 2023-12-04 10:31:01 +0100
  • bda189062d fix amd doc Félix Marty 2023-12-04 10:29:34 +0100
  • 75e8521eb9 fix warning for pydantic 2 Tim Shih 2023-12-01 10:37:14 +0800
  • 5b340a5ffd Dump work. medusa Nicolas Patry 2023-11-30 22:05:51 +0000
  • bdbccb774c Some work to get batching working. Nicolas Patry 2023-11-30 18:59:16 +0000
  • ccd5725a0c v1.2.0 v1.2.0 OlivierDehaene 2023-11-30 15:18:15 +0100
  • b0cb4fa9d0 Non breaking router. Nicolas Patry 2023-11-30 08:27:25 +0000
  • a478b276eb Modifying the protobuf. Nicolas Patry 2023-11-29 16:20:11 +0000
  • 866af9b9fd Speedup 2x. Nicolas Patry 2023-11-29 14:36:17 +0000
  • 116769a5f5 add allgather HuaYZhao 2023-11-29 14:21:42 +0800
  • 8897b89606 Speculative medusa (illegal address Paged). Nicolas Patry 2023-11-28 22:23:03 +0000
  • 624800c4de Make GPTQ test less flaky (#1295) Nicolas Patry 2023-11-28 21:22:35 +0100
  • cf7c17c66b Make GPTQ test less flaky. Nicolas Patry 2023-11-28 18:04:52 +0100
  • ba552e1a82 Let each model resolve their own default dtype. (#1287) Nicolas Patry 2023-11-28 17:54:26 +0100
  • a2e9ccbb10 Tmp. Nicolas Patry 2023-09-18 13:37:59 +0000
  • 94a0bf1bbc Tmp work for medusa. Nicolas Patry 2023-09-11 22:12:19 +0000
  • 3c71c656c7 make install-flash-attn-v2-cuda should work like make install-flash-attn-v2 used to work. (#1294) Nicolas Patry 2023-11-28 16:28:40 +0100
  • 5723454b9e Dummy files. Nicolas Patry 2023-11-28 15:15:10 +0000
  • 26a271fad5 Adding the flag to docker laucnher. Nicolas Patry 2023-11-28 15:14:46 +0000
  • e3c31c9d92 Allow dtype for bitsandbytes (it works, checked for idefics 9b/llama/80b)t Nicolas Patry 2023-11-28 14:15:56 +0000
  • 1c09961f14 make install-flash-attn-v2-cuda should work like make install-flash-attn-v2 used to work. Nicolas Patry 2023-11-28 15:03:08 +0100
  • 2144a6894d Merge branch 'main' into main xihajun 2023-11-28 12:24:36 +0000
  • d81f0bb6b9 Rename requirements.txt to requirements_cuda.txt xihajun 2023-11-28 12:22:11 +0000
  • b2b5df0e94 Add RoCm support (#1243) fxmarty 2023-11-27 14:08:12 +0100
  • 2713b21132 Let each model resolve their own default dtype. Nicolas Patry 2023-11-27 10:30:35 +0000
  • ed2a3f617e Exllama v2 (#1211) Nicolas Patry 2023-11-25 22:38:38 +0100
  • 86009e28ac Keep exllamav1 for sharded flows. Nicolas Patry 2023-11-25 19:48:04 +0000
  • d9dffb55c0 Rebased. Nicolas Patry 2023-11-25 11:50:31 +0000
  • a7ed31cf6c Deactivating v2 for sharded. Nicolas Patry 2023-11-25 11:30:20 +0000
  • b041bf15ae Fix imports. Nicolas Patry 2023-11-23 13:29:03 +0000
  • ff51589332 Exllamav2 functional. Nicolas Patry 2023-11-23 11:34:22 +0000
  • a61f432599 Fixing exllamav2. Ubuntu 2023-11-23 11:24:41 +0000
  • fb64ce1040 Adding scratch space. Nicolas Patry 2023-10-30 16:33:58 +0000
  • 024bdb0142 Update exllamav2 (illegal address issued) Nicolas Patry 2023-10-30 15:58:23 +0000
  • f96d997494 use exllamav2QuantLinear instead of exllama1 Florian Zimmermeister 2023-10-25 12:26:20 +0200
  • a02f6839e9 build in docker Florian Zimmermeister 2023-10-17 22:33:06 +0200
  • 0ae4523c53 draft exllamav2, first commit Florian Zimmermeister 2023-10-17 22:14:36 +0200
  • 9e76fbcc00 Updating vllm to use cuda 12.1 and pytorch 2.1 Thomas Wood 2023-11-25 06:35:28 -0500
  • b7dba16160 cargo fmt Félix Marty 2023-11-24 18:26:40 +0100
  • 8aef399209 latest=false Félix Marty 2023-11-24 18:08:40 +0100
  • 60679fa297 Refactor model instantiation for Mistral type xihajun 2023-11-23 17:45:38 +0000
  • 2b9b04008b Revert back to version 1.1.1 xihajun 2023-11-23 17:28:11 +0000
  • d856888acb Update tokenizers==0.15.0 and transformers==4.35.2 xihajun 2023-11-23 17:23:59 +0000
  • 99a2075775 Merge branch 'main' into tgi-rocm Félix Marty 2023-11-23 17:48:11 +0100
  • f271be417d format Your Name 2023-11-23 17:47:43 +0100
  • 31a7e9ef1b Test for a different version name xihajun 2023-11-23 16:38:29 +0000
  • f1004498d0 Test on mistral with CausalLM xihajun 2023-11-23 16:34:06 +0000
  • d4e2ca08bb clean doc Your Name 2023-11-23 16:51:17 +0100
  • 3c02262f29 Reduce race condition on file system for test Nicolas Patry 2023-11-23 15:42:48 +0000