Commit Graph

  • ad66f6ef9a feat(server): optim flash causal lm decode_token (#285) OlivierDehaene 2023-05-09 18:26:19 +0200
  • 1a3aa08fa0 revert change on all gather OlivierDehaene 2023-05-09 18:09:42 +0200
  • bc5c07231e fix(docker): fix docker build (#299) OlivierDehaene 2023-05-09 14:39:59 +0200
  • dd4252451f fix(docker): fix docker build OlivierDehaene 2023-05-09 13:20:52 +0200
  • e250282213 feat(docker): add benchmarking tool to docker image (#298) OlivierDehaene 2023-05-09 13:19:31 +0200
  • 4fb69b9d1c add new openapi schema OlivierDehaene 2023-05-09 13:17:34 +0200
  • 3225fed42e feat(docker): add benchmarking tool to dockerfile OlivierDehaene 2023-05-09 12:39:28 +0200
  • 926fd9a010 feat(router): Adding response schema for compat_generate (#292) Sai Vinay G 2023-05-09 16:08:09 +0530
  • e9b01b3433 fix(dockerfile): fix nvidia env vars (#297) OlivierDehaene 2023-05-09 12:36:02 +0200
  • 5ef4c94289 fix(dockerfile): fix nvidia env vars OlivierDehaene 2023-05-09 11:35:06 +0200
  • 7e11c5d92b Hotfixes for santacoder/bigcode. Ubuntu 2023-05-08 09:45:27 +0000
  • fb8c5365da Adding response schema for compat_generate Sai Vinay G 2023-05-07 11:57:06 +0000
  • 87b5f03958 format Joel Lamy-Poirier 2023-05-05 18:48:57 -0400
  • 0e648a71f9 Refactor next token chooser Joel Lamy-Poirier 2023-05-05 18:45:53 -0400
  • e29bb90e88 fixes Joel Lamy-Poirier 2023-05-05 16:26:50 -0400
  • a5bf08f6e2 Return details optionally Joel Lamy-Poirier 2023-05-05 15:02:54 -0400
  • b4aa87db58 fea(server): decrease convert RAM requirements (#286) Nicolas Patry 2023-05-05 17:57:02 +0200
  • 3314a46d36 chore: add flash-attention to docker ignore (#287) Nicolas Patry 2023-05-05 17:52:09 +0200
  • d35d747619 Simple trick to preven flash-attention directory from being Nicolas Patry 2023-05-05 17:47:15 +0200
  • bf5990ee9e more explicit OlivierDehaene 2023-05-05 17:37:17 +0200
  • a7c10f710f Fixes and generation details arg Joel Lamy-Poirier 2023-05-05 11:32:38 -0400
  • c969c8c091 faster OlivierDehaene 2023-05-05 17:26:52 +0200
  • 9e48730e51 Update server/text_generation_server/utils/convert.py Nicolas Patry 2023-05-05 17:14:39 +0200
  • 872757bf7e Do not load the entire array while converting to save on RAM. Nicolas Patry 2023-05-05 16:17:28 +0200
  • 690fc31757 fix(server): fix convert (#284) Nicolas Patry 2023-05-05 15:28:08 +0200
  • 1cbc5c633e final OlivierDehaene 2023-05-05 15:27:08 +0200
  • d5f66c0d97 Update server/text_generation_server/utils/convert.py Nicolas Patry 2023-05-05 13:27:00 +0200
  • 59e934c0e4 Nothing happened.. Nicolas Patry 2023-05-05 12:28:19 +0200
  • b3b1b81982 fix Joel Lamy-Poirier 2023-05-04 20:38:08 -0400
  • 46363e1cd7 concatenate Joel Lamy-Poirier 2023-05-04 17:34:28 -0400
  • 7a70928b06 filter Joel Lamy-Poirier 2023-05-04 14:22:53 -0400
  • f6df8db680 wip OlivierDehaene 2023-05-04 19:37:12 +0200
  • 476d8fc379 Use next token chooser Joel Lamy-Poirier 2023-05-04 11:52:11 -0400
  • e68509add7 feat(launcher): Improve error message when download process fails. (#276) Nicolas Patry 2023-05-04 15:29:29 +0200
  • d1a3cac223 add logs OlivierDehaene 2023-05-04 15:28:49 +0200
  • 5d5a2de96c wip OlivierDehaene 2023-05-04 15:23:20 +0200
  • f08343d44d fix(server): Removes the parallelism in file convertion (during download) (#275) Nicolas Patry 2023-05-04 15:22:54 +0200
  • b4fe248b17 fix(launcher): handle hub branches (#278) Nicolas Patry 2023-05-04 15:14:28 +0200
  • 4882de4d7a Easier quantization. Nicolas Patry 2023-05-04 14:25:24 +0200
  • e5e66baa24 Handle --revision refs/pr/2 better. Nicolas Patry 2023-05-04 12:36:11 +0200
  • e2d167256a Updating all models. Nicolas Patry 2023-05-04 12:31:51 +0200
  • 1185f66205 Adding Quantization GPTQ as an option (step 1) Nicolas Patry 2023-05-04 12:16:40 +0200
  • 37f305a0eb Improve error message when download process fails. Nicolas Patry 2023-05-04 11:23:42 +0200
  • 43f3055331 Removes the parallelism in file convertion (during download) Nicolas Patry 2023-05-04 10:58:20 +0200
  • b67908e0cf fix(launcher): pass weights cache override to the download process (#274) OlivierDehaene 2023-05-03 23:39:35 +0200
  • 52be716cab fix(launcher): pass weights cache override to the download process OlivierDehaene 2023-05-03 23:38:44 +0200
  • d5685656a4 sampling Joel Lamy-Poirier 2023-05-03 15:17:06 -0400
  • 4554a69b22 Top p and typical p Joel Lamy-Poirier 2023-05-03 14:55:08 -0400
  • 812de7ee50 feat(server): optimize flash causal lm OlivierDehaene 2023-05-03 20:52:27 +0200
  • cc929530c2 cleanup Joel Lamy-Poirier 2023-05-03 14:28:49 -0400
  • d5ff681b00 Top k attempt Joel Lamy-Poirier 2023-05-03 14:25:35 -0400
  • 5677540881 stuff Joel Lamy-Poirier 2023-05-03 11:16:35 -0400
  • 85aa7e2e7b feat(server): support hf endpoint weight layout (#266) OlivierDehaene 2023-05-03 11:36:24 +0200
  • 4096000e34 fix(server): fix typo in tokenizers decode (#269) OlivierDehaene 2023-05-03 10:10:34 +0200
  • 2b67bab02a better comments OlivierDehaene 2023-05-03 09:55:17 +0200
  • 6a7a6b0661 fix(server): fix typo in tokenizers decode OlivierDehaene 2023-05-03 09:48:05 +0200
  • 89fc8b4812 feat(server): support hf endpoint weight layout OlivierDehaene 2023-05-02 17:24:11 +0200
  • 411b0d4e1f chore(github): add templates (#264) Nicolas Patry 2023-05-02 15:43:19 +0200
  • 7242e52025 remove dokcer image inspect OlivierDehaene 2023-05-02 15:23:31 +0200
  • ce394f5d69 add docker label in build OlivierDehaene 2023-05-02 15:20:27 +0200
  • c207b88b3a small change to handling of cancelled requests Nick Hill 2023-05-02 12:53:13 +0100
  • 7552483dde fixes and some extra comments Nick Hill 2023-05-01 21:22:18 +0100
  • 9763ff0989 Using vergen Nicolas Patry 2023-05-01 18:39:35 +0200
  • 982a04246b Burn cargo version within binary. Nicolas Patry 2023-05-01 16:11:44 +0200
  • 15756ba1d4 Some precisions. Nicolas Patry 2023-05-01 15:44:38 +0200
  • 3bf04aea3f Adding the env cli (in launcher). Nicolas Patry 2023-05-01 15:40:03 +0200
  • 61fd8c3c01 Adding CLI to run to get your env. Nicolas Patry 2023-05-01 15:29:29 +0200
  • ed7e797259 Adding specific description for tgi. Nicolas Patry 2023-05-01 14:40:43 +0200
  • d441b67586 Adding some issue + PR templates. Nicolas Patry 2023-05-01 14:30:16 +0200
  • e86cca9723 Adding docs on how dynamic batching works. (#258) Nicolas Patry 2023-05-01 14:16:50 +0200
  • c6bb42286e Rename dynamic to continuous. Nicolas Patry 2023-05-01 10:50:09 +0200
  • 0e9d249b79 feat(benchmark): add support for private tokenizers (#262) OlivierDehaene 2023-04-29 12:17:30 +0200
  • db0362ba32 feat(benchmark): add support for private tokenizers OlivierDehaene 2023-04-29 12:17:06 +0200
  • 0181675b0c Merge branch 'huggingface:main' into docker Akash Sonowal 2023-04-29 15:38:11 +0530
  • b0b97fd9a7 doc(launcher): add more docs to the launcher itself and link in the README (#257) Nicolas Patry 2023-04-29 11:53:42 +0200
  • 1f2bfdc976 Update Dockerfile Akash Sonowal 2023-04-29 15:17:02 +0530
  • 0f7d76f927 Specify response schema for compat_generate Feynman Liang 2023-04-28 12:39:52 -0700
  • cb7993698b Adding docs on how dynamic batching works. Nicolas Patry 2023-04-28 21:12:08 +0200
  • f7411f0a86 Adding more docs to the launcher itself and link in the README. Nicolas Patry 2023-04-28 20:22:14 +0200
  • cbbc046a79 stuff Joel Lamy-Poirier 2023-04-28 11:18:25 -0400
  • 08aee68f79 abstract for padded batch case Nick Hill 2023-04-27 07:53:09 +0100
  • 593a563414 feat(docker): add nvidia env vars (#255) OlivierDehaene 2023-04-27 19:18:33 +0200
  • f092ba9b22 feat(server): add watermarking tests (#248) Ehsan M. Kermani 2023-04-27 10:16:35 -0700
  • 349c35e451 feat(docker): add nvidia env vars OlivierDehaene 2023-04-27 18:53:07 +0200
  • e603406327 Remove -1 from batch id and add a note ehsanmok 2023-04-27 09:49:15 -0700
  • 22119ca19f Remove extra pytest config in server ehsanmok 2023-04-27 09:22:22 -0700
  • 0d3ecf9797 Deduplicate the imported models ehsanmok 2023-04-27 09:00:33 -0700
  • 6f608f0d84 Use different id for batch vs liveness ehsanmok 2023-04-27 09:00:14 -0700
  • 080542dbca Bring back the old table in README ehsanmok 2023-04-27 08:59:31 -0700
  • a0e5fc4189 Update README ehsanmok 2023-04-26 18:10:13 -0700
  • 37194a5b9a Use make for uniformity ehsanmok 2023-04-26 18:10:06 -0700
  • d72ee8015a Include python client tests ehsanmok 2023-04-26 18:09:47 -0700
  • 2f20628a5c Add pytest asynio mode config ehsanmok 2023-04-26 18:09:05 -0700
  • 619f2f3a36 Unit test watermark ehsanmok 2023-04-26 18:08:15 -0700
  • 6952be08d7 Add the missing async best_of test case ehsanmok 2023-04-26 18:07:49 -0700
  • 9fbf6032e5 Some small improvements ehsanmok 2023-04-26 18:07:15 -0700
  • b9ae7e5da1 chore(server): update transformers (#250) OlivierDehaene 2023-04-27 09:57:41 +0200
  • 34bca0b8d3 fix(server): Small tidy of code from recent changes (#251) Nick Hill 2023-04-27 00:57:28 -0700
  • b4cf832c40 fix(server): fix reshaping of bloom past_key_values in concatenate() (#252) Nick Hill 2023-04-27 00:51:27 -0700
  • 06de47b8ca fix(server): fix reshaping of bloom past_key_values in concatenate() Nick Hill 2023-04-27 08:21:37 +0100