Commit Graph

  • a5632a6a24 add semver tags OlivierDehaene 2023-02-03 12:30:01 +0100
  • 87dfc4e2c1 increase semver OlivierDehaene 2023-02-03 12:25:20 +0100
  • a7d15c38e8 refactor doc OlivierDehaene 2023-02-03 12:03:50 +0100
  • 5de40eb078 formatting OlivierDehaene 2023-02-02 18:59:21 +0100
  • 4d00990ccd host swagger w/ github pages OlivierDehaene 2023-02-02 18:58:11 +0100
  • 109c5af615 finalize openAPI schemas OlivierDehaene 2023-02-02 18:37:07 +0100
  • 2878c43cc5 feat(router): add openAPI schemas OlivierDehaene 2023-02-02 17:31:52 +0100
  • b1482d9048 breaking(router): modify /generate API to only return generated text (#50) OlivierDehaene 2023-02-02 15:02:04 +0100
  • 8659560f7c skip santacoder tests OlivierDehaene 2023-02-02 15:01:21 +0100
  • f36e736723 breaking(router): modify /generate API to only return generated text OlivierDehaene 2023-02-01 18:38:30 +0100
  • 7b870e1e18 feat(router): use background task to manage request queue (#52) OlivierDehaene 2023-02-02 14:59:27 +0100
  • dd9f417b8a formatting OlivierDehaene 2023-02-02 14:18:25 +0100
  • e92eb15d45 rename OlivierDehaene 2023-02-02 14:17:20 +0100
  • d2d5394991 improved naming OlivierDehaene 2023-02-02 14:12:05 +0100
  • 0c93da571b improved comments OlivierDehaene 2023-02-02 13:01:59 +0100
  • 3f963d8a00 formatting OlivierDehaene 2023-02-02 12:55:46 +0100
  • 9f45182cfd fix tests OlivierDehaene 2023-02-02 12:55:32 +0100
  • c863f05cfd feat(router): rework db to use a background task OlivierDehaene 2023-02-02 12:54:56 +0100
  • df227ac20d fix(server): allow greedy repetition penalty (#51) OlivierDehaene 2023-02-02 10:34:35 +0100
  • f81851c202 fix(server): allow greedy repetition penalty OlivierDehaene 2023-02-02 10:34:08 +0100
  • 775115e3a5 feat(server): allow the server to use a local weight cache (#49) OlivierDehaene 2023-02-01 16:22:10 +0100
  • 4293e48083 feat(server): allow the server to use a local weight cache OlivierDehaene 2023-02-01 16:21:25 +0100
  • 313194f6d7 feat(server): support repetition penalty (#47) OlivierDehaene 2023-02-01 15:58:42 +0100
  • 651403c325 formatting OlivierDehaene 2023-02-01 15:30:37 +0100
  • c25fd1e2e8 fix all_input_ids shape OlivierDehaene 2023-02-01 15:30:09 +0100
  • 2ad895a6cc feat(server): allow gpt-neox models with odd vocab sizes to be sharded (#48) OlivierDehaene 2023-02-01 14:43:59 +0100
  • 3149317fa1 formatting OlivierDehaene 2023-02-01 11:48:18 +0100
  • 1d0fa38cb8 feat(server): allow gpt-neox models with odd vocab sizes to be sharded OlivierDehaene 2023-02-01 11:47:32 +0100
  • 404ed7a1f6 feat(ci): Docker build and push (#46) OlivierDehaene 2023-01-31 20:14:05 +0100
  • 04f3b1c93e add caching OlivierDehaene 2023-01-31 20:06:00 +0100
  • 34fc1e5cc6 feat(server): support repetition penalty OlivierDehaene 2023-01-31 20:03:18 +0100
  • a2aeec9331 feat(ci): Docker build and push OlivierDehaene 2023-01-31 19:13:39 +0100
  • f830706b21 feat(server): Support GPT-Neox (#39) OlivierDehaene 2023-01-31 18:53:56 +0100
  • b4455b241b update readme OlivierDehaene 2023-01-31 18:38:31 +0100
  • 7df81c34db patch quantization OlivierDehaene 2023-01-31 18:34:47 +0100
  • ffccb7f9ce feat(server): Support GPT-Neox OlivierDehaene 2023-01-30 20:51:48 +0100
  • c6e8b9442b fix(server): fix quantization for sharded models (#45) OlivierDehaene 2023-01-31 17:40:38 +0100
  • 4858f122db formatting OlivierDehaene 2023-01-31 17:38:12 +0100
  • ca11e9e8c3 fix(server): fix quantization for sharded models OlivierDehaene 2023-01-31 17:37:50 +0100
  • 017a2a8c2f feat: Add token streaming using ServerSideEvents support (#41) OlivierDehaene 2023-01-31 17:04:00 +0100
  • 41767b651f use u32 OlivierDehaene 2023-01-31 16:51:32 +0100
  • d5ab76cdfb use Rust type system to validate logic OlivierDehaene 2023-01-31 16:47:06 +0100
  • 614a1a7202 modify integration tests OlivierDehaene 2023-01-30 16:32:44 +0100
  • f8e230f65c formatting OlivierDehaene 2023-01-30 16:17:32 +0100
  • 6d024e5708 support seeding OlivierDehaene 2023-01-30 16:16:58 +0100
  • 5ef1336997 docstring OlivierDehaene 2023-01-30 12:36:04 +0100
  • 42cdb734a5 working python tests OlivierDehaene 2023-01-30 12:18:53 +0100
  • 4a538cfa49 working integration tests OlivierDehaene 2023-01-30 11:37:36 +0100
  • 429155a26a Improved version OlivierDehaene 2023-01-30 10:55:54 +0100
  • 122c137b56 rust code cleanup OlivierDehaene 2023-01-28 09:31:37 +0100
  • 48d095733a black OlivierDehaene 2023-01-27 19:52:14 +0100
  • 432566d931 wip OlivierDehaene 2023-01-27 19:46:58 +0100
  • 54fec93193 fix(server): fix seeding with multiple shards (#44) OlivierDehaene 2023-01-31 16:01:15 +0100
  • 18b0923d01 fix(server): fix seeding with multiple shards OlivierDehaene 2023-01-31 16:00:17 +0100
  • 03bdf18290 fix(server): fix seeding on gpu (#42) OlivierDehaene 2023-01-31 14:30:33 +0100
  • 28e5cbada5 fix(server): fix seeding on gpu OlivierDehaene 2023-01-31 14:29:58 +0100
  • 4f9ac67cfa Revert "feat: Add token streaming using ServerSideEvents support" (#40) OlivierDehaene 2023-01-31 14:21:51 +0100
  • 0e543e167e Revert "feat: Add token streaming using ServerSideEvents support (#36)" OlivierDehaene 2023-01-31 14:21:13 +0100
  • 7fbfbb0dc5 feat: Add token streaming using ServerSideEvents support (#36) OlivierDehaene 2023-01-31 11:49:43 +0100
  • 1c10776cde modify integration tests OlivierDehaene 2023-01-30 16:32:44 +0100
  • 3fc811f596 formatting OlivierDehaene 2023-01-30 16:17:32 +0100
  • 7d633582e4 support seeding OlivierDehaene 2023-01-30 16:16:58 +0100
  • ab2f784f29 docstring OlivierDehaene 2023-01-30 12:36:04 +0100
  • adf80bc23d working python tests OlivierDehaene 2023-01-30 12:18:53 +0100
  • b2a468176d working integration tests OlivierDehaene 2023-01-30 11:37:36 +0100
  • 046801278e Improved version OlivierDehaene 2023-01-30 10:55:54 +0100
  • 0b34905557 rust code cleanup OlivierDehaene 2023-01-28 09:31:37 +0100
  • 8c2ddfe838 black OlivierDehaene 2023-01-27 19:52:14 +0100
  • d917ae8955 wip OlivierDehaene 2023-01-27 19:46:58 +0100
  • cd298bc5e5 feat: Support sampling seeding (#37) OlivierDehaene 2023-01-30 15:36:16 +0100
  • 9285f67be5 black OlivierDehaene 2023-01-30 15:15:34 +0100
  • 93f6acc396 feat: Support sampling seeding OlivierDehaene 2023-01-30 14:36:36 +0100
  • a8ddf45c11 cleanup Yannic Kilcher 2023-01-26 14:57:49 +0100
  • 033d2174fd cleanup Yannic Kilcher 2023-01-26 14:57:39 +0100
  • d37b2d3fb9 added streaming endpoint Yannic Kilcher 2023-01-26 14:50:57 +0100
  • 1539d3cbbe feat(router): Remove second lock from batcher hot path (#27) OlivierDehaene 2023-01-26 16:29:13 +0100
  • b96fe73beb use IntMap OlivierDehaene 2023-01-26 16:06:34 +0100
  • 67ee1907fc feat(router): Remove second lock from batcher hot path OlivierDehaene 2023-01-20 14:06:33 +0100
  • ce960be0a5 feat(bloom): use torch.nn.Linear and torch.nn.GELU (#33) OlivierDehaene 2023-01-26 15:33:45 +0100
  • 6e43ef51ba feat(bloom): use torch.nn.Linear and torch.nn.GELU OlivierDehaene 2023-01-26 15:33:14 +0100
  • 9cfd41e03b cleanup Yannic Kilcher 2023-01-26 14:57:49 +0100
  • 65efd51233 cleanup Yannic Kilcher 2023-01-26 14:57:39 +0100
  • 7beb968696 Merge branch 'main' of github.com:huggingface/text-generation-inference Yannic Kilcher 2023-01-26 14:51:07 +0100
  • b1ef80583c added streaming endpoint Yannic Kilcher 2023-01-26 14:50:57 +0100
  • 13e7044ab7 fix(dockerfile): fix docker build (#32) OlivierDehaene 2023-01-24 19:52:39 +0100
  • acf45830e7 fix(dockerfile): fix docker build OlivierDehaene 2023-01-24 19:52:18 +0100
  • 5c01e2544c fix(router): fix api-inference deployment (#31) OlivierDehaene 2023-01-23 17:42:14 +0100
  • 087b4c2721 fix(router): fix api-inference deployment OlivierDehaene 2023-01-23 17:41:42 +0100
  • ab2ad91da3 fix(docker): fix api-inference deployment (#30) OlivierDehaene 2023-01-23 17:33:08 +0100
  • 507a8d5847 fix(docker): fix api-inference deployment OlivierDehaene 2023-01-23 17:32:15 +0100
  • f9d0ec376a feat(docker): Make the image compatible with api-inference (#29) OlivierDehaene 2023-01-23 17:11:27 +0100
  • c655f1cdf2 feat(docker): Make the image compatible with api-inference OlivierDehaene 2023-01-23 17:10:37 +0100
  • f31b8a7fed A small simplification and add a few more comments Nick Hill 2023-01-19 11:48:32 -0800
  • d0ccada7c0 Proposal: Use bounded queue instead of database Nick Hill 2023-01-18 12:21:12 -0800
  • 1f570d181f fix(server): Fix position ids (#28) OlivierDehaene 2023-01-20 15:35:22 +0100
  • bc18dbd980 skip santacoder tests OlivierDehaene 2023-01-20 15:34:11 +0100
  • a8d7e94d13 fix(server): Fix position ids OlivierDehaene 2023-01-20 15:33:03 +0100
  • 15511edc01 feat(server): Support SantaCoder (#26) OlivierDehaene 2023-01-20 12:24:39 +0100
  • 8d4baa14d2 feat(server): Support SantaCoder OlivierDehaene 2023-01-20 12:15:37 +0100
  • f7ac394935 fix(router): Obey max batch size (#23) Nick Hill 2023-01-17 00:11:21 -0800
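
Several commits above add and fix repetition-penalty support on the server (#47, #51), including under greedy decoding. For reference, below is a minimal sketch of the standard CTRL-style penalty as used by Hugging Face logits processors; the function name and tensor shapes are illustrative assumptions, not the repository's actual code.

```python
import torch

def apply_repetition_penalty(
    scores: torch.Tensor,      # [batch, vocab] next-token logits
    input_ids: torch.Tensor,   # [batch, seq] tokens generated so far
    penalty: float,            # > 1.0 discourages repetition; 1.0 is a no-op
) -> torch.Tensor:
    """Dampen the logits of tokens that have already been generated."""
    # Gather the current logits of every previously seen token.
    score = torch.gather(scores, 1, input_ids)
    # Divide positive logits and multiply negative ones so the penalty
    # always lowers the probability of repeated tokens. Applied before
    # argmax, the same transform also covers the greedy case.
    score = torch.where(score < 0, score * penalty, score / penalty)
    return scores.scatter(1, input_ids, score)
```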
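
The sampling-seeding commits (#37) and the follow-up fixes for GPU and multi-shard seeding (#42, #44) revolve around making each request's random draw deterministic. A minimal sketch, assuming a per-request torch.Generator created on the sampling device; the helper name and signature are hypothetical.

```python
from typing import Optional

import torch

def sample_next_token(
    probs: torch.Tensor,        # [batch, vocab] next-token probabilities
    seed: Optional[int],
    device: torch.device,
) -> torch.Tensor:
    """Draw one token per sequence, reproducibly when a seed is given."""
    generator = None
    if seed is not None:
        # The generator must live on the same device as `probs`; a CPU
        # generator paired with CUDA tensors is the kind of mismatch the
        # "fix seeding on gpu" commit above hints at.
        generator = torch.Generator(device=device)
        generator.manual_seed(seed)
    return torch.multinomial(probs, num_samples=1, generator=generator)
```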
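
For "fix(server): Fix position ids" (#28), a common pitfall with batched, left-padded inputs is deriving position ids from the attention mask rather than assuming positions start at zero. A sketch of that common pattern, with an assumed helper name; it may not reflect exactly what the commit changed.

```python
import torch

def position_ids_from_mask(attention_mask: torch.Tensor) -> torch.Tensor:
    """Compute position ids for a left-padded batch from its attention mask."""
    # The cumulative sum of the mask numbers the real tokens 0, 1, 2, ...
    # regardless of how much padding precedes them.
    position_ids = attention_mask.long().cumsum(-1) - 1
    # Padding positions receive a harmless placeholder id; they are
    # masked out of attention anyway.
    position_ids.masked_fill_(attention_mask == 0, 1)
    return position_ids
```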