Commit Graph

  • 5558dca0ec feat(server): Support BLOOMChat-176B Nick Hill 2023-05-21 08:11:49 -0700
  • 85403d1138 fix: set MODEL_ID in sagemaker-entrypoint script Xin Yang 2023-05-19 11:13:06 -0700
  • a6dd19b042 Specialize code Joel Lamy-Poirier 2023-05-18 17:04:03 -0400
  • 7c11ceba6c Extract kv cache stuff Joel Lamy-Poirier 2023-05-18 17:01:20 -0400
  • 3c725314e1 Add inference runner Joel Lamy-Poirier 2023-05-18 17:00:44 -0400
  • dd0028f336 cleanup Joel Lamy-Poirier 2023-05-18 14:43:31 -0400
  • 6818b8c285 Copy gpt bigcode Joel Lamy-Poirier 2023-05-18 14:39:34 -0400
  • 912cf911a1 Fix typing error due to Optional Yang, Bo 2023-05-18 07:52:50 -0700
  • 27d30f685a Make B a generic type of Model Yang, Bo 2023-05-18 07:29:20 -0700
  • defc7ee287 Merge remote-tracking branch 'origin/main' into vetorized_lm Joel Lamy-Poirier 2023-05-17 21:22:07 -0400
  • ba5b79bb5e Update tests, fix bugs, format Joel Lamy-Poirier 2023-05-17 21:12:53 -0400
  • 7f81c25d07 watermark and copy test file Joel Lamy-Poirier 2023-05-17 15:56:24 -0400
  • 9b1868b2c2 really fix pruning of cancelled requests from queue Nick Hill 2023-05-16 14:26:46 -0700
  • 5a58226130 fix(server): fix decode token (#334) OlivierDehaene 2023-05-16 23:23:27 +0200
  • ab4037c640 fix naming OlivierDehaene 2023-05-16 21:24:53 +0200
  • f08a1a50b7 add parallelization OlivierDehaene 2023-05-16 21:14:29 +0200
  • 8ddbdea45b Better prefix for edge cases. Nicolas Patry 2023-05-16 12:31:00 +0200
  • 34e0a5b4a4 Fixing initialization of token, token_offset. Nicolas Patry 2023-05-16 12:14:36 +0200
  • 1aa31bb5cc Simplifying streaming decode. Nicolas Patry 2023-05-16 11:32:25 +0200
  • d2a99b4294 fix test OlivierDehaene 2023-05-16 11:09:30 +0200
  • 92178b875e fix(server): decode buffer should be pair OlivierDehaene 2023-05-16 10:24:36 +0200
  • dbdc587ddd feat(integration-tests): improve comparison and health checks (#336) OlivierDehaene 2023-05-16 20:22:11 +0200
  • 35a58992e8 trigger workflow OlivierDehaene 2023-05-16 19:51:19 +0200
  • 03836deacc fix tests OlivierDehaene 2023-05-16 19:49:55 +0200
  • 1c1a6e44bc feat(integration-tests): improve comparison and health checks OlivierDehaene 2023-05-16 19:45:29 +0200
  • b00907d36f Add missing dependency to the target install-flash-attention Yang, Bo 2023-05-16 09:57:48 -0700
  • 51355ea722 fix to pruning of cancelled requests from queue Nick Hill 2023-05-16 06:27:34 -0700
  • e71471bec9 feat: add snapshot testing (#282) OlivierDehaene 2023-05-15 23:36:30 +0200
  • 8d0f8c2c30 remove logprobs OlivierDehaene 2023-05-15 23:06:57 +0200
  • 9fcf03d13c add short slug OlivierDehaene 2023-05-15 21:54:37 +0200
  • d245ddc04f fix OlivierDehaene 2023-05-15 20:51:59 +0200
  • 2f561809bc set docker volume OlivierDehaene 2023-05-15 20:35:40 +0200
  • 99657f20c9 ???? OlivierDehaene 2023-05-15 20:29:09 +0200
  • d16420f52b fix OlivierDehaene 2023-05-15 20:23:11 +0200
  • 6568b376ad test only integration tests OlivierDehaene 2023-05-15 20:22:26 +0200
  • 110d29cfe7 typo OlivierDehaene 2023-05-15 19:21:28 +0200
  • 391b80c0f4 fix flash models OlivierDehaene 2023-05-15 18:12:50 +0200
  • a0abfa278e fix main OlivierDehaene 2023-05-15 18:06:49 +0200
  • e33183b118 fix workflow OlivierDehaene 2023-05-15 17:45:36 +0200
  • 42ce781f90 fix workflow OlivierDehaene 2023-05-15 17:42:22 +0200
  • 6e17994fef fix workflow OlivierDehaene 2023-05-15 17:40:37 +0200
  • d325cf9ceb fix workflow OlivierDehaene 2023-05-15 17:34:27 +0200
  • d69e4d2d1e add tests OlivierDehaene 2023-05-15 17:13:33 +0200
  • 9a9244937b add flash bigcode models OlivierDehaene 2023-05-04 11:07:39 +0200
  • 421372f271 feat(tests): add snapshot testing OlivierDehaene 2023-05-03 19:09:18 +0200
  • d9e6c514b5 working local OlivierDehaene 2023-05-03 11:44:58 +0200
  • 338d30b43b wip OlivierDehaene 2023-04-26 00:52:44 +0200
  • f58f0a0364 Single place for TP layers + Dropout Layer Norm + FastLinear (#329) Nicolas Patry 2023-05-15 17:30:47 +0200
  • 42d8efcb04 Fixing layer imports (for isinstance compat). Nicolas Patry 2023-05-15 16:46:32 +0200
  • 7ccb8eefdc TMP. remove_post_load_weights Nicolas Patry 2023-05-15 16:43:32 +0200
  • 66b277321d feat(ci): custom gpu runners (#328) OlivierDehaene 2023-05-15 15:53:08 +0200
  • 8ad6c60271 revert OlivierDehaene 2023-05-15 15:27:17 +0200
  • edc9ce9beb Cleanup. Nicolas Patry 2023-05-15 15:21:49 +0200
  • d7a97aa0b6 Removing dead variables. (#327) Nicolas Patry 2023-05-15 15:14:17 +0200
  • 8d42e1d191 More. Nicolas Patry 2023-05-15 15:13:59 +0200
  • f0f660700a use another version of tailscale github action OlivierDehaene 2023-05-15 15:12:36 +0200
  • 7fc999b7bd Update server/text_generation_server/models/flash_neox.py Nicolas Patry 2023-05-15 15:03:58 +0200
  • 848c4aa407 Single place for TP layers + Dropout Layer Norm + FastLinear Nicolas Patry 2023-05-15 15:02:19 +0200
  • 4af2e60333 remove custom role OlivierDehaene 2023-05-15 14:57:46 +0200
  • b4907ec422 feat(ci): custom gpu runners OlivierDehaene 2023-05-15 14:51:04 +0200
  • 89ff4e901a Removing dead variables. Nicolas Patry 2023-05-15 12:33:21 +0200
  • 91e674bb85 Lifting check_unitialized. (#325) Nicolas Patry 2023-05-15 11:32:25 +0200
  • f64c9ba305 Fix for new localization. Nicolas Patry 2023-05-15 10:59:49 +0200
  • 62b4082514 Lifting the call to. Nicolas Patry 2023-05-15 10:38:08 +0200
  • cc3cdeb156 Lifting check_unitialized. Nicolas Patry 2023-05-15 10:38:47 +0200
  • 73d84c6ee5 Hotfixes for santacoder/bigcode. (#294) Nicolas Patry 2023-05-15 10:35:20 +0200
  • 05f98e8a33 Update server/text_generation_server/models/__init__.py OlivierDehaene 2023-05-15 10:20:20 +0200
  • cd8477bcf8 Loading config *after* checking for model name. Nicolas Patry 2023-05-15 10:01:59 +0200
  • 22c4fd07ab fix(docker): use ubuntu20.04 OlivierDehaene 2023-05-12 18:38:59 +0200
  • 119f7e0687 fix(docker): remove quantize default OlivierDehaene 2023-05-12 17:56:32 +0200
  • 8a8f43410d chore(docker): use nvidia base image (#318) OlivierDehaene 2023-05-12 17:32:40 +0200
  • b93a574ced chore(docker): use nvidia base image OlivierDehaene 2023-05-12 17:07:01 +0200
  • 76a48cd365 feat(server): GPTQ quantization (step1) (#277) Nicolas Patry 2023-05-12 14:46:41 +0200
  • a86e4bf713 Working version. Ubuntu 2023-05-09 08:35:59 +0000
  • 57a6cbff82 Tmp work for sharding to work properly. Ubuntu 2023-05-08 09:00:54 +0000
  • c5846ee73a Dump. Ubuntu 2023-05-05 16:31:55 +0000
  • c126ca01d9 Non local file. Ubuntu 2023-05-04 13:22:42 +0000
  • c3d12ae2d4 Some protection against sharding (illegal access becuase of g_idx) Ubuntu 2023-05-02 17:49:42 +0000
  • 2c9e1171bc [WIP] Adding GPTQ support for llama Ubuntu 2023-05-02 17:07:33 +0000
  • 4f6d038c0b fix(server): fix multinomial implem in Sampling OlivierDehaene 2023-05-11 13:30:38 +0200
  • d4a94f7a94 Include health into OpenAPI doc Yang, Bo 2023-05-10 21:54:40 -0700
  • a6c18c39bb feat(server): use cuda graph in logits warping (#302) OlivierDehaene 2023-05-10 19:08:54 +0200
  • b1f80702ef inline OlivierDehaene 2023-05-10 18:40:26 +0200
  • a944dd0fd5 cleanup OlivierDehaene 2023-05-10 18:20:28 +0200
  • 3248fdfbd4 fix multinomial gpu cpu sync OlivierDehaene 2023-05-10 17:54:04 +0200
  • 35ab6cfcf1 fix(docker): remove CUDA_VERSION OlivierDehaene 2023-05-10 16:16:06 +0200
  • 1df2aa03c5 add cpu support OlivierDehaene 2023-05-09 16:30:19 +0200
  • e2727387aa add cuda graphs to token warping OlivierDehaene 2023-05-09 16:30:19 +0200
  • 745f596c88 feat(server): use float16 (#304) OlivierDehaene 2023-05-10 15:51:10 +0200
  • 805f06e433 Merge branch 'main' into feat/float16 OlivierDehaene 2023-05-10 15:50:41 +0200
  • 68e9d6ab33 feat(server): shard token decode (#303) OlivierDehaene 2023-05-10 15:48:21 +0200
  • 1585404464 fix(docker): remove nvidia require cuda env (#310) OlivierDehaene 2023-05-10 15:29:21 +0200
  • d9d3c9a67c fix(docker): remove nvidia require cuda env OlivierDehaene 2023-05-10 15:28:56 +0200
  • bef3458ee8 Update server/text_generation_server/models/__init__.py Nicolas Patry 2023-05-10 12:30:28 +0200
  • b10c6e83f2 Clarify that generated_text is optional Yang, Bo 2023-05-09 21:30:47 -0700
  • 49cffad1bc fix(docker): fix nvidia env vars (#305) OlivierDehaene 2023-05-09 19:02:52 +0200
  • fc35575662 fix(docker): fix nvidia env vars OlivierDehaene 2023-05-09 19:02:23 +0200
  • f0609e73d8 add docs OlivierDehaene 2023-05-09 18:40:17 +0200
  • 090352d965 feat(server): use float16 OlivierDehaene 2023-05-09 18:36:20 +0200
  • 89565b4eaf feat(server): shard token decode OlivierDehaene 2023-05-09 18:35:04 +0200