Commit Graph

  • af10275f46 use join_all instead OlivierDehaene 2023-04-09 15:10:59 +0200
  • 4267378b1f fix validation error OlivierDehaene 2023-04-09 10:06:53 +0200
  • 82464709d3 fix truncation OlivierDehaene 2023-04-09 09:55:05 +0200
  • 146e0e27ce add validation + decode of special tokens OlivierDehaene 2023-04-07 11:12:16 +0200
  • 273f0ae42c update flash attention OlivierDehaene 2023-04-07 08:59:05 +0200
  • d7b92e379c correct commit OlivierDehaene 2023-04-06 20:21:23 +0200
  • e4ad3066bc fmt OlivierDehaene 2023-04-06 20:09:21 +0200
  • 1111125092 better docker layer caching OlivierDehaene 2023-04-06 20:08:46 +0200
  • c3779fa859 remove profiling OlivierDehaene 2023-04-06 17:58:54 +0200
  • 26fc232afb fix tp OlivierDehaene 2023-04-06 17:27:32 +0200
  • 7816a47697 fix llama tokenizer OlivierDehaene 2023-04-06 17:07:58 +0200
  • 3c272aefc0 fix test OlivierDehaene 2023-04-06 15:03:50 +0200
  • 6c96f37bc8 fix tests OlivierDehaene 2023-04-06 14:44:06 +0200
  • c7dd00ead2 upgrade setuptools OlivierDehaene 2023-04-06 14:20:10 +0200
  • 01ab5df180 update transformers OlivierDehaene 2023-04-06 13:43:49 +0200
  • 70637b4170 use all tokens OlivierDehaene 2023-04-05 17:18:16 +0200
  • b5233f9c3c better decode OlivierDehaene 2023-04-05 13:47:25 +0200
  • 783bc64f47 fix concatenate OlivierDehaene 2023-04-04 18:54:10 +0200
  • c11e77411f improve decode OlivierDehaene 2023-04-04 18:31:26 +0200
  • cdc33ce63c allow disabling hf_transfer OlivierDehaene 2023-04-04 17:46:41 +0200
  • eb033e781f trigger build OlivierDehaene 2023-04-04 15:35:54 +0200
  • 8604d37015 trigger build OlivierDehaene 2023-04-04 15:32:52 +0200
  • f9b09d9629 hack OlivierDehaene 2023-04-04 15:13:32 +0200
  • 30148b776b fix instrumentation OlivierDehaene 2023-04-04 14:51:31 +0200
  • 161e93a45f cleanup OlivierDehaene 2023-04-04 14:47:00 +0200
  • 1dd2c24b9c rework validation OlivierDehaene 2023-04-04 14:43:56 +0200
  • 47e93409f3 optional rust validation OlivierDehaene 2023-04-04 12:35:29 +0200
  • 45eacb782d patch qkv_rot OlivierDehaene 2023-03-31 13:17:45 +0200
  • cd5d0a96ba feat(server): add flash attention llama OlivierDehaene 2023-03-28 16:12:05 +0200
  • 71402ed4c7 wip OlivierDehaene 2023-03-28 13:49:51 +0200
  • 3f2542bb6a fix(server): fix escape characters in stop sequence (#155) OlivierDehaene 2023-04-05 19:37:41 +0200
  • f4a94bd312 fix(server): fix escape characters in stop sequence OlivierDehaene 2023-04-05 17:26:21 +0200
  • 9122e7bd9c docs(readme): provide link Logits Warper README (#154) Guspan Tanadi 2023-04-04 18:27:46 +0700
  • a70b555502 docs: link to internal Generation Utilities README Guspan Tanadi 2023-04-04 18:11:30 +0700
  • 189465fd60 style: Logits Warper mention top-p top-k README Guspan Tanadi 2023-04-04 15:07:28 +0700
  • 578dee03bf docs: link mention Logits Warper README Guspan Tanadi 2023-04-04 15:02:14 +0700
  • c0aeb32583 feat(server): flash santacoder (#153) OlivierDehaene 2023-04-03 19:06:42 +0200
  • 625185d629 update supported models OlivierDehaene 2023-04-03 18:55:43 +0200
  • 0523b4891f append to all OlivierDehaene 2023-04-03 18:51:20 +0200
  • 05aee8b503 feat(server): flash santacoder OlivierDehaene 2023-04-03 15:25:49 +0200
  • 5dfc9c7613 wip OlivierDehaene 2023-03-30 15:52:22 +0200
  • f41ab12783 wip OlivierDehaene 2023-03-29 22:25:23 +0200
  • 65ff6a73b3 Some more simplification, fix flash_neox cu_seqlen pruning Nick Hill 2023-03-30 18:10:18 -0700
  • fef1a1c381 v0.4.3 (#152) v0.4.3 OlivierDehaene 2023-03-30 17:28:14 +0200
  • beea11051e v0.4.3 OlivierDehaene 2023-03-30 17:27:58 +0200
  • 84722f3e33 v0.4.2 (#151) v0.4.2 OlivierDehaene 2023-03-30 17:10:01 +0200
  • ea6f3057bf v0.4.2 OlivierDehaene 2023-03-30 17:07:03 +0200
  • 08b7e4a282 fix(server): fix flash neox rotary embeddings (#150) OlivierDehaene 2023-03-30 16:12:23 +0200
  • 8dc651ec63 fix(server): fix flash neox rotary embeddings OlivierDehaene 2023-03-30 15:55:01 +0200
  • 610bb1f978 feat(benchmark): tui based benchmarking tool (#149) OlivierDehaene 2023-03-30 15:26:27 +0200
  • 3a0e706346 revert aml changes OlivierDehaene 2023-03-30 15:02:08 +0200
  • 163c23f174 add latency per token OlivierDehaene 2023-03-30 14:49:08 +0200
  • b2d1276c16 add image OlivierDehaene 2023-03-30 13:13:23 +0200
  • c15922b132 exclude benchmark from workspace OlivierDehaene 2023-03-30 12:50:33 +0200
  • b6df2036ed v1 OlivierDehaene 2023-03-30 12:36:17 +0200
  • 17a75c8845 add helper OlivierDehaene 2023-03-30 11:56:39 +0200
  • 271f045825 improving design OlivierDehaene 2023-03-30 11:44:00 +0200
  • ae72d4f96f improving design OlivierDehaene 2023-03-30 11:01:03 +0200
  • a1613e2518 improving design OlivierDehaene 2023-03-30 10:35:18 +0200
  • 55106ec476 fix(ci): fix sagemaker action (#148) OlivierDehaene 2023-03-29 22:27:01 +0200
  • 855e36ab16 fix(ci): fix sagemaker action OlivierDehaene 2023-03-29 22:26:42 +0200
  • d503e8f09d feat: aws sagemaker compatible image (#147) OlivierDehaene 2023-03-29 21:38:30 +0200
  • a49599f432 remove cache to in sagemaker build OlivierDehaene 2023-03-29 14:56:09 +0200
  • 00d5ade28f add new target in dockerfile OlivierDehaene 2023-03-29 14:54:36 +0200
  • 930842a7f0 hack OlivierDehaene 2023-02-28 16:47:00 +0100
  • 32199274f3 update docker image Philipp Schmid 2023-02-28 16:35:53 +0100
  • 7bea9a105e change env var names Philipp Schmid 2023-02-28 14:00:32 +0100
  • ce09fd32a1 sagemaker support OlivierDehaene 2023-02-28 10:34:29 +0100
  • 1c5d526943 improvements OlivierDehaene 2023-03-29 14:01:23 +0200
  • 383619bd7f v1 OlivierDehaene 2023-03-29 11:58:19 +0200
  • 681744b982 add shutdown logic OlivierDehaene 2023-03-28 11:13:14 +0200
  • c0d793d2ca wip OlivierDehaene 2023-03-27 16:40:07 +0200
  • a28a8ebdb5 wip OlivierDehaene 2023-03-27 15:57:23 +0200
  • 4dfa6fbb62 wip OlivierDehaene 2023-03-12 10:05:33 +0100
  • c9bdaa8b73 feat(server): reduce mlp and attn in one op for flash neox (#145) OlivierDehaene 2023-03-28 16:51:41 +0200
  • d2a42095dd feat(server): reduce mlp and attn in one op for flash neox OlivierDehaene 2023-03-28 16:33:00 +0200
  • f000068944 feat(server): clear cache on error (#143) OlivierDehaene 2023-03-28 11:29:35 +0200
  • f0278520f1 feat(server): clear cache on error OlivierDehaene 2023-03-28 10:26:51 +0200
  • f786d1ddf5 Bit more simplification to flash_neox generate_tokens() Nick Hill 2023-03-27 16:30:11 -0700
  • 8e8dd984d8 feat(server): Add mypy-protobuf (#141) Nick Hill 2023-03-27 00:25:15 -0700
  • 462530c2b0 fix(server): Avoid using try/except to determine kind of AutoModel (#142) Nick Hill 2023-03-27 00:23:22 -0700
  • 552b611e0e fix(server): Avoid using try/except to determine kind of AutoModel Nick Hill 2023-03-26 17:08:50 -0700
  • 6ec75cda7a feat(server): Add mypy-protobuf Nick Hill 2023-03-26 16:55:56 -0700
  • 9895569c8b Fix tests mod in queue.rs Nick Hill 2023-03-25 22:01:05 -0700
  • f934d01f74 proposal: Move token decoding and stopping evaluation to router Nick Hill 2023-03-23 11:37:12 -0700
  • ab5fd8cf93 v0.4.1 (#140) v0.4.1 OlivierDehaene 2023-03-26 16:37:51 +0200
  • 678b2f3900 feat(server): cleanup flash neox loading (#139) OlivierDehaene 2023-03-26 16:37:21 +0200
  • bd54f376c9 v0.4.1 OlivierDehaene 2023-03-26 16:28:33 +0200
  • a9526d9072 feat(server): cleanup flash neox loading OlivierDehaene 2023-03-26 16:15:42 +0200
  • d6a93fe992 fix(server): fix flash-neox scores warping (#137) OlivierDehaene 2023-03-24 18:21:41 +0100
  • 15a6b79c7e turn tp embed back on OlivierDehaene 2023-03-24 18:20:47 +0100
  • fc778e46fb fix(server): fix flash-neox scores warping OlivierDehaene 2023-03-24 18:10:10 +0100
  • 05e9a796cc feat(server): flash neoX (#133) OlivierDehaene 2023-03-24 14:02:14 +0100
  • eeaabd6eaa add env var OlivierDehaene 2023-03-24 11:36:45 +0100
  • 23e1028822 feat(python-client): add CI (#136) OlivierDehaene 2023-03-23 18:13:04 +0100
  • 8c4b80e6e8 feat(python-client): add CI OlivierDehaene 2023-03-23 18:11:12 +0100
  • 5d04525cb9 feat(python-client): release v0.4.0 (#135) OlivierDehaene 2023-03-23 18:07:20 +0100
  • 0fec9fbfd1 feat(python-client): release v0.4.0 OlivierDehaene 2023-03-23 18:07:02 +0100
  • 5e5e9d4bbd feat: Add note about NVIDIA drivers (#64) lewtun 2023-03-23 18:03:45 +0100
  • c07acd4fea Merge branch 'main' into lewtun-patch-1 OlivierDehaene 2023-03-23 18:03:33 +0100