Commit Graph

  • f59b0ef2e7 fix(server): Small tidy of code from recent changes Nick Hill 2023-04-27 08:12:08 +0100
  • c86998b369 chore(server): update transformers OlivierDehaene 2023-04-27 09:09:07 +0200
  • db2b4e0754 feat(router): new healthcheck that skips the queue (#244) Nicolas Patry 2023-04-26 20:23:54 +0200
  • 018e87d78d clippy OlivierDehaene 2023-04-26 20:03:46 +0200
  • 6d8d5b6d1d fmt OlivierDehaene 2023-04-26 19:42:49 +0200
  • 20e0117e7c Merge branch 'main' into new_health_check OlivierDehaene 2023-04-26 19:20:41 +0200
  • e7503a4240 add store true when successful prefill/decode OlivierDehaene 2023-04-26 19:14:21 +0200
  • c4fb09f2ae feat(router): add tests to validation (#237) Nicolas Patry 2023-04-26 16:14:40 +0200
  • 3b2d1a2854 Adding AtomicBool to see if healthcheck should do a full roundtrip or not. Nicolas Patry 2023-04-26 15:29:03 +0200
  • 77758f603b chore(launcher): refactor logic (#242) Nicolas Patry 2023-04-26 14:43:36 +0200
  • 9c0c464983 Update router/src/queue.rs Nicolas Patry 2023-04-26 14:40:28 +0200
  • 8bb7f993b7 Update router/src/queue.rs Nicolas Patry 2023-04-26 14:39:27 +0200
  • 9d613d0f9b Updating code. Nicolas Patry 2023-04-26 14:25:55 +0200
  • a963495315 add logic to queue feat/improve_max_tokens OlivierDehaene 2023-04-26 13:40:20 +0200
  • 4f460e5bfe feat(server): improve max tokens calculation OlivierDehaene 2023-04-26 13:07:25 +0200
  • e28b5bf460 Checking our device. Nicolas Patry 2023-04-26 12:19:32 +0200
  • e1867079fd New healthcheck that doesn't hit the queue. Nicolas Patry 2023-04-26 12:17:00 +0200
  • 7de8a377b0 fix(benchmarking): fix benchmarking tool OlivierDehaene 2023-04-26 00:54:27 +0200
  • 67356dc9a2 Update ever so slightly current queue tests. Nicolas Patry 2023-04-25 17:07:53 +0200
  • 1c97d7b0c0 Tmp. Nicolas Patry 2023-04-25 15:40:11 +0200
  • 390ec5aea8 Cleanup up a bit the launcher. Nicolas Patry 2023-04-25 20:52:37 +0200
  • 9df67b35bf Refactoring launcher. Nicolas Patry 2023-04-25 20:40:37 +0200
  • 45344244cf Starting some routing tests. (#233) Nicolas Patry 2023-04-25 14:13:14 +0200
  • 323546df1d fix(python-client): add auth headers to is supported requests (#234) OlivierDehaene 2023-04-25 13:55:26 +0200
  • 37b64a5c10 chore(server): update safetensors version (#235) OlivierDehaene 2023-04-25 13:50:56 +0200
  • 66b4744233 chore(server): update safetensors version OlivierDehaene 2023-04-25 13:14:32 +0200
  • d5c60b3273 Merge branch 'main' of github.com:huggingface/text-generation-inference OlivierDehaene 2023-04-25 13:12:34 +0200
  • 8b182eb986 feat(router): add endpoint info to /info route (#228) OlivierDehaene 2023-04-25 13:11:18 +0200
  • 1ef5480217 fix(python-client): add auth headers to is supported requests OlivierDehaene 2023-04-25 13:09:25 +0200
  • 0def15ed68 Starting some routing tests. Nicolas Patry 2023-04-25 12:31:23 +0200
  • 950dddaaa5 feat(router): add endpoint info to /info route OlivierDehaene 2023-04-24 18:23:07 +0200
  • ebc74d5666 feat(router): use number of tokens in batch as input for dynamic batching (#226) OlivierDehaene 2023-04-24 17:59:00 +0200
  • 2a6f25ce51 revert build OlivierDehaene 2023-04-24 17:24:19 +0200
  • 61ff239724 refactor OlivierDehaene 2023-04-24 16:19:54 +0200
  • c3ad942e9f add metrics OlivierDehaene 2023-04-24 16:18:08 +0200
  • 889897fe69 black OlivierDehaene 2023-04-24 16:08:50 +0200
  • 885411e747 push image to test OlivierDehaene 2023-04-24 16:08:27 +0200
  • c69f24d16b feat(router): use number of tokens in batch as input for dynamic batching OlivierDehaene 2023-04-24 14:07:03 +0200
  • 98a3e0d135 chore(server): update huggingface-hub (#227) OlivierDehaene 2023-04-24 15:57:13 +0200
  • 9b5e95f898 chore(server): update huggingface-hub OlivierDehaene 2023-04-24 15:20:26 +0200
  • 4a7dd4085a feat(server): reduce memory requirement (#214) Nick Hill 2023-04-24 05:15:42 -0700
  • 8e34beed32 equivalent changes for seq2seq_lm Nick Hill 2023-04-24 07:25:25 +0100
  • ab20142c14 trim to new max input length in filter() Nick Hill 2023-04-24 07:24:35 +0100
  • 0b1d0010a4 update unit tests Nick Hill 2023-04-24 06:52:46 +0100
  • 12326eff62 feat(server): reduce memory requirement Nick Hill 2023-04-20 14:54:01 -0700
  • 6ded76a4ae v0.6.0 (#222) v0.6.0 OlivierDehaene 2023-04-21 21:00:57 +0200
  • 97df0c7bc0 misc: update to rust 1.69 (#221) OlivierDehaene 2023-04-21 21:00:30 +0200
  • 84a9fa33ed v0.6.0 OlivierDehaene 2023-04-21 20:47:46 +0200
  • 7d4b161019 misc: update to rust 1.69 OlivierDehaene 2023-04-21 20:34:16 +0200
  • 4b460e72fb fix(server): fix flash batch filtering (#220) OlivierDehaene 2023-04-21 20:26:01 +0200
  • 27b3a144f7 fix(server): fix flash batch filtering OlivierDehaene 2023-04-21 20:25:13 +0200
  • 1ffea36ec2 fix(server): fix flash causal (#219) OlivierDehaene 2023-04-21 19:49:08 +0200
  • c0df99e704 fix(server): fix flash causal OlivierDehaene 2023-04-21 19:48:41 +0200
  • 86bca365df fix(server): fix flash causal (#218) OlivierDehaene 2023-04-21 19:42:16 +0200
  • 91c87b8013 fix(server): fix flash causal OlivierDehaene 2023-04-21 19:41:52 +0200
  • afc5b999d0 fix(server): cleanup new flash past_key_values logic (#217) OlivierDehaene 2023-04-21 16:19:04 +0200
  • ebc2c2b7e4 fix(server): cleanup new flash past_key_values logic OlivierDehaene 2023-04-21 16:17:39 +0200
  • db4cb5e4ed fix(server): fix past key values logic (#216) OlivierDehaene 2023-04-21 15:59:18 +0200
  • 343437c7b5 feat(router): add device and dtype info (#215) OlivierDehaene 2023-04-21 15:36:29 +0200
  • 02a5e9b742 fix(server): fix past key values logic OlivierDehaene 2023-04-21 15:36:08 +0200
  • ac8c0f6fe4 feat(server): flash attention past key value optimizations (#213) Nick Hill 2023-04-21 05:57:18 -0700
  • f6bdde5337 feat(router): add device and dtype info OlivierDehaene 2023-04-21 14:46:35 +0200
  • 47fb2fb986 small thing missed during rebase; add another comment Nick Hill 2023-04-20 13:02:41 -0700
  • 41e0310ef7 tiny simplification Nick Hill 2023-04-20 11:27:20 -0700
  • e360cf92cf feat(server): flash attention past key value optimizations Nick Hill 2023-04-20 09:15:56 -0700
  • 274513e6a3 fix(ci): fix sha in docker image (#212) OlivierDehaene 2023-04-20 18:50:47 +0200
  • ac1e52a15c fix(ci): fix sha in docker image OlivierDehaene 2023-04-20 18:50:23 +0200
  • ba1aae3e78 feat(router): dynamic batch sizing Nick Hill 2023-04-19 07:58:40 -0700
  • 709d8936f6 feat(router): drop requests when client closes the channel (#202) OlivierDehaene 2023-04-20 11:07:40 +0200
  • 3652d82fd7 revert build OlivierDehaene 2023-04-19 20:00:19 +0200
  • 521f6203d1 add metrics OlivierDehaene 2023-04-19 18:39:44 +0200
  • ca98470cff push test image OlivierDehaene 2023-04-19 17:05:36 +0200
  • 94ff101fd3 make batch optional again OlivierDehaene 2023-04-19 16:25:49 +0200
  • 118f33d9dc fix queue OlivierDehaene 2023-04-19 16:24:18 +0200
  • d9578153cb fix tests for causal lm OlivierDehaene 2023-04-18 18:56:51 +0200
  • 2ad7a63761 wip OlivierDehaene 2023-04-18 17:51:41 +0200
  • 9476170dda wip OlivierDehaene 2023-04-16 18:52:47 +0200
  • 4e63d9cb28 wip OlivierDehaene 2023-04-14 12:33:44 +0200
  • b6ee0ec7b0 feat(router): add git sha to info route (#208) OlivierDehaene 2023-04-19 21:36:59 +0200
  • 108f7c62e1 feat(router): add git sha to info route OlivierDehaene 2023-04-19 20:44:44 +0200
  • 252f42c1e6 fix(router): add auth token to get model info (#207) OlivierDehaene 2023-04-19 20:06:06 +0200
  • 335a02ca72 fix(router): add auth token to get model info OlivierDehaene 2023-04-19 19:45:08 +0200
  • 6837b2eb77 fix(docker): remove unused dependencies (#205) OlivierDehaene 2023-04-19 19:39:31 +0200
  • 97de3efb7e remove hashes OlivierDehaene 2023-04-19 19:15:14 +0200
  • 7954dd93a7 fix(docker): remove unused dependencies OlivierDehaene 2023-04-19 19:04:48 +0200
  • 5d27f5259b fix(server): fix hf_transfer issue with private repos (#203) OlivierDehaene 2023-04-19 17:36:16 +0200
  • d742b5d14c add git OlivierDehaene 2023-04-19 17:04:52 +0200
  • 3af1e4ff86 fix(server): fix hf_transfer issue with private repos OlivierDehaene 2023-04-19 16:34:11 +0200
  • a88c54bb4c feat(server): check cuda capability when importing flash models (#201) OlivierDehaene 2023-04-19 12:52:37 +0200
  • 1e41a53770 explicit OlivierDehaene 2023-04-19 12:52:24 +0200
  • e14ae3b5e9 feat(server): support quantization for flash models (#200) OlivierDehaene 2023-04-19 12:51:11 +0200
  • 7c16352d1e feat(server): check cuda capability when importing flash models OlivierDehaene 2023-04-19 12:48:04 +0200
  • 0fc4f99379 fix santacoder sharded OlivierDehaene 2023-04-19 12:32:25 +0200
  • b47edb28af feat(server): support quantization for flash models OlivierDehaene 2023-04-19 11:57:34 +0200
  • 2475aede61 feat(router): add info route (#196) OlivierDehaene 2023-04-18 16:16:06 +0200
  • e6bf4d40ec feat(router): add info route OlivierDehaene 2023-04-18 13:57:41 +0200
  • a07ef4c656 feat(server): avoid manipulating position_ids for non-applicable models Nick Hill 2023-04-17 16:48:01 -0700
  • 252a086e9b feat(server): have FlashGPTNeoXModel support HF accelerate Nick Hill 2023-04-17 16:29:14 -0700
  • b927244eb5 feat(python-client): get list of currently deployed tgi models using the inference API (#191) OlivierDehaene 2023-04-17 18:43:24 +0200
  • dc99113c63 feat(python-client): get list of currently deployed tgi models using the inference API OlivierDehaene 2023-04-17 18:42:40 +0200