Commit Graph

  • 84b73ec9ac chore: Refresh dependency versions Nick Hill 2022-12-27 16:46:25 -0800
  • fcc2c5fcbf feat(launcher): Log server stdout (#19) OlivierDehaene 2023-01-05 12:01:23 +0100
  • cb2c649560 feat(router): Support return_full_text OlivierDehaene 2023-01-03 17:13:11 +0100
  • da0623ec1e feat(launcher): Log server stdout OlivierDehaene 2023-01-03 16:34:27 +0100
  • b94f30215f fix(server): Use cleanup_tokenization_spaces=False for lossless decoding (#13) Nicolas Patry 2023-01-03 11:07:05 +0100
  • 60472f9d2b feat(router): Add const parameters to validation logic (#15) Nick Hill 2023-01-03 01:41:22 -0800
  • bed5634ead Merge branch 'main' into fix_replaying_requests Nicolas Patry 2023-01-02 10:55:21 +0100
  • 894957daff Change const back to u32 Nick Hill 2022-12-31 09:00:50 -0800
  • b2acd1b15e Change const to usize Nick Hill 2022-12-31 08:54:17 -0800
  • 337e1f8795 Improve logging from python shards Nick Hill 2022-12-27 16:19:10 -0800
  • d002ab3eb1 Simplify some logic in db.rs, validation.rs Nick Hill 2022-12-27 15:52:45 -0800
  • 3efa5bbbfd fix(router): Include special tokens when tokenizing (#14) Nick Hill 2022-12-30 10:31:44 -0800
  • 686cc66717 fix(server): Check for device type correctly when determining initial padding (#16) Nick Hill 2022-12-30 10:30:42 -0800
  • 63258dfab8 fix: Check for device type correctly when determining initial padding Nick Hill 2022-12-27 16:08:59 -0800
  • 03a62635b2 fix: Include special tokens when tokenizing in front-end Nick Hill 2022-12-27 13:27:45 -0800
  • d8e1ce669b Easiest fix. Nicolas Patry 2022-12-20 15:24:42 +0100
  • 611e21cb13 fix(server): Fix stop sequences (#11) OlivierDehaene 2022-12-16 16:03:39 +0100
  • 5a0a947556 increase waiting time OlivierDehaene 2022-12-16 15:48:42 +0100
  • 415a231ff9 fix(server): Fix stop sequences OlivierDehaene 2022-12-16 13:11:18 +0100
  • 3e2e6240b8 feat(launcher): Add integration tests (#9) OlivierDehaene 2022-12-16 11:29:36 +0100
  • 9978147004 add expected values OlivierDehaene 2022-12-16 11:04:37 +0100
  • 7ab7c9a01f add paths to launch action OlivierDehaene 2022-12-16 09:51:47 +0100
  • 75ab65fb31 feat(launcher): Add integration tests OlivierDehaene 2022-12-15 19:28:55 +0100
  • 32a253063d feat: Return logprobs (#8) OlivierDehaene 2022-12-15 17:03:56 +0100
  • 82706a651c update readme OlivierDehaene 2022-12-15 17:03:37 +0100
  • 291722cb48 remove print OlivierDehaene 2022-12-15 16:59:37 +0100
  • 7e063117fe install pytest OlivierDehaene 2022-12-15 16:54:39 +0100
  • 271a066fb9 remove unused line OlivierDehaene 2022-12-15 16:51:40 +0100
  • cd75eddd30 fix makefile OlivierDehaene 2022-12-15 16:50:49 +0100
  • c90afea3de format OlivierDehaene 2022-12-15 16:50:11 +0100
  • 8420ee8fa2 feat: Return logprobs OlivierDehaene 2022-12-15 16:33:01 +0100
  • 718096f695 feat: Support stop sequences (#7) OlivierDehaene 2022-12-12 18:25:22 +0100
  • 42ae103559 fix test utils OlivierDehaene 2022-12-12 17:46:39 +0100
  • 3eaf7ee2e3 Add reason to response OlivierDehaene 2022-12-12 17:44:37 +0100
  • ed8ecb7ab5 feat: Support stop sequences OlivierDehaene 2022-12-12 17:24:17 +0100
  • 042180d88f fix(server): Only pad to multiple of 8 on GPUs OlivierDehaene 2022-12-08 19:37:37 +0100
  • a2985036aa feat(server): Add model tests (#6) OlivierDehaene 2022-12-08 18:49:33 +0100
  • 0cf4302c61 remove unused var OlivierDehaene 2022-12-08 18:48:16 +0100
  • a3cea9c119 remove workflow OlivierDehaene 2022-12-08 18:45:43 +0100
  • 5f85220158 add workflow OlivierDehaene 2022-12-08 18:42:25 +0100
  • 5db40a5109 black OlivierDehaene 2022-12-08 18:36:17 +0100
  • 3229fb7b44 feat(server): Add model tests OlivierDehaene 2022-12-08 18:32:07 +0100
  • 31d76e238d fix(batching): Avoid theoretical hang in batcher loop (#5) Nick Hill 2022-12-05 01:10:59 -0800
  • cb89376932 use notify_waiters to better express intent OlivierDehaene 2022-12-05 10:09:37 +0100
  • aaf38e5978 Change to use notify_one, adjust waiting_tokens accounting Nick Hill 2022-12-04 12:49:49 -0800
  • 1963f0e1bb Fix top_p validation message Nick Hill 2022-12-04 11:33:00 -0800
  • 1747365e25 Remove unneeded Model.num_heads field Nick Hill 2022-12-04 11:31:08 -0800
  • a172430d8b fix: Some small fixes Nick Hill 2022-11-29 13:22:25 -0800
  • daa1d81d5e feat(server): Support Galactica (#4) OlivierDehaene 2022-12-01 19:31:54 +0100
  • 4aab0617b6 fix Readme OlivierDehaene 2022-12-01 18:59:47 +0100
  • bc4c6a406a fix galactica batching OlivierDehaene 2022-12-01 18:54:53 +0100
  • a4782da22b black OlivierDehaene 2022-11-18 17:11:10 +0100
  • 1c5365ce85 feat(server): Support Galactica OlivierDehaene 2022-11-18 17:10:44 +0100
  • d6d5b12e03 fix(router): Handle tokenizer errors OlivierDehaene 2022-11-14 17:15:19 +0100
  • feb7806ca4 fix(readme): Typo OlivierDehaene 2022-11-14 16:22:10 +0100
  • 91f5f86280 fix(router): Fix HTTP status codes OlivierDehaene 2022-11-14 14:34:15 +0100
  • 6c781025ae feat(rust): Update to 1.65 OlivierDehaene 2022-11-14 13:59:56 +0100
  • dccd5c2b1a feat(server): Clarify CausalLMBatch concatenate method OlivierDehaene 2022-11-09 18:24:07 +0100
  • fa43fb71be fix(server): Fix Transformers fork version OlivierDehaene 2022-11-08 17:42:38 +0100
  • 4236e41b0d feat(server): Improved doc OlivierDehaene 2022-11-07 12:53:56 +0100
  • cea6051eff feat(launcher): Pass CUDA_VISIBLE_DEVICES to the shard OlivierDehaene 2022-11-04 18:31:08 +0100
  • 427d7cc444 feat(server): Support AutoModelForSeq2SeqLM OlivierDehaene 2022-11-04 18:03:04 +0100
  • c5665f5c8b feat(server): Support generic AutoModelForCausalLM OlivierDehaene 2022-11-04 14:22:47 +0100
  • 755fc0e403 fix(models): Revert buggy support for AutoModel OlivierDehaene 2022-11-03 16:07:54 +0100
  • b3b7ea0d74 feat: Use json formatter by default in docker image OlivierDehaene 2022-11-02 17:29:56 +0100
  • 3cf6368c77 feat(server): Support all AutoModelForCausalLM on a best effort basis OlivierDehaene 2022-10-28 19:24:00 +0200
  • 09674e6df9 feat(server): Support bitsandbytes OlivierDehaene 2022-10-27 14:25:29 +0200
  • beb552127a feat(client): Simplify sharded logic OlivierDehaene 2022-10-22 23:40:05 +0200
  • c8ce9b2515 feat(server): Use safetensors Nicolas Patry 2022-10-22 20:00:15 +0200
  • 75adbb3441 feat(weights): Support safetensors OlivierDehaene 2022-10-22 19:46:05 +0200
  • be8827fe41 Create LICENSE (#2) Thomas Wang 2022-10-22 10:44:52 +0200
  • 3398211873 Create LICENSE Thomas Wang 2022-10-21 23:15:02 +0200
  • 604b18bec2 Reworked following https://github.com/huggingface/transformers_bloom_parallel/pull/7 Nicolas Patry 2022-10-21 20:47:57 +0200
  • 457c9038ff Making bloom loadable with safetensors. Nicolas Patry 2022-10-21 18:02:04 +0200
  • c837893370 feat(router): Add max_waiting_tokens OlivierDehaene 2022-10-21 16:40:05 +0200
  • 895a341d06 fix(validation): Fix error messages OlivierDehaene 2022-10-21 10:59:15 +0200
  • f16f2f5ae1 v0.1.0 Olivier Dehaene 2022-10-18 15:19:03 +0200
  • 92c1ecd008 feat: Add arguments to CLI Olivier Dehaene 2022-10-17 18:27:33 +0200
  • 5e5d8766a2 feat: Improve error handling Olivier Dehaene 2022-10-17 14:59:00 +0200
  • 00e6ce44b1 Update aml deployment Olivier Dehaene 2022-10-17 10:39:59 +0200
  • bcb53903b8 feat: Add AML deployment Olivier Dehaene 2022-10-15 20:21:50 +0200
  • bf99afe916 feat: Docker image Olivier Dehaene 2022-10-14 15:56:21 +0200
  • f11965c11d support deepspeed feat/support_deepspeed Olivier Dehaene 2022-10-13 11:05:44 +0200
  • 39df4d9975 Use axum Olivier Dehaene 2022-10-11 18:14:39 +0200
  • e86ecbac63 ValidationError was not correctly handled Olivier Dehaene 2022-10-11 16:53:40 +0200
  • 4c693e6524 Refactored gRPC interface Added validation logic Olivier Dehaene 2022-10-11 16:50:54 +0200
  • fa9a088467 Add load testing Olivier Dehaene 2022-10-11 10:36:51 +0200
  • 1d986983d5 fix: cleanup Olivier Dehaene 2022-10-08 12:34:25 +0200
  • 295831a481 Init Olivier Dehaene 2022-10-08 12:30:12 +0200