Commit Graph

  • 19c41824cb chore: update openapi schema OlivierDehaene 2023-06-05 18:16:08 +0200
  • 6abec14a7e feat(server): batch tokenization for flash causal lm (#411) OlivierDehaene 2023-06-05 16:09:41 +0200
  • e09314a72f black OlivierDehaene 2023-06-05 15:34:54 +0200
  • 89c5621ecf feat(server): batch tokenization for flash causal lm OlivierDehaene 2023-06-05 14:15:01 +0200
  • f6ba71f60f [python] fix: Fix embedding mapping for deepspeed chat hyunwoongko 2023-06-03 12:00:07 +0900
  • 895c5f1562 feat(server): only compute prefill logprobs when asked (#406) OlivierDehaene 2023-06-02 17:12:30 +0200
  • f1e054c80a damnit python OlivierDehaene 2023-06-02 16:45:56 +0200
  • cdc005bcb0 rename var OlivierDehaene 2023-06-02 16:36:32 +0200
  • 62ff3816fb fix tests OlivierDehaene 2023-06-02 15:58:47 +0200
  • 1dd0cf63df feat(server): only compute prefill logprobs when asked OlivierDehaene 2023-06-02 15:30:35 +0200
  • 83b84486ad feat(launcher): parse oom signal (#404) OlivierDehaene 2023-06-02 14:17:27 +0200
  • d7d2619213 feat(launcher): parse oom signal OlivierDehaene 2023-06-02 11:06:26 +0200
  • 62fc401030 feat(sagemaker): add trust remote code to entrypoint (#394) OlivierDehaene 2023-06-02 09:51:06 +0200
  • e7248fe90e v0.8.2 v0.8.2 OlivierDehaene 2023-06-01 19:49:13 +0200
  • 4cd0a9f0c8 feat(server): use torch.select to decrease cpu bottleneck OlivierDehaene 2023-06-01 19:16:37 +0200
  • 95d3546976 feat(server): load santacoder/starcoder models with safetensors (#393) OlivierDehaene 2023-06-01 12:10:35 +0200
  • c0928e6f26 feat(server): remove trust_remote_code requirement for falcon models (#396) OlivierDehaene 2023-06-01 12:07:41 +0200
  • 246e8f8250 add lm_head OlivierDehaene 2023-06-01 11:46:51 +0200
  • d69a0633be fix(server): fix has_position_ids (#395) OlivierDehaene 2023-06-01 11:41:35 +0200
  • 08c2477569 feat(server): remove trust_remote_code requirement for falcon models OlivierDehaene 2023-06-01 11:40:51 +0200
  • f652788d54 fix OlivierDehaene 2023-06-01 11:22:19 +0200
  • 4202d97001 fix(server): fix has_position_ids OlivierDehaene 2023-06-01 11:03:51 +0200
  • 71d215f0bd feat(sagemaker): add trust remote code to entrypoint OlivierDehaene 2023-06-01 11:00:05 +0200
  • f6438ac352 feat(server): load santacoder/starcoder models with safetensors OlivierDehaene 2023-06-01 10:55:26 +0200
  • 1809159aff Do not init process group if already initialized Antoni Baum 2023-05-31 22:17:01 +0000
  • db2ebe3947 v0.8.1 v0.8.1 OlivierDehaene 2023-05-31 12:08:40 +0200
  • 337afb2842 fix(server): fix bnb quantization for CausalLM models (#385) OlivierDehaene 2023-05-31 11:48:28 +0200
  • e5b9de587a fix(server): fix bnb quantization for CausalLM models OlivierDehaene 2023-05-31 11:17:08 +0200
  • 87dc034b59 feat(server): add retry on download (#384) OlivierDehaene 2023-05-31 10:57:53 +0200
  • 444400b457 increase health checks OlivierDehaene 2023-05-31 10:55:59 +0200
  • 3a323202e6 update snapshot OlivierDehaene 2023-05-31 10:21:53 +0200
  • 2d44035b82 feat(server): add retry on download OlivierDehaene 2023-05-31 10:21:04 +0200
  • 081b926584 v0.8.0 v0.8.0 OlivierDehaene 2023-05-30 18:39:35 +0200
  • b8b950b37c feat(server): support RefinedWeb models (#379) OlivierDehaene 2023-05-30 18:25:19 +0200
  • 5e813b9b5c a10 snapshots OlivierDehaene 2023-05-30 17:57:02 +0200
  • c7b899a438 black OlivierDehaene 2023-05-30 17:09:51 +0200
  • a2f437a291 fuse reshapes OlivierDehaene 2023-05-30 17:09:34 +0200
  • 8f28011e1e add integration tests OlivierDehaene 2023-05-30 15:53:20 +0200
  • 3e517bfc9d Merge remote-tracking branch 'origin/main' into feat/rw OlivierDehaene 2023-05-30 15:24:44 +0200
  • 51b5db77f5 black OlivierDehaene 2023-05-30 15:24:21 +0200
  • 8c8d709994 40b working OlivierDehaene 2023-05-30 15:09:49 +0200
  • bbb1d9e704 working OlivierDehaene 2023-05-30 14:45:31 +0200
  • bf7f1d5434 fix(server): fix quantization OlivierDehaene 2023-05-30 13:56:03 +0200
  • 49a6c8c1b2 fix(launcher): parse num cuda devices from CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES OlivierDehaene 2023-05-30 13:27:48 +0200
  • 146e72c3be fix(launcher): parse num cuda devices from CUDA_VISIBLE_DEVICES and NVIDIA_VISIBLE_DEVICES OlivierDehaene 2023-05-30 12:52:18 +0200
  • 73cf93f1ee Merge branch 'main' of github.com:huggingface/text-generation-inference OlivierDehaene 2023-05-30 12:40:20 +0200
  • cbffddcc06 wip OlivierDehaene 2023-05-30 10:41:10 +0200
  • 12ab24ae64 fix normal att OlivierDehaene 2023-05-29 12:10:17 +0200
  • 63a18c1414 feat(server): support RefinedWeb models OlivierDehaene 2023-05-29 11:56:19 +0200
  • 7a94845eba Fixes, cpu-optimized model, misc Joel Lamy-Poirier 2023-05-26 15:22:20 -0400
  • 5fde8d9991 Fix issue when load AutoModelForSeq2SeqLM model (#370) CL-Shang 2023-05-26 18:31:47 +0800
  • 62f91f78ac feat(server): support vectorized warpers in flash causal lm (#317) OlivierDehaene 2023-05-26 12:30:27 +0200
  • e8fd0e4841 remove cuda graphs OlivierDehaene 2023-05-26 11:52:13 +0200
  • 72eefa3612 fix Joel Lamy-Poirier 2023-05-25 15:52:14 -0400
  • a515fbde4c Fixes and format Joel Lamy-Poirier 2023-05-25 15:08:52 -0400
  • 7e53903ca4 add shared pool OlivierDehaene 2023-05-25 18:26:41 +0200
  • b973c101c5 remove unused vars OlivierDehaene 2023-05-25 18:00:57 +0200
  • d3cb0d3b83 faster cumsum OlivierDehaene 2023-05-25 17:59:13 +0200
  • 208e380ea8 Fix issue when load AutoModelForSeq2SeqLM model CL-Shang 2023-05-25 15:12:05 +0000
  • 951930fbff feat(benchmarker): add summary tables (#368) OlivierDehaene 2023-05-25 13:38:36 +0200
  • d13773e9db feat(benchmarker): add summary tables Nicolas Patry 2023-04-26 20:23:54 +0200
  • 0921fe6a2a Optimized model Joel Lamy-Poirier 2023-05-24 19:51:53 -0400
  • 7de104b7f6 revert OlivierDehaene 2023-05-24 19:40:26 +0200
  • a794c677ae fix warping OlivierDehaene 2023-05-24 19:37:42 +0200
  • caa9608347 fix tests OlivierDehaene 2023-05-24 17:02:20 +0200
  • a62f14872e optimize argmax OlivierDehaene 2023-05-24 16:28:16 +0200
  • c59fb353a0 add watermarking OlivierDehaene 2023-05-24 16:23:46 +0200
  • b9ad3acc4e clean dtype OlivierDehaene 2023-05-12 15:53:56 +0200
  • e7826855a3 fix imports OlivierDehaene 2023-05-12 15:47:57 +0200
  • f9e3a3bb91 feat(server): support vectorized warpers in flash causal lm OlivierDehaene 2023-05-12 15:41:36 +0200
  • 218c9adaa5 feat: decrease IPC proto size (#367) OlivierDehaene 2023-05-24 19:19:57 +0200
  • 5bfc8631ce fix tests OlivierDehaene 2023-05-24 18:56:26 +0200
  • 6012976445 feat: decrease IPC proto size OlivierDehaene 2023-05-24 18:01:19 +0200
  • d31562f300 v0.7.0 (#353) v0.7.0 OlivierDehaene 2023-05-23 21:20:49 +0200
  • 1cd8033522 update readme OlivierDehaene 2023-05-23 20:59:23 +0200
  • d0f7ed6e06 revert auto_convert OlivierDehaene 2023-05-23 20:48:34 +0200
  • 822435f872 v0.7.0 OlivierDehaene 2023-05-22 19:07:28 +0200
  • 942005386a feat(router): log input/ouput at debug level (#364) OlivierDehaene 2023-05-23 20:47:37 +0200
  • e3e487dc71 feat(server): support trust_remote_code (#363) OlivierDehaene 2023-05-23 20:40:39 +0200
  • cdad94e26e feat(router): log input/ouput at debug level OlivierDehaene 2023-05-23 20:39:09 +0200
  • ac59aadf17 inspect signature for position ids OlivierDehaene 2023-05-23 20:05:56 +0200
  • b83ea010fa do not update transformers OlivierDehaene 2023-05-23 19:35:07 +0200
  • de3491854b feat(server): support trust_remote_code OlivierDehaene 2023-05-23 19:23:01 +0200
  • e9669a4085 feat(server): do not use device_map auto on single GPU (#362) OlivierDehaene 2023-05-23 19:12:12 +0200
  • be3f667e18 feat(server): do not use device_map auto on single GPU OlivierDehaene 2023-05-23 18:41:59 +0200
  • cfaa858070 feat(server): support fp16 for t5 (#360) OlivierDehaene 2023-05-23 18:16:48 +0200
  • 94377efa78 chore(sever): update requirements (#357) OlivierDehaene 2023-05-23 18:03:22 +0200
  • 5f67923cac feat: add nightly load testing (#358) OlivierDehaene 2023-05-23 17:42:19 +0200
  • 0a9b1ad729 chore(sever): update requirements OlivierDehaene 2023-05-23 12:41:55 +0200
  • 7898259c4f feat(server): support fp16 for t5 OlivierDehaene 2023-05-23 17:39:56 +0200
  • 0a6494785c fix(ci): fix security group (#359) oOraph 2023-05-23 16:49:11 +0200
  • 3fc35dea78 Fix ci security group: open outbound rules Raphael 2023-05-23 16:24:37 +0200
  • 84b8005f85 move files OlivierDehaene 2023-05-23 15:56:25 +0200
  • 43b0229da0 run on host OlivierDehaene 2023-05-23 15:53:09 +0200
  • 49f27b6b0f feat: add nightly load testing OlivierDehaene 2023-05-23 15:36:16 +0200
  • 4f4c9c1665 fix(server): t5 cannot run in f16 (#356) OlivierDehaene 2023-05-23 12:15:54 +0200
  • 7ef9aac063 fix(server): t5 cannot run in f16 OlivierDehaene 2023-05-23 12:15:05 +0200
  • 91d9beec90 fix(server): fix init for flash causal lm (#352) OlivierDehaene 2023-05-22 15:05:32 +0200
  • e649bf9a55 feat(server): Support BLOOMChat-176B (#348) (#351) OlivierDehaene 2023-05-22 13:36:00 +0200
  • ba8d403352 fix(server): fix init for flash causal lm OlivierDehaene 2023-05-22 13:13:36 +0200