Commit Graph

  • da7e104241 Add the option to force another dtype than f16. Nicolas Patry 2023-06-30 07:47:06 +0000
  • ddfc02f2a4 add warmup OlivierDehaene 2023-06-29 15:50:44 +0200
  • 1175d0e4b4 First upgrade pip then run pip Yang, Bo 2023-06-28 11:07:40 -0700
  • d649cd8e02 wip OlivierDehaene 2023-06-28 19:26:26 +0200
  • d6bb10f202 Map deduplicated tensors via metadata Vincent Brouwers 2023-06-28 17:18:01 +0000
  • 70f485bf9f feat(router): add header option to disable buffering for the generate_stream response (#498) Robert Kimball 2023-06-28 02:50:12 -0700
  • 83de8a5ef9 Add header option to disable buffering for the response stream. Robert Kimball 2023-06-27 17:48:26 +0000
  • ae466a8736 fix(server): Do not init process group if already initialized (#388) Antoni Baum 2023-06-26 03:32:54 -0700
  • 90f651250e Merge branch 'main' into dist_is_initialized OlivierDehaene 2023-06-26 12:30:17 +0200
  • aefde28b45 feat(server): Add inference support for GPTQ (llama + falcon tested) + Quantization script (#438) Nicolas Patry 2023-06-26 12:27:01 +0200
  • a4fd6905d8 fmt feat/better_tokens OlivierDehaene 2023-06-23 15:01:05 +0200
  • bd3a9d8e85 fix(router): add timeout on flume sends (#488) OlivierDehaene 2023-06-23 14:58:28 +0200
  • 776d150c55 feat(server): Adding new ignore_rule for conversion. (#485) Nicolas Patry 2023-06-23 12:41:13 +0200
  • 49b4b33e80 feat(server): Update convert logic. (#483) Nicolas Patry 2023-06-23 12:40:46 +0200
  • 465ed07939 fix(router): add timeout on flume sends OlivierDehaene 2023-06-23 12:38:52 +0200
  • 09e4e73f6e Adding new ignore_rule for conversion. Nicolas Patry 2023-06-22 21:37:10 +0200
  • f282d1bdbc Fixing changed names for santacoder. Ubuntu 2023-06-22 14:15:53 +0000
  • 34eadb54e9 Changing convert logic. Nicolas Patry 2023-06-22 14:52:47 +0200
  • 2a13c833b5 Print error logs from workers during integration tests Yang, Bo 2023-06-20 14:44:42 -0700
  • 83e442ca9a feat(server): use encoding to get prefill tokens OlivierDehaene 2023-06-20 18:29:55 +0200
  • c9c65ab323 fix(server): Fixing T5 in case the names are mixed up. (#475) Nicolas Patry 2023-06-20 18:03:36 +0200
  • a8aa688a7b Apply suggestions from code review Nicolas Patry 2023-06-20 16:16:45 +0200
  • 5573f229c8 Fixing T5 in case the names are mixed up. Ubuntu 2023-06-20 14:03:29 +0000
  • 53aa9194c8 fix(server): fix warpers on CPU (#472) OlivierDehaene 2023-06-20 11:06:10 +0200
  • 82c9fadefe fix(server): fix warpers on CPU OlivierDehaene 2023-06-19 17:44:53 +0200
  • dca0fe2585 Adding GPTQ integration tests. add_integration_test Ubuntu 2023-06-19 12:14:17 +0000
  • ece7ffa40a feat(server): improve flash attention import errors (#465) OlivierDehaene 2023-06-19 09:53:45 +0200
  • 9c77ff23a7 feat(server): improve flash attention import errors OlivierDehaene 2023-06-16 16:56:08 +0200
  • f59fb8b630 feat(router): add ngrok integration (#453) OlivierDehaene 2023-06-16 16:25:11 +0200
  • ce9119a2ab add cargo feature OlivierDehaene 2023-06-16 11:31:41 +0200
  • 17837b1e51 Adding docs about GPTQ usage. add_gptq_docs Nicolas Patry 2023-06-15 19:41:04 +0200
  • 16d0fb04ae Santacoder GPTQ support (quantized model seems awful, not sure if it's prompting or the quantization itself). Nicolas Patry 2023-06-15 16:59:31 +0200
  • 59bd95f805 Adding some help for the options in text-generation-benchmark. Nicolas Patry 2023-06-15 16:26:10 +0200
  • 51b2d4edca feat(router): add ngrok integration OlivierDehaene 2023-06-14 18:06:33 +0200
  • 983c813f1d Typo. Nicolas Patry 2023-06-14 16:57:56 +0200
  • 054a3d095c Triton is actually a dependency of torch on linux. Nicolas Patry 2023-06-14 15:03:17 +0200
  • 732da6942b Remove lots of dead code, move triton to hard requirement Nicolas Patry 2023-06-14 14:55:45 +0200
  • 5de6863756 No one saw that, therefore it didn't happen. Nicolas Patry 2023-06-14 12:23:30 +0200
  • 55cf4d257c Tiny fixes for falcon. Nicolas Patry 2023-06-14 09:29:44 +0200
  • e5e552b496 Falcon Nicolas Patry 2023-06-14 00:08:33 +0200
  • ee1f94e64b Fixing register bias + gptq_bits type. Ubuntu 2023-06-13 21:45:23 +0000
  • ffe8fc4699 Fixing few things Ubuntu 2023-06-13 18:58:09 +0000
  • dadbbc27d5 Neox. Ubuntu 2023-06-13 16:05:53 +0000
  • 3fb8979a6d Re-enabling dim=dim in TensorParallelColumn because llama. Ubuntu 2023-06-13 15:37:52 +0000
  • ae308f88ec Some fixes. Ubuntu 2023-06-13 14:08:37 +0000
  • a0a194c391 Functionning quantization script. Ubuntu 2023-06-13 11:45:08 +0000
  • 5a72715344 Adding quantization scripts. Ubuntu 2023-06-12 15:57:32 +0000
  • da8ebf16fe Typo. Nicolas Patry 2023-06-12 11:50:08 +0200
  • 0b5859213e Fixing the dockerfile (require triton + gcc for compiling). Ubuntu 2023-06-09 16:25:14 +0000
  • 92f85c964d Removing dead code. Ubuntu 2023-06-09 15:59:49 +0000
  • 9a12941bef [WIP] Inference support for GPTQ (llama at least) Ubuntu 2023-06-09 15:48:13 +0000
  • 5ce89059f8 feat(server): pre-allocate past key values for flash causal LM (#412) OlivierDehaene 2023-06-12 18:30:29 +0200
  • 4b9ebb0a85 faster OlivierDehaene 2023-06-12 16:25:23 +0200
  • ca650e5bff fix(makefile): Fix typo and use POSIX comparison in the makefile (#443) sayf eddine hammemi 2023-06-12 15:24:53 +0200
  • f550e49fcc Update Makefile OlivierDehaene 2023-06-12 15:24:23 +0200
  • d0eaafdea6 Fix typo and use Posix comparison piratos 2023-06-12 14:39:21 +0200
  • d4eb60f48d docs(launcher): fix CUDA_VISIBLE_DEVICES helper comment (#441) A.J 2023-06-12 13:59:22 +0200
  • 49e3ee70b6 (TYPO): CUDA_VISIBLE_DEVICES comment antonio 2023-06-12 13:05:10 +0200
  • e496c9ba5b feat(server): optimize dist ops (#434) OlivierDehaene 2023-06-09 11:55:29 +0200
  • 92a74ea036 revert some changes OlivierDehaene 2023-06-05 18:54:23 +0200
  • afdfe43346 update commit OlivierDehaene 2023-06-05 17:37:03 +0200
  • c509e4e79d add other models OlivierDehaene 2023-06-05 16:55:53 +0200
  • 3fc87f93bd fix OlivierDehaene 2023-06-02 18:17:18 +0200
  • bfd6928c3e working OlivierDehaene 2023-06-01 18:37:14 +0200
  • c9e7471742 working rw 7b OlivierDehaene 2023-06-01 13:32:48 +0200
  • 5ff2dc9176 wip OlivierDehaene 2023-06-01 10:05:24 +0200
  • bf3bef9582 black OlivierDehaene 2023-06-08 19:29:36 +0200
  • b05ec96b0e remove quant OlivierDehaene 2023-06-08 19:29:22 +0200
  • 219be4f488 remove OlivierDehaene 2023-06-08 16:29:31 +0200
  • bdd811dd83 support input.shape 3 OlivierDehaene 2023-06-08 16:29:01 +0200
  • b67405bd8e feat(server): opt dist ops OlivierDehaene 2023-06-08 16:25:24 +0200
  • abd58ff82c feat(server): Rework model loading (#344) Nicolas Patry 2023-06-08 14:51:52 +0200
  • f245aa0c57 warn on unused snapshot OlivierDehaene 2023-06-08 14:15:01 +0200
  • c66648d920 add CARGO_REGISTRIES_CRATES_IO_PROTOCOL OlivierDehaene 2023-06-08 11:58:21 +0200
  • b027f5f129 black + cleanup OlivierDehaene 2023-06-08 11:47:59 +0200
  • 5e0a6ea1b7 skip instead of comment OlivierDehaene 2023-06-08 11:12:34 +0200
  • 4170de1b37 Last fixes hopefully. Ubuntu 2023-06-08 08:18:11 +0000
  • f3388d290f Just ditch the non flash integration tests. They work, but seem to mess the CI. Ubuntu 2023-06-07 14:28:17 +0000
  • cc84387877 Fixing Falcon 40b Nicolas Patry 2023-06-07 16:17:06 +0200
  • 5c82dcd2bf Update server/text_generation_server/models/custom_modeling/flash_rw_modeling.py Nicolas Patry 2023-06-07 15:00:20 +0200
  • b8bfb2a91e Manual fixes. Ubuntu 2023-06-07 12:56:04 +0000
  • 6ddcd1582c Apply suggestions from code review Nicolas Patry 2023-06-07 14:59:29 +0200
  • c6ac50e42b Removing flash attention env Ubuntu 2023-06-07 09:22:13 +0000
  • c5995652b0 Fix regular flash Ubuntu 2023-06-07 07:52:15 +0000
  • 877d4d4aeb Adding integration for neox NON flash. Ubuntu 2023-06-06 16:19:27 +0000
  • 644e0a65a3 Updating starcoder Ubuntu 2023-06-06 14:05:02 +0000
  • fb0840944c Reducing number of reps while autotuning. quantization Ubuntu 2023-06-06 11:56:10 +0000
  • daf59b0582 Large attention ? Ubuntu 2023-06-06 11:08:25 +0000
  • d083d57d0d Fixing flash rw. Ubuntu 2023-06-06 10:45:59 +0000
  • 2a1ecf3863 Fix rebase. Nicolas Patry 2023-06-06 11:20:53 +0200
  • 7fa79f02ca Fix logic. Ubuntu 2023-05-25 09:42:59 +0000
  • 4e071bf2f1 Fix PositionalRotary loads. Ubuntu 2023-05-25 09:34:31 +0000
  • 165bb4b6c0 Green ? Ubuntu 2023-05-25 08:45:41 +0000
  • c471e46cf8 M******** Ubuntu 2023-05-24 20:28:54 +0000
  • 55045be42f Neox (non flash) port + kernel. Ubuntu 2023-05-24 13:07:12 +0000
  • e36e42a3f4 T5? Ubuntu 2023-05-24 11:53:09 +0000
  • 680f26d6b2 Typo. Ubuntu 2023-05-24 10:10:16 +0000
  • 5c2a0e4555 Missing import. Ubuntu 2023-05-24 09:46:46 +0000
  • 2362a80a4f Black + ruff + T5 w0 quant. Ubuntu 2023-05-24 09:35:29 +0000
  • 15bf3d4944 Fused all commits for saner rebase.. Nicolas Patry 2023-05-19 12:01:57 +0200