Commit Graph

  • 67d687609b cleanup Felix Marty 2023-07-12 16:16:58 +0000
  • 67a46b7361 move exllama buffer init to the top level Felix Marty 2023-07-12 16:09:26 +0000
  • 4462854e1b have a single gptq quantization type Felix Marty 2023-07-12 15:43:20 +0000
  • 549df839d7 Tuple rather than list of exception types ssmi153 2023-07-12 23:26:09 +0800
  • 29ff597ef9 Merge branch 'main' of https://github.com/ssmi153/text-generation-inference ssmi153 2023-07-12 23:13:38 +0800
  • dc761f148d GPTQ env vars: Catch Runtime errors ssmi153 2023-07-12 23:10:35 +0800
  • f2f0289fb9 feat(server): empty cache on errors OlivierDehaene 2023-07-12 17:05:50 +0200
  • 073c1a884d Merge branch 'huggingface:main' into main ssmi153 2023-07-12 23:06:19 +0800
  • 67347950b7 feat(server): Implements sharding for non divisible vocab_size. (#583) Nicolas Patry 2023-07-12 16:43:31 +0200
  • f588d32ea4 feat(launcher): add arg validation and drop subprocess OlivierDehaene 2023-07-12 16:38:30 +0200
  • b3f830abc3 Reworking the quantization script so it's still universal (not llama specific) Nicolas Patry 2023-07-11 17:25:26 +0000
  • f764bc1b52 Fixing OOM on non sharded. Nicolas Patry 2023-07-12 12:46:02 +0000
  • bfa3920aec BNB 4bits. bnb4 Nicolas Patry 2023-07-12 12:42:43 +0000
  • 2c4bf88268 fix(server): Bug fixes for GPTQ_BITS environment variable passthrough (#590) ssmi153 2023-07-12 20:17:35 +0800
  • 636a4cca85 Bug fixes for GPTQ_BITS env var passthrough ssmi153 2023-07-12 17:25:24 +0800
  • 6193512c4b Update server/text_generation_server/utils/layers.py OlivierDehaene 2023-07-12 11:05:07 +0200
  • 63f03b4b7d Just don't shard LMHead if not divisible. Nicolas Patry 2023-07-12 09:03:16 +0000
  • 2e76727910 Doesn't affect LM_Head. Nicolas Patry 2023-07-11 12:51:13 +0000
  • 906027ae58 Enabling non divisble vocab_size. Nicolas Patry 2023-07-11 12:37:25 +0000
  • 7f9072228a fix(server): Adding logger import to t5_modeling.py (#585) Adam Kowalski 2023-07-12 03:40:32 -0500
  • db4efbf4bc fix(server): T5 weights names. (#582) enable_non_divisible_embeddings Nicolas Patry 2023-07-12 10:01:42 +0200
  • f063ebde10 chore: migrate ci region for more availability. (#581) Nicolas Patry 2023-07-12 10:01:01 +0200
  • 5bd2ab6583 feat(server): Support for env value for GPTQ_BITS and GPTQ_GROUPSIZE. (#580) Nicolas Patry 2023-07-12 10:00:02 +0200
  • f0181436f4 fix(server): Fixing RW code (it's remote code so the Arch checking doesn't work to see which weights to keep). (#579) Nicolas Patry 2023-07-12 09:51:34 +0200
  • f5e8f73a1c Update server/text_generation_server/models/custom_modeling/flash_santacoder_modeling.py Nicolas Patry 2023-07-12 08:38:17 +0200
  • a1c23f3823 Update layers.py Florian Zimmermeister 2023-07-11 18:47:50 +0200
  • 64accc59f1 Update seq2seq_lm.py Florian Zimmermeister 2023-07-11 18:37:51 +0200
  • 780198b9e4 Update santacoder.py Florian Zimmermeister 2023-07-11 18:37:09 +0200
  • 377c01e21e Update rw.py Florian Zimmermeister 2023-07-11 18:36:46 +0200
  • 198e6179ef Update causal_lm.py Florian Zimmermeister 2023-07-11 18:36:04 +0200
  • 0e048e4347 Adding logger import to t5_modeling.py Adam Kowalski 2023-07-11 11:35:23 -0500
  • f2fae6db91 Update requirements.txt bnb version Florian Zimmermeister 2023-07-11 18:33:32 +0200
  • 2a3f9cf5c2 Fix T5 weights names. Nicolas Patry 2023-07-11 12:06:01 +0000
  • 5562b510b3 Update closing runner. Nicolas Patry 2023-07-11 10:34:47 +0000
  • 3ef9d56847 migrate ci region for more availability (fingers crossed). Nicolas Patry 2023-07-11 10:12:24 +0000
  • 1b7b91a4d3 Support for env value for GPTQ_BITS and GPTQ_GROUPSIZE. Nicolas Patry 2023-07-11 10:33:29 +0200
  • d9ed7b9274 Fixing RW code (it's remote code so the Arch checking doesn't work to see which weights to keep). Nicolas Patry 2023-07-10 18:40:09 +0000
  • 1e62237d44 Adding additional response header X-Total-Tokens Julian Bright 2023-07-11 03:17:09 +1000
  • b4024edd45 feat: better errors for warmup and TP (#575) OlivierDehaene 2023-07-10 14:47:15 +0200
  • 9d60030ba0 feat: better errors for warmup and TP OlivierDehaene 2023-07-10 12:43:44 +0200
  • 20ca9cf0c3 Memory fragmentation added for Causal LM ankit201 2023-07-09 03:35:47 +0000
  • 15de7c7ac3 DockerFile change Ankit Singh 2023-07-01 13:37:43 +0000
  • 5cdd242fec Update client.py : Adding missing arg "best_of" in generate_stream function yash bhaskar 2023-07-07 22:06:09 +0530
  • e943a294bc fix(server): harden the weights choice to save on disk. (#561) Nicolas Patry 2023-07-07 14:50:12 +0200
  • 193eae246c Update test. Nicolas Patry 2023-07-06 22:02:23 +0000
  • aae9d6faf7 Attempting to harden a bit the weights choice to save on disk. Nicolas Patry 2023-07-06 21:36:00 +0000
  • 31b36cca21 v0.9.1 (#558) v0.9.1 OlivierDehaene 2023-07-06 16:05:42 +0200
  • f1f7674ae9 v0.9.1 OlivierDehaene 2023-07-06 16:03:53 +0200
  • c4bb5264ac fix(server): decrease memory fragmentation (#557) OlivierDehaene 2023-07-06 14:28:33 +0200
  • 39e37ec624 fix(server): decrease memory fragmentation OlivierDehaene 2023-07-06 13:07:08 +0200
  • a6e387404d try-catch to load the cuda extension, quite ugly practice tbh Felix Marty 2023-07-05 17:53:56 +0000
  • 620ed7d8aa Merge branch 'gptq-cuda-kernels' of https://github.com/fxmarty/text-generation-inference into gptq-cuda-kernels Felix Marty 2023-07-05 16:42:37 +0000
  • 2272b3a456 some more cleanup Felix Marty 2023-07-05 16:42:13 +0000
  • 0ff8219fdb Merge branch 'main' into gptq-cuda-kernels Félix Marty 2023-07-06 01:31:05 +0900
  • 6f42942772 feat(router): add argument for hostname in router (#545) (#550) OlivierDehaene 2023-07-05 18:28:45 +0200
  • c858d791e5 add attribution Felix Marty 2023-07-05 16:15:10 +0000
  • ee7ba48b9a add exllama gptq kernel Felix Marty 2023-07-05 15:43:42 +0000
  • 22fc605f4e add hostname to launcher OlivierDehaene 2023-07-05 09:39:00 +0200
  • 57886c8fc4 feat(router): add argument for hostname in router (#545) Phil Chen 2023-07-05 09:35:28 +0200
  • 0a468fdf7d Add argument for hostname in router Phil Chen 2023-07-05 00:49:53 +0200
  • 31e2253ae7 feat(server): use latest flash attention commit (#543) OlivierDehaene 2023-07-04 20:23:55 +0200
  • e4b26aa10b fix(server): avoid errors for very small top_p values (#544) Nick Hill 2023-07-04 11:11:33 -0700
  • 8a7bfcd571 fix(server): avoid errors for very small top_p values Nick Hill 2023-07-04 10:59:40 -0700
  • ab860d371a feat(server): use latest flash attention commit OlivierDehaene 2023-07-04 19:33:49 +0200
  • 2a101207d4 fix(server): Handle loading from local files for MPT (#534) Antoni Baum 2023-07-04 09:37:25 -0700
  • e6888d0e87 docs(benchmarker): Adding some help for the options in text-generation-benchmark. (#462) Nicolas Patry 2023-07-04 18:35:37 +0200
  • 742199aa0d Modified fix. Nicolas Patry 2023-07-04 11:30:59 +0200
  • 81f234ec61 Revert "Map deduplicated tensors via metadata" Nicolas Patry 2023-07-04 11:30:35 +0200
  • 8405581fcd fix: Update server/Makefile to include Makefile-vllm (#520) Antoni Baum 2023-07-04 00:39:25 -0700
  • 5c490fb56a Handle loading from local files for MPT Antoni Baum 2023-07-03 12:19:54 -0700
  • 1da07e85aa feat(server): Add Non flash MPT. (#514) Nicolas Patry 2023-07-03 13:01:46 +0200
  • 2c30ff567e Remove comment. Nicolas Patry 2023-07-03 08:43:02 +0000
  • ed0c5bd1ed Removing commented things (raising proper errors instead). Nicolas Patry 2023-07-03 08:42:26 +0000
  • b591527a6c Einops. Nicolas Patry 2023-07-01 19:35:26 +0000
  • e28a809004 v0.9.0 (#525) v0.9.0 OlivierDehaene 2023-07-01 19:25:41 +0200
  • da9c4655c3 fix launcher OlivierDehaene 2023-07-01 18:44:43 +0200
  • 5654537065 v0.9.0 OlivierDehaene 2023-07-01 17:50:03 +0200
  • 44561927e0 Adding integration tests snapshots. Nicolas Patry 2023-07-01 10:30:09 +0000
  • 24c0f1cc7a Adding (failing) integration tests. Nicolas Patry 2023-06-30 21:55:37 +0000
  • c62527a542 Fixed MPT sharding. Nicolas Patry 2023-06-30 21:46:44 +0000
  • f33ad7ed98 Non flash MPT. Nicolas Patry 2023-06-30 09:52:49 +0000
  • 2b53d71991 fix(launcher): fix issue where launcher does not properly report shard failures (#522) OlivierDehaene 2023-06-30 23:09:20 +0200
  • 5ec19ef951 fix(launcher): fix issue where launcher does not properly report shard failures OlivierDehaene 2023-06-30 21:54:20 +0200
  • 51f2735f6c Ensure classmethods use cls instead of the class directly Antoni Baum 2023-06-30 11:47:42 -0700
  • 4656414977 Update server/Makefile to include Makefile-vllm Antoni Baum 2023-06-30 11:43:45 -0700
  • ecf6dc3a5a feat: Add the option to force another dtype than f16. (#513) Nicolas Patry 2023-06-30 20:30:09 +0200
  • 3b0c979efc feat(router): arg validation (#519) OlivierDehaene 2023-06-30 20:07:49 +0200
  • ee5463a431 feat(router): arg validation OlivierDehaene 2023-06-30 19:45:35 +0200
  • e74bd41e0f feat(server): add paged attention to flash models (#516) OlivierDehaene 2023-06-30 19:09:59 +0200
  • b1831d5f97 double free OlivierDehaene 2023-06-30 17:43:40 +0200
  • 3c4243d627 fix drop OlivierDehaene 2023-06-30 16:52:25 +0200
  • 8ec0edcfe3 fix OlivierDehaene 2023-06-30 16:47:00 +0200
  • c52e84fe10 small refactor OlivierDehaene 2023-06-30 16:32:23 +0200
  • c5da6579dc flash neox is flaky OlivierDehaene 2023-06-30 14:06:44 +0200
  • 8a41ac8bb9 remove debug logging OlivierDehaene 2023-06-30 13:23:50 +0200
  • 16f796f735 add falcon, santacoder and neox support OlivierDehaene 2023-06-30 13:19:44 +0200
  • 02e43ccf6f FInal touches. Nicolas Patry 2023-06-30 08:39:15 +0000
  • 59474c29aa Fix cli name. Nicolas Patry 2023-06-30 08:08:13 +0000
  • 89e4015844 non modeling. Nicolas Patry 2023-06-30 07:52:36 +0000
  • 0a50ac31a7 Remove mpt. Nicolas Patry 2023-06-30 07:52:01 +0000