Commit Graph

  • e30f4f61e7 Only return top_tokens field when requested Vincent Brouwers 2023-08-02 13:03:19 +0000
  • 65c7b6207c Add max_top_n_tokens CLI argument Vincent Brouwers 2023-08-02 12:42:59 +0000
  • 8471e1862d Defer building top-token objects to Rust Vincent Brouwers 2023-08-01 15:02:30 +0000
  • 730d86f1d0 Skip top-n tokens in prefill Vincent Brouwers 2023-08-01 13:55:38 +0000
  • 564bc99a7b fix toc Merve Noyan 2023-08-01 14:13:28 +0300
  • 470dcdfe7b Separated querying section and emphasized self generating docs Merve Noyan 2023-08-01 14:10:45 +0300
  • 21ca70e0eb Added supported models and hardware Merve Noyan 2023-08-01 14:02:14 +0300
  • 2675d934e5 Update local_launch.md Merve Noyan 2023-08-01 12:44:25 +0300
  • cdc7db9af9 add FastLinear import p_spozzhang 2023-08-01 13:56:52 +0800
  • a9aa187d84 Update requirements.txt Sven Schultze 2023-07-31 21:54:17 +0200
  • 7766fee9b1 fix typo for dynamic rotary (#745) compat_logger Florian Zimmermeister 2023-07-31 18:58:46 +0200
  • d3d8f1bd6b Typo fix. (#746) Nicolas Patry 2023-07-31 18:57:29 +0200
  • 8a774ab07e Update layers.py Nicolas Patry 2023-07-31 18:57:14 +0200
  • a4415e0465 fix typo for dynamic rotary Florian Zimmermeister 2023-07-31 18:55:54 +0200
  • 15fc64668f fix(server): Failing quantize config after local read. (#743) Nicolas Patry 2023-07-31 17:51:26 +0200
  • 7a2136eeb8 fix(server): Failing quantize config after local read. Nicolas Patry 2023-07-31 15:49:06 +0000
  • c86dcbeeb1 Update build_pr_documentation.yml Merve Noyan 2023-07-31 18:16:29 +0300
  • d65bbb333d Update build_pr_documentation.yml Merve Noyan 2023-07-31 18:13:32 +0300
  • b2268272ad Added installation and launch notes and re-structured toc Merve Noyan 2023-07-31 17:35:36 +0300
  • 2a13f1a046 chore: fix typo in mpt_modeling.py (#737) Ikko Eltociear Ashimine 2023-07-31 22:43:44 +0900
  • 932bdd93ff Adding Rope scaling. (#741) Nicolas Patry 2023-07-31 15:38:47 +0200
  • d16298b8d4 Allocate top_n_token tensor in Batch Vincent Brouwers 2023-07-31 13:09:45 +0000
  • 41bd0e4af1 Added index.md and other initial files Merve Noyan 2023-07-31 15:56:29 +0300
  • aa44a3d1f0 Cargo fmt. Nicolas Patry 2023-07-31 12:42:18 +0000
  • b9633c46d0 Fix typing in Model.generate_token (#733) Jae-Won Chung 2023-07-31 08:35:14 -0400
  • edbba4ea36 Adding Rope scaling. Nicolas Patry 2023-07-31 11:55:44 +0000
  • dc631b5be5 Setup for doc-builder and added TOC Merve Noyan 2023-07-31 14:18:20 +0300
  • 92bb56b0c1 Local gptq support. (#738) Nicolas Patry 2023-07-31 10:32:52 +0200
  • 760fdcfe7b Upgrading rust version. Nicolas Patry 2023-07-31 10:08:15 +0200
  • 66cea49d57 Cargo fmt Nicolas Patry 2023-07-31 09:57:18 +0200
  • 4b3e24f843 feat(server): Add bitsandbytes 4bit quantization (#626) krzim 2023-07-21 03:53:05 -0400
  • f29e3d7d34 Local gptq support. Nicolas Patry 2023-07-31 09:51:58 +0200
  • a1cec743ee chore: fix typo in mpt_modeling.py Ikko Eltociear Ashimine 2023-07-31 11:52:53 +0900
  • bdc76134a3 LICENSE change michaelfeil 2023-07-30 12:22:49 +0200
  • 5a465fa40a Fix typing in Model.generate_token Jae-Won Chung 2023-07-28 17:23:41 -0400
  • 3ef5ffbc64 v1.0.0 (#727) v1.0.0 OlivierDehaene 2023-07-28 17:43:46 +0200
  • 51203e4087 revert vllm change OlivierDehaene 2023-07-28 16:51:33 +0200
  • 95d0fba7de Return more top-n-tokens when probabilities are equal Vincent Brouwers 2023-07-28 14:21:11 +0000
  • 19dc7d31b9 update README OlivierDehaene 2023-07-28 16:26:04 +0200
  • eca8817183 udpate README OlivierDehaene 2023-07-28 16:21:59 +0200
  • 23446d15db v1.0.0 OlivierDehaene 2023-07-28 16:14:30 +0200
  • bde25e62b3 chore: update license to HFOIL (#725) OlivierDehaene 2023-07-28 15:59:46 +0200
  • ce90be833b chore: update license to HFOIL OlivierDehaene 2023-07-28 15:56:36 +0200
  • afd04dc71e feat(server): update vllm version (#723) OlivierDehaene 2023-07-28 15:36:38 +0200
  • dc59fd90ff feat(server): update vllm version OlivierDehaene 2023-07-28 14:47:46 +0200
  • f848decee6 docs: Add hardware section to TOC in README (#721) regisss 2023-07-28 11:20:03 +0200
  • 8e47e17ada Add section to TOC in README regisss 2023-07-28 10:27:05 +0200
  • 5a1cccbb98 Add section about TGI on other AI hardware accelerators in README (#715) regisss 2023-07-28 09:14:03 +0200
  • 987b0fff3a Load quantize_config.json from local path Antoni Baum 2023-07-27 18:03:04 -0700
  • eba543222b Add section about TGI on Gaudi in README regisss 2023-07-27 22:40:53 +0200
  • 9f18f4c006 v0.9.4 (#713) v0.9.4 OlivierDehaene 2023-07-27 19:25:15 +0200
  • e366dfa0f0 v0.9.4 OlivierDehaene 2023-07-27 18:49:03 +0200
  • ab96b9aec3 feat(server): support new falcon config (#712) OlivierDehaene 2023-07-27 18:38:57 +0200
  • e8b0a014a0 feat(server): support new falcon config OlivierDehaene 2023-07-27 17:50:42 +0200
  • 2efd46ef95 fix(server): fix missing datasets in quantize OlivierDehaene 2023-07-27 14:50:45 +0200
  • 8bd0adb135 fix(server): fix quantization python requirements (#708) OlivierDehaene 2023-07-27 12:28:10 +0200
  • 0754eaaf17 fix(server): fix quantization python requirements OlivierDehaene 2023-07-27 12:03:47 +0200
  • 50d05fa20d Implement top-n-tokens for all models Vincent Brouwers 2023-07-26 15:12:57 +0000
  • 494e6b1c61 Share computation for top-n-token decoding Vincent Brouwers 2023-07-25 14:55:32 +0000
  • f809f179dc Add batched top-n-tokens to FlashCausalLM Vincent Brouwers 2023-07-25 14:17:25 +0000
  • a7be416c87 Add top-n-tokens support to benchmark Vincent Brouwers 2023-07-24 14:02:56 +0000
  • 7c014c7dfe Add WIP support for returning top tokens Vincent Brouwers 2023-07-14 19:48:15 +0000
  • d8a955740f Remove extra workflows orangetin 2023-07-26 20:01:36 -0700
  • 3f031ad51f Fix workflow orangetin 2023-07-26 19:58:44 -0700
  • 369ad020f4 Ignore flash-attention Yang, Bo 2023-07-26 11:37:01 -0700
  • 9dc53886c3 Ignore external projects Yang, Bo 2023-07-25 11:46:57 -0700
  • e64a65891b docs(README): update readme OlivierDehaene 2023-07-25 19:45:25 +0200
  • a0d55358d2 feat(server): Using quantize_config.json instead of GPTQ_BITS env variables. (#671) Nicolas Patry 2023-07-25 12:00:27 +0100
  • 0635d0e245 After rebase. Nicolas Patry 2023-07-25 09:14:47 +0200
  • 95583ee257 Small fix. Nicolas Patry 2023-07-21 10:20:01 +0000
  • c07ee68b60 feat(server): Using quantize_config.json instead of GPTQ_BITS env variables. Nicolas Patry 2023-07-21 10:12:28 +0000
  • 79b4620107 adding suggested changes for os.environ vars and reporting correct torch.dtype on API michaelfeil 2023-07-25 09:05:08 +0200
  • e38cda5b9b apply suggested changes michaelfeil 2023-07-25 08:32:55 +0200
  • 9bb64c92a9 Add AutoCausalLM Yang, Bo 2023-07-12 01:07:10 +0000
  • be6c9acf46 cpu speedup kwargs michaelfeil 2023-07-24 23:13:32 +0200
  • 37df6df38e fix(server): fix exllama buffers (#689) OlivierDehaene 2023-07-24 14:25:43 +0200
  • 74c87f5888 fmt OlivierDehaene 2023-07-24 13:59:10 +0200
  • a6057c4076 fix(server): fix exllama buffers OlivierDehaene 2023-07-24 10:41:24 +0200
  • 73a4d65d26 feat: add cuda memory fraction (#659) OlivierDehaene 2023-07-24 11:43:58 +0200
  • 336ea37637 fix other issue and make code pass on cpu. michaelfeil 2023-07-24 11:03:02 +0200
  • 31f45f6351 memory fraction on free memory OlivierDehaene 2023-07-24 10:25:05 +0200
  • 1b59f8da73 feat: add cuda memory fraction OlivierDehaene 2023-07-20 11:29:48 +0200
  • b2575fd18d adapt trust remote code michaelfeil 2023-07-23 21:18:00 +0200
  • 9b382f9f4a reformat code and imports michaelfeil 2023-07-23 12:20:07 +0200
  • ccc7b7ab8f Cleanup Antoni Baum 2023-07-22 17:12:37 -0700
  • d583f962f8 WIP Antoni Baum 2023-07-22 16:03:07 -0700
  • 74c31ee890 improve error handling michaelfeil 2023-07-22 23:50:38 +0200
  • 7338e0097f add requirements to docker michaelfeil 2023-07-22 23:42:30 +0200
  • 3f2fce87e7 reformatted code michaelfeil 2023-07-22 21:54:31 +0200
  • 2da14fcb2a initial commit for running ctranslate2 michaelfeil 2023-07-22 21:34:48 +0200
  • 1da642bd0e feat(server): add local prom and health routes if running w/ ngrok OlivierDehaene 2023-07-21 16:56:30 +0200
  • 15b3e9ffb0 Directly load GPTBigCode to specified device (#618) Yang, Bo 2023-07-21 02:27:31 -0700
  • d5b5bc750f feat(server): Add exllama GPTQ CUDA kernel support #553 (#666) Nicolas Patry 2023-07-21 10:59:00 +0200
  • afb39404e1 Getting closer to the non gptq test (stop sequence doesn't work). Nicolas Patry 2023-07-21 08:15:25 +0000
  • 8b6a262539 Switching model for integration test llama gptq. Nicolas Patry 2023-07-21 07:29:32 +0000
  • 1dc952a674 Wtf gh. Nicolas Patry 2023-07-21 06:26:46 +0000
  • 40be532841 Update starcoder_gptq Nicolas Patry 2023-07-21 06:00:02 +0000
  • 3ec3adde2f Separate build process. Nicolas Patry 2023-07-20 22:09:31 +0000
  • c6e702fb2f Add kernel target. Nicolas Patry 2023-07-20 20:24:44 +0000
  • 12191b7e42 Fix config. Nicolas Patry 2023-07-20 19:56:31 +0000