Commit Graph

  • 49c1224483 fix: list nested dist in workflow drbh 2024-05-21 19:11:00 +0000
  • 7aebbd2ef9 fix: debug cwd after build drbh 2024-05-21 18:35:43 +0000
  • a4cd403079 fix: move to parent dir after wheel build drbh 2024-05-21 18:15:57 +0000
  • f4e7cdb04a fix: limit build and tweak post build commands drbh 2024-05-21 17:43:07 +0000
  • 816e14f0b3 fix: avoid setup_release job drbh 2024-05-21 17:20:10 +0000
  • 15fa236bca fix: adjust workflow condition drbh 2024-05-21 17:17:59 +0000
  • 7d1651f4a2 feat: add workflow to build flash_attn drbh 2024-05-21 17:10:05 +0000
  • 38688ba45d fix: avoid library name collision and add core deps to build drbh 2024-05-20 22:29:13 +0000
  • 70b27c4b2a fix: exclude tgi from workspace and improve build drbh 2024-05-18 02:35:55 +0000
  • 30f4deba77 feat: bundle launcher and refactor cli wrappers drbh 2024-05-18 01:22:25 +0000
  • af2b2e8388 feat: package text-generation-server with tgi library drbh 2024-05-17 19:18:55 +0000
  • 72d69071ae fix: add missing imports post rebase drbh 2024-05-16 20:08:33 +0000
  • 0e5220d704 feat: experimental python packaging and interface drbh 2024-05-16 19:47:12 +0000
  • 612bc483b6 Fixing the text part from tokenizer endpoint. (#1967) Nicolas Patry 2024-05-28 16:55:36 +0200
  • f20463e4e3 Fix (non-container) pytest stdout buffering-related lock-up Daniël de Kok 2024-05-28 07:25:14 +0000
  • 847536d9c7 Fixing the text part from tokenizer endpoint. Nicolas Patry 2024-05-28 13:30:50 +0000
  • e76b9824ae Upgrade to Axum 0.7 and Hyper 1.0 (Breaking change: disabled ngrok tunneling). (#1959) Nicolas Patry 2024-05-28 14:52:17 +0200
  • b8ac9d56df Update router/src/server.rs Nicolas Patry 2024-05-28 11:51:41 +0200
  • b759cc337b Apply suggestions from code review Nicolas Patry 2024-05-28 11:51:16 +0200
  • e8ef9ece32 Fix (non-container) pytest stdout buffering-related lock-up Daniël de Kok 2024-05-28 07:25:14 +0000
  • 1023de8048 Add flash_attention argument options for Mistral (#145) Jimin Ha 2024-05-27 11:00:42 -0700
  • 7b368b7644 Fixing doc. Nicolas Patry 2024-05-27 16:35:51 +0000
  • a03cc02a73 Disabled ngrok for good. Nicolas Patry 2024-05-27 16:13:08 +0000
  • 26d3519ff2 Upgrading axum=0.7 Nicolas Patry 2024-02-02 12:15:32 +0100
  • b7ffa287f2 fix small typo and broken link (#1958) Moritz Laurer 2024-05-27 17:31:06 +0200
  • 17a015c5f3 adding one sentence to make the term "grammar" less abstract moritzlaurer 2024-05-27 16:54:41 +0200
  • b820db1b4b small typo moritzlaurer 2024-05-27 16:51:49 +0200
  • 4405efac42 fix broken link moritzlaurer 2024-05-27 16:38:08 +0200
  • 0732b9d2f0 Processor config chat template (#1954) drbh 2024-05-27 10:03:16 -0400
  • a401c83c35 Fix GPTQ for models which do not have float16 at the default dtype (simpler) (#1953) Daniël de Kok 2024-05-27 14:41:28 +0200
  • b9b5051abc Fix GPTQ for models which do not have float16 at the default dtype Daniël de Kok 2024-05-24 19:01:07 +0000
  • 6f30a13afa Fix GPTQ for models which do not have float16 at the default dtype Daniël de Kok 2024-05-25 08:48:01 +0000
  • 9231098f3a Fix (flash) Gemma prefix and enable tests Daniël de Kok 2024-05-24 15:34:42 +0000
  • dd696891ac fix: adjust for idefics2 template drbh 2024-05-27 03:59:47 +0000
  • 93409ea038 feat: check processor_config chat template if not in tokenizer_config drbh 2024-05-26 23:49:29 -0400
  • 364b4497d0 Fix (flash) Gemma prefix and enable tests Daniël de Kok 2024-05-24 15:34:42 +0000
  • 3c74cf9cd4 Flashinfer test. flashinfer Nicolas Patry 2024-05-24 15:32:24 +0000
  • 01e4442ef6 REvert changes in modeling. Nicolas Patry 2024-05-24 14:18:00 +0000
  • 63e72033b7 Less intrusive. Nicolas Patry 2024-05-24 14:15:33 +0000
  • cacba5f21f Fix after rebase.. Nicolas Patry 2024-05-23 12:42:19 +0000
  • 1b86d0f31d Using flash decoding Nicolas Patry 2024-05-17 08:43:33 +0000
  • d32e33bd48 Fix seeded output. (#1949) Nicolas Patry 2024-05-24 15:36:13 +0200
  • 97777ff059 Changing the tokenizer changed the seeded output. Nicolas Patry 2024-05-24 12:18:49 +0000
  • a55dfb2700 Upgrade deps after release. update_internal_version Nicolas Patry 2024-05-24 11:23:06 +0000
  • 8f22cb961a Modifing the version number. v2.0.4 git_2.0.4 Nicolas Patry 2024-05-24 10:52:28 +0000
  • cff472ba2b Fixing codellama loads by using purely AutoTokenizer. (#1947) Nicolas Patry 2024-05-24 12:40:39 +0200
  • ad0b36bd28 Fixing codellama loads by using purely AutoTokenizer. Nicolas Patry 2024-05-24 10:02:35 +0000
  • 8c437a80bc add kvcache fp8 support mohit@huggingface.co 2024-05-23 16:00:18 +0000
  • 954653466d Improving the logging system. (#1938) Nicolas Patry 2024-05-23 15:40:40 +0200
  • 629047cb82 Add completion route to client and add stop parameter where it's missing (#1869) Thomas Schillaci 2024-05-23 15:37:09 +0200
  • f4a073ae6d Fixing some legacy behavior (big swapout of serverless on legacy stuff). (#1937) Nicolas Patry 2024-05-23 14:39:38 +0200
  • f41d644a90 reenable xpu for tgi (#1939) Wang, Yi 2024-05-23 20:11:08 +0800
  • 6fe66e8261 Fmt. Nicolas Patry 2024-05-23 12:03:07 +0000
  • 3d35292907 reenable xpu for tgi Wang, Yi A 2024-05-23 04:50:07 -0700
  • 767eb8b0f1 Improving the logging system. Nicolas Patry 2024-05-23 11:48:07 +0000
  • c61013e6be Fix linting Thomas Schillaci 2024-05-23 13:31:28 +0200
  • 1c49da0c2d Update launcher/src/main.rs Nicolas Patry 2024-05-23 11:43:38 +0200
  • a103e3e9e2 feat: add train medusa head tutorial (#1934) drbh 2024-05-23 05:34:18 -0400
  • ca0589f9e5 Update docs/source/basic_tutorials/train_medusa.md Nicolas Patry 2024-05-23 11:33:51 +0200
  • da1b5fa6b7 Comment for the try except import. Nicolas Patry 2024-05-23 08:55:28 +0000
  • f48b6109fd Fixing legacy and CPU configs. Nicolas Patry 2024-05-23 08:52:28 +0000
  • efb73fcb59 fix: use path inside of speculator config (#1935) drbh 2024-05-22 14:46:29 -0400
  • 91560ed931 fix: use path inside of speculator config drbh 2024-05-22 18:26:20 +0000
  • 33043f8255 fix: improve text and typos drbh 2024-05-22 12:22:28 -0400
  • a13a850542 feat: add train medusa head tutorial drbh 2024-05-22 15:21:19 +0000
  • 2f243a1a15 Creating doc automatically for supported models. (#1929) Nicolas Patry 2024-05-22 16:22:57 +0200
  • b3be512efc WTF ? Nicolas Patry 2024-05-22 16:04:16 +0200
  • 18da570060 I'm dumb. Nicolas Patry 2024-05-22 15:42:17 +0200
  • 2890026c5a Ssh debugging... again.. Nicolas Patry 2024-05-22 15:38:11 +0200
  • 9e6d97e575 Fix. Nicolas Patry 2024-05-22 15:35:48 +0200
  • 9fd232fd00 Cleaner autodoc. Nicolas Patry 2024-05-22 15:25:48 +0200
  • 0c6e0bc8ee Different temporary filename. Nicolas Patry 2024-05-22 08:37:37 +0000
  • 7f546039ca Auto doc supported models. Nicolas Patry 2024-05-22 08:26:10 +0000
  • 1373c185c3 Creating doc automatically for supported models. Nicolas Patry 2024-05-21 15:48:01 +0200
  • fc0eaffc81 feat: include token in client test like server tests (#1932) drbh 2024-05-22 03:58:26 -0400
  • be8356e399 feat: include token in client test like server tests drbh 2024-05-21 21:51:20 +0000
  • c753a989fe fix: update CI to avoid rate limiting pr-1869-ci-run drbh 2024-05-21 21:48:26 +0000
  • baf2adfb69 fix: run pre-commit drbh 2024-05-21 21:31:26 +0000
  • 9b08e4ab32 Merge commit 'refs/pull/1869/head' of github.com:huggingface/text-generation-inference into main drbh 2024-05-21 20:56:18 +0000
  • 904ff36917 docs: Fix grafana dashboard url (#1925) Junlin Zhou 2024-05-22 01:12:14 +0800
  • 2eb2da5f02 Revert "Dev/mask ldconfig output v2 (#1716)" (#144) Karol Damaszke 2024-05-21 14:55:09 +0200
  • 32acdd55b4 Add grammar support (#140) Karol Damaszke 2024-05-20 11:16:34 +0200
  • 13665c5c6d docs: Fix grafana dashboard url Junlin Zhou 2024-05-20 12:50:34 +0800
  • 293b8125e7 ROCm: make CK FA2 default instead of Triton (#1924) fxmarty 2024-05-20 02:44:48 +0200
  • cc515434bb update doc fxmarty 2024-05-19 17:39:53 -0700
  • 6c65632dcb make CK FA default fxmarty 2024-05-19 17:37:29 -0700
  • f871f114ca Fixing the download strategy for ibm-fms (#1917) Nicolas Patry 2024-05-18 13:31:24 +0200
  • 5dad0c0b29 Fix TGI issues with ROCm (#1921) fxmarty 2024-05-17 19:50:52 +0200
  • 0d5c8977d7 temporarily disable integration tests fxmarty 2024-05-17 17:41:21 +0000
  • 7a5f5d9757 hotfix for quantization fxmarty 2024-05-17 17:18:40 +0000
  • f82ae76dff add back warning fxmarty 2024-05-17 16:51:31 +0000
  • c6565e8259 format fxmarty 2024-05-17 16:37:38 +0000
  • 6f3660de3b Very ugly code. Nicolas Patry 2024-05-17 15:59:23 +0000
  • d4b4c8d42e Another attempt. Nicolas Patry 2024-05-17 15:12:54 +0000
  • 52c9ff9aca Optional base_name_or_model_path. Nicolas Patry 2024-05-17 14:20:58 +0000
  • e5416274df Fixing the download strategy for ibm-fms Nicolas Patry 2024-05-17 10:33:00 +0000
  • b5f1c9de06 Fix TunableOp bug (#1920) fxmarty 2024-05-17 18:21:51 +0200
  • 585860ef8b fix tunableop bugs fxmarty 2024-05-17 16:18:25 +0000
  • cd3c28cfe7 fix bug fix-cudagraph-bug fxmarty 2024-05-17 16:03:15 +0000
  • 422bf1f986 Update grafana template (#1918) fxmarty 2024-05-17 17:37:23 +0200