Commit Graph

  • 19ea85f8dc Updating the flake. (#2404) Nicolas Patry 2024-08-12 18:09:16 +0200
  • 93c061ac79 Updating the flake. Nicolas Patry 2024-08-12 17:53:26 +0200
  • b84bb19ece fix: prefer recent gptq changes fix-release-tests drbh 2024-08-12 15:51:19 +0000
  • 7e773b0f20 fix: superseed gptq changes with main drbh 2024-08-12 15:18:02 +0000
  • 3f12750a18 fix: marlin repeat scale for fp8 and bump snapshots drbh 2024-08-09 16:39:16 +0000
  • df9eb38733 fix: include correct exllama methods based on version drbh 2024-08-08 20:42:41 +0000
  • e99dd84b9a fix: move GPTQWeight into file to avoid circular import drbh 2024-08-08 19:52:23 +0000
  • 700e64c5b9 fix: update mamba snap and run other release tests drbh 2024-08-08 17:55:24 +0000
  • add7908f0f fix: update mt0, mamba and grammar tests drbh 2024-08-08 15:17:45 +0000
  • c3e358e8b5 fix: update deepseek and gemma tests drbh 2024-08-08 14:29:28 +0000
  • 57efa7ab8f fix: run bloom in non release and update snapshots drbh 2024-08-08 13:47:49 +0000
  • 30395b09f4 fix: improve completions to send a final chunk with usage details (#2336) drbh 2024-08-12 11:26:11 -0400
  • 4c3f8a70a1 fix: allocate tmp based on sgmv kernel if available (#2345) drbh 2024-08-12 11:24:32 -0400
  • d5d168a4d2 test throughput Xuan Son Nguyen 2024-08-12 17:23:11 +0200
  • 155f9c98e2 feat: validate template variables before apply and improve sliding wi… (#2403) drbh 2024-08-12 10:58:40 -0400
  • 298efa41c5 fix: improve missing template var test drbh 2024-08-12 14:35:42 +0000
  • 2551456fff feat: validate template variables before apply and improve sliding window check drbh 2024-08-12 14:16:09 +0000
  • 136bcc8128 Keeping the benchmark somewhere (#2401) Nicolas Patry 2024-08-12 15:22:02 +0200
  • bf6d60a07b Keeping the benchmark somewhere Daniël de Kok 2024-08-06 12:36:15 +0000
  • 8deeaca4ff Add support for prefix caching to the v3 router (#2392) Daniël de Kok 2024-08-12 14:59:17 +0200
  • b6bb1d5160 Cpu dockerimage (#2367) Wang, Yi 2024-08-12 20:10:30 +0800
  • 84bc3d7b7d Fixing import exl2 (#2399) Nicolas Patry 2024-08-12 14:08:59 +0200
  • 730fa00e20 Adding launcher to build. (#2397) Nicolas Patry 2024-08-12 14:08:46 +0200
  • 9c739651cd Upgrade fbgemm (#2398) Nicolas Patry 2024-08-12 14:08:38 +0200
  • 5f002c678f Fixing import exl2 Nicolas Patry 2024-08-12 12:23:46 +0200
  • 7fde42c6e8 Fix fbgemm version Nicolas Patry 2024-08-12 11:50:02 +0200
  • b727e0aedc only 10 VUs Xuan Son Nguyen 2024-08-12 11:45:30 +0200
  • fc1853adac Upgrade fbgemm Nicolas Patry 2024-08-12 11:29:13 +0200
  • 7e694bbab7 Adding launcher to build. Nicolas Patry 2024-08-12 09:42:58 +0200
  • 01a515dea2 nix: add router to the devshell (#2396) Daniël de Kok 2024-08-12 09:28:38 +0200
  • ccf4995744 nix: add router to the devshell Daniël de Kok 2024-08-12 06:41:39 +0000
  • d403575c43 Make bf16 default for hpu, fix script (#205) Abhilash Majumder 2024-08-11 14:18:35 +0530
  • cf2ff5a1dd Revert PR#178 (#191) Sun Choi 2024-08-11 00:29:30 -0700
  • 535335f088 fix(router): Fix appending to message content Simone Rossi 2024-08-10 17:53:59 +0200
  • a41e974c3b Merge branch 'habana-main' into v2.0.4 regisss 2024-08-10 12:54:00 +0200
  • 7bc16deb48 wip: debug gemma and flash explore-t4-gemma-issues drbh 2024-08-09 23:08:54 +0000
  • 8dcc7d3f6b Update flake for 9.0a capability in Torch (#2394) Daniël de Kok 2024-08-09 22:36:51 +0200
  • 7a6a6e5cc2 Update flake for 9.0a capability in Torch Daniël de Kok 2024-08-09 20:30:15 +0000
  • 7825c0744a fix: update openapi schema drbh 2024-08-09 20:02:47 +0000
  • e57b6cccda fix: remove dev debug trait and unneeded mut drbh 2024-08-09 15:48:55 -0400
  • 515cd66705 fix: include finish reason string drbh 2024-07-30 21:07:42 +0000
  • c330491223 fix: improve completions to send a final chunk with usage details drbh 2024-07-30 21:02:32 +0000
  • 7101bf2993 fix: re add copy build artifacts step for punica kernels drbh 2024-08-09 16:53:25 +0000
  • 88ac607f1c Moving the docs. Nicolas Patry 2024-08-09 18:15:12 +0200
  • 8140b2294f Bad rebase Nicolas Patry 2024-08-09 18:11:43 +0200
  • 173e5e6c4b fix: Message API link Hugo Larcher 2024-07-16 11:23:02 +0200
  • 3c912f40bb doc: Refactor API reference Hugo Larcher 2024-07-15 20:22:22 +0200
  • 5e70943ed0 doc: Add API reference Hugo Larcher 2024-07-15 18:16:22 +0200
  • fc2d1134b8 doc: Add metrics documentation and add a 'Reference' section Hugo Larcher 2024-07-15 14:15:55 +0200
  • 5746a8d0c3 Add support for prefix caching to the v3 router Daniël de Kok 2024-08-09 14:54:13 +0000
  • 0d06aed02d feat: add guideline to chat request and template (#2391) drbh 2024-08-09 10:56:45 -0400
  • 7735b385dc Prefix caching WIP feature/radix-prefix-cache Daniël de Kok 2024-08-09 11:47:14 +0000
  • 7a48a84784 Using an enum for flash backens (paged/flashdecoding/flashinfer) (#2385) Nicolas Patry 2024-08-09 16:41:17 +0200
  • d94b0fcf52 fix: add template test and update docs drbh 2024-08-09 14:16:35 +0000
  • 3b25cd3213 feat: add guideline to chat request and template drbh 2024-08-09 13:53:47 +0000
  • 6e127dcc96 flake: use rust-overlay (#2390) Daniël de Kok 2024-08-09 15:24:21 +0200
  • f2c5fb6cbe flake: use rust-overlay Daniël de Kok 2024-08-09 13:02:57 +0000
  • 9f039ad4b3 flake: use rust-overlay nix/cargo-clippy Daniël de Kok 2024-08-09 13:02:57 +0000
  • b2b9c42724 Update documentation for Supported models (#2386) Vaibhav Srivastav 2024-08-09 15:01:34 +0200
  • 9aea62b381 Merge branch 'huggingface:main' into vb/followup-doc-fixes Vaibhav Srivastav 2024-08-09 14:57:04 +0200
  • 977534bcb8 flake: add fmt and clippy (#2389) Daniël de Kok 2024-08-09 14:56:20 +0200
  • a4b1806557 Fix clippy and fmt. Nicolas Patry 2024-08-09 14:54:52 +0200
  • 6ee7e2e208 flake: add fmt and clippy Daniël de Kok 2024-08-09 12:53:14 +0000
  • c9813b935b Other minor updates. Vaibhav Srivastav 2024-08-09 14:49:04 +0200
  • 379e1659a9 Clippy. Nicolas Patry 2024-08-09 14:39:49 +0200
  • d84b98b40f Early exit on server too. Nicolas Patry 2024-08-09 12:47:39 +0200
  • 6bcad66c6e Using an enum for flash backens (paged/flashdecoding/flashinfer) Nicolas Patry 2024-08-09 12:31:08 +0200
  • 952b450a3b Using HF_HOME instead of CACHE to get token read in addition to models. (#2288) Nicolas Patry 2024-08-09 14:25:44 +0200
  • 27daf69ea8 Merge branch 'huggingface:main' into vb/followup-doc-fixes Vaibhav Srivastav 2024-08-09 14:08:37 +0200
  • cd1e2cd2cf add docker load_tests Xuan Son Nguyen 2024-08-09 13:16:49 +0200
  • 03bfff5a01 up. Vaibhav Srivastav 2024-08-09 12:35:22 +0200
  • c6d5039cd7 Add experimental flake (#2384) Daniël de Kok 2024-08-09 12:32:37 +0200
  • 2bd9129f11 Minor doc fixes Vaibhav Srivastav 2024-08-09 12:29:20 +0200
  • 5b8218fbef Add flake.nix Daniël de Kok 2024-08-09 10:22:17 +0000
  • 7830de1566 Add FlashInfer support (#2354) Daniël de Kok 2024-08-09 11:42:00 +0200
  • bad8ade7ae Using HF_HOME instead of CACHE to get token read in addition to models. Nicolas Patry 2024-07-23 15:42:55 +0000
  • 6d06473cf4 Pr 2352 ci branch (#2382) drbh 2024-08-09 04:54:32 -0400
  • cb3ae30284 Update Quantization docs and minor doc fix. (#2368) Vaibhav Srivastav 2024-08-08 22:06:57 +0200
  • 383975995b up Vaibhav Srivastav 2024-08-08 19:56:18 +0000
  • f852190060 fix: prefer hidden_activation over hidden_act in gemma2 (#2381) drbh 2024-08-08 14:08:56 -0400
  • bec657973d fix: update v3 scheduler and ensure max_batch_size > 0 drbh 2024-08-08 17:47:26 +0000
  • 0781053d3a fix: prefer hidden_activation over hidden_act in gemma2 drbh 2024-08-08 12:58:05 -0400
  • 2ca5980634 Pr 2337 ci branch (#2379) drbh 2024-08-08 12:30:29 -0400
  • 6497ae61e2 Merge commit 'refs/pull/2352/head' of github.com:huggingface/text-generation-inference into pr-2352-ci-branch drbh 2024-08-08 16:27:00 +0000
  • 689b1abbf6 fix EleutherAI/gpt-neox-20b does not work in tgi (#2346) Wang, Yi 2024-08-09 00:08:52 +0800
  • b921a46dc0 Merge commit 'refs/pull/2337/head' of github.com:huggingface/text-generation-inference into pr-2337-ci-branch drbh 2024-08-08 15:21:57 +0000
  • 82d19d7723 Pr 2374 ci branch (#2378) drbh 2024-08-08 11:14:06 -0400
  • e1268596bc fix: syntax/style tweak drbh 2024-08-08 14:10:51 +0000
  • e36a9c57f0 Code expects newer huggingface_hub versions, tested and this resolves issues with streaming response format (#190) geoffrey papilion 2024-08-08 04:07:27 -0700
  • 256a97231b Removed redundant and crash causing regions to be a subject to Torch compile (#194) Jacek Czaja 2024-08-08 13:06:20 +0200
  • d7c5ef6cd2 Update __init__.py Praz 2024-08-08 13:50:02 +0530
  • bd4b23d0ba Update __init__.py Praz 2024-08-08 13:43:01 +0530
  • a379d5536b Fix the prefix for OPT model in opt_modelling.py #2370 (CI RUN) (#2371) drbh 2024-08-07 23:14:02 -0400
  • f98aaeeb27 fix: small syntax tweak drbh 2024-08-08 02:10:03 +0000
  • e01e1b7ca6 fix: run lints drbh 2024-08-08 01:35:42 +0000
  • 21267f3ca3 add gptj modeling in TGI #2366 (CI RUN) (#2372) drbh 2024-08-07 21:32:37 -0400
  • e219397ee1 fix: adjust syntax typo again pr-2366-ci-branch drbh 2024-08-08 00:31:24 +0000
  • ce30a14139 fix: adjust syntax typo drbh 2024-08-08 00:03:12 +0000
  • 7372a0dc38 fix: update docs for model addition drbh 2024-08-07 23:47:37 +0000
  • 8094ecfc9e fix: fix num_ln_in_parallel_attn attribute name typo in RWConfig (#2350) almersawi 2024-08-08 03:45:23 +0400