Commit Graph

  • 263d81a43a nix: add text-generation-benchmark to pure devshell Daniël de Kok 2024-08-18 15:16:16 +0000
  • f5f11b797e nix: add pure server to flake, add both pure and impure devshells (#2430) Daniël de Kok 2024-08-20 22:07:33 +0200
  • c988d6e620 nix: add ipdb to impure devshell Daniël de Kok 2024-08-20 19:59:35 +0000
  • b3a4d10a42 nix: remove unused poetry2nix input Daniël de Kok 2024-08-18 13:10:29 +0000
  • f015bb382c nix: pure server and support both pure and impure devShells Daniël de Kok 2024-08-18 12:36:29 +0000
  • b70ae0969f Prefix caching (#2402) Nicolas Patry 2024-08-20 11:15:30 +0200
  • 93194a5075 feat: add /v1/models endpoint drbh 2024-08-19 16:00:48 +0000
  • 38773453ae nix: update to CUDA 12.4 (#2429) Daniël de Kok 2024-08-19 09:28:38 +0200
  • 5b645e7e1b poetry2nix: follow tgi-nix nixpkgs Daniël de Kok 2024-08-18 10:06:18 +0000
  • aa3a83eed2 Update to CUDA 12.4 Daniël de Kok 2024-08-18 10:04:12 +0000
  • caf9fcc600 Update flake.lock Nicolas Patry 2024-08-14 18:28:18 +0200
  • 95155a212b Fixup: Nicolas Patry 2024-08-14 18:26:29 +0200
  • 97c504136c Remove router.nix Nicolas Patry 2024-08-14 12:03:47 +0200
  • 4a38185d78 Removing the logs. Nicolas Patry 2024-08-14 11:29:40 +0200
  • 4fff77ebcb Medusa requires reshaping. Nicolas Patry 2024-08-13 16:25:29 +0200
  • 99b6b5c795 Fixing prefix caching. Nicolas Patry 2024-08-13 16:22:52 +0200
  • b2933b72d0 Fixing medusa without prefix caching. Nicolas Patry 2024-08-13 13:13:08 +0200
  • 4c8dcbb76d Just medusa values now. Nicolas Patry 2024-08-13 13:02:48 +0200
  • 549f0e9ca7 Fixing medusa (still wrong outputs, but functional). Nicolas Patry 2024-08-13 11:58:55 +0200
  • b31ec3bc8c Fixing black. Nicolas Patry 2024-08-12 17:27:45 +0200
  • 89b42c9b48 Fixing flashinfer import. Nicolas Patry 2024-08-12 16:31:41 +0200
  • 0c90550e9d Fixing prefix attention. Nicolas Patry 2024-08-12 16:23:18 +0200
  • 44a77dcb9e Prefix caching WIP Daniël de Kok 2024-08-09 11:47:14 +0000
  • e4201f44cf All integration tests back everywhere (too many failed CI). (#2428) Nicolas Patry 2024-08-16 21:19:46 +0200
  • d7bee1c026 Punica uses raw ASM which is not valid on 9.0 apparently. Nicolas Patry 2024-08-16 19:58:53 +0200
  • 53729b74ac doc: Add metrics documentation and add a 'Reference' section (#2230) Hugo Larcher 2024-08-16 19:43:30 +0200
  • f18262b16f Common arch list. Nicolas Patry 2024-08-16 19:02:22 +0200
  • 9190b0c82c Attempt to remove the specifed compute cap. Nicolas Patry 2024-08-16 16:21:17 +0200
  • 42bc3398a9 Upgrade integration tests after 12.4 Nicolas Patry 2024-08-16 15:19:39 +0200
  • e3f562999a All integration tests back everywhere (too many failed CI). Nicolas Patry 2024-08-16 14:27:19 +0200
  • cb0a29484d FIxing the CI. Nicolas Patry 2024-08-16 14:21:29 +0200
  • c7ab1810d4 Further fixes. (#2426) Nicolas Patry 2024-08-16 13:21:44 +0200
  • 986071197b Fix the condition. Nicolas Patry 2024-08-16 13:20:53 +0200
  • 188ceb2fef Update the conftest to allow NaN (first logprob). Nicolas Patry 2024-08-16 13:16:51 +0200
  • 772f0d1c80 Further fixes. Nicolas Patry 2024-08-16 13:09:18 +0200
  • 99b662f8c2 Improve the Consuming TGI + Streaming docs. (#2412) Vaibhav Srivastav 2024-08-16 12:43:08 +0200
  • ec4ea88109 Last nit Vaibhav Srivastav 2024-08-16 12:23:48 +0200
  • 1411bfb989 nix: try to reduce the number of Rust rebuilds (#2424) Daniël de Kok 2024-08-16 10:01:01 +0200
  • 8521036efd nix: try to reduce the number of Rust rebuilds Daniël de Kok 2024-08-15 15:53:21 +0000
  • 1b0aa06204 Upgrading the tests to match the current workings. (#2423) Nicolas Patry 2024-08-15 13:28:42 +0200
  • e32876fa97 Upgrading the tests to match the current workings. Nicolas Patry 2024-08-15 13:27:51 +0200
  • 369e499a66 Simplify the warmup process (#173) yuanwu2017 2024-08-15 18:04:14 +0800
  • 57b3495823 Fixing exl2 and other quanize tests again. (#2419) Nicolas Patry 2024-08-15 11:12:51 +0200
  • 18ad84c8fa Adding warnings for deprecated bitsandbytes + upgrade info to warn. Nicolas Patry 2024-08-15 11:04:30 +0200
  • e46df82e4f Go back to released exl2 and remove log. Nicolas Patry 2024-08-15 09:28:50 +0200
  • 3643d1cd9e Removing serde override. Nicolas Patry 2024-08-15 09:17:06 +0200
  • 13350a330f Fix quantization defaults without cuda graphs on exl2 (linked to new issues with it). Nicolas Patry 2024-08-15 09:12:21 +0200
  • a041603462 Fixing exl2 (by disabling cuda graphs) Nicolas Patry 2024-08-14 19:41:29 +0200
  • 06ee185cf8 Mark exl2 as non release (so CI tests them, needs to be removed latet). Nicolas Patry 2024-08-14 16:51:46 +0200
  • f4ce670eb0 Fixing exl2 and other quanize tests again. Nicolas Patry 2024-08-14 16:30:46 +0200
  • 9aaa12e7ac nix: build router incrementally (#2422) Daniël de Kok 2024-08-15 10:21:51 +0200
  • fda39e71d2 nix: build router incrementally Daniël de Kok 2024-08-15 08:14:07 +0000
  • c03a7b3560 update doc with intel cpu part Wang, Yi A 2024-08-14 17:49:05 -0700
  • b378fb4702 Fixing exl2 (by disabling cuda graphs) fix_exl2 Nicolas Patry 2024-08-14 19:41:29 +0200
  • 89707adbbb Fixing exl2 (by disabling cuda graphs) exl2 Nicolas Patry 2024-08-14 19:41:29 +0200
  • 4b10c8c30b fix: improve scales change and revert conditional fix-repack-for-marlin drbh 2024-08-14 16:38:15 +0000
  • 0e09eeacfc Doc review from Nico. x2 Vaibhav Srivastav 2024-08-14 13:11:25 +0200
  • 3f385991b0 More fixes trtllm (#2342) Funtowicz Morgan 2024-08-14 12:02:05 +0200
  • f3b5c69441 Upgrading exl2. (#2415) Nicolas Patry 2024-08-14 11:58:08 +0200
  • bb2b93e7a3 Doc review from Nico. Vaibhav Srivastav 2024-08-14 11:35:50 +0200
  • 00edb1a789 Up. Vaibhav Srivastav 2024-08-14 11:25:15 +0200
  • a590b2f548 Merge branch 'vb/update-consuming-tgi-docs' of https://github.com/Vaibhavs10/text-generation-inference into vb/update-consuming-tgi-docs Vaibhav Srivastav 2024-08-14 11:19:03 +0200
  • a27b31c34a Up. Vaibhav Srivastav 2024-08-14 11:18:45 +0200
  • b2d3948ccf Fix idefics. Nicolas Patry 2024-08-14 11:14:11 +0200
  • 5c598cc7ed Fixing the other pathways. Nicolas Patry 2024-08-14 08:53:17 +0200
  • c9047667ad Upgrading exl2. Nicolas Patry 2024-08-14 08:49:58 +0200
  • 921448cfdf Apply suggestions from code review Vaibhav Srivastav 2024-08-14 11:11:47 +0200
  • 9c83e04f40 limit nb input tokens Xuan Son Nguyen 2024-08-14 11:07:39 +0200
  • c5fff92b48 nix: partial incremental build of the router (#2416) Daniël de Kok 2024-08-14 11:06:28 +0200
  • 30912d3f6d nix: partial incremental build of the router Daniël de Kok 2024-08-14 08:55:02 +0000
  • a6506a51b6 update Xuan Son Nguyen 2024-08-14 10:36:17 +0200
  • ab4d480d91 fix: repack for marlin when single scale is provided drbh 2024-08-13 16:52:15 -0400
  • 7007394766 Up. Vaibhav Srivastav 2024-08-13 19:58:21 +0200
  • 4abd7d3971 Update docs/source/basic_tutorials/consuming_tgi.md Vaibhav Srivastav 2024-08-13 19:55:11 +0200
  • 3ba36590c6 Apply suggestions from code review Vaibhav Srivastav 2024-08-13 19:41:39 +0200
  • cd18ee3ac9 Up. Vaibhav Srivastav 2024-08-13 19:07:38 +0200
  • d59df84169 Update Gradio snippet. Vaibhav Srivastav 2024-08-13 19:05:47 +0200
  • 6e00e05cec Merge branch 'vb/update-consuming-tgi-docs' of https://github.com/Vaibhavs10/text-generation-inference into vb/update-consuming-tgi-docs Vaibhav Srivastav 2024-08-13 18:43:27 +0200
  • 552ee136d8 Suggestions from Lucain. Vaibhav Srivastav 2024-08-13 18:41:38 +0200
  • 364906f427 Apply suggestions from code review Vaibhav Srivastav 2024-08-13 17:26:38 +0200
  • 98d66f0534 More updates. Vaibhav Srivastav 2024-08-13 16:57:17 +0200
  • 1cebccc72b fix: adds causal to attention params (#2408) drbh 2024-08-13 10:19:46 -0400
  • 1d37a6a06a add info about Open AI client. Vaibhav Srivastav 2024-08-13 16:17:32 +0200
  • 5512446726 Fix erronous update to . Vaibhav Srivastav 2024-08-13 15:55:11 +0200
  • 8de10acdcf Improve the Consuming TGI docs. Vaibhav Srivastav 2024-08-13 15:47:25 +0200
  • 59922f9bc1 add numa to improve cpu inference perf (#2330) Wang, Yi 2024-08-13 21:33:55 +0800
  • cd9b15d17f Adding more kernels to flake. (#2411) Nicolas Patry 2024-08-13 10:49:18 +0200
  • 6e7a33c812 Adding more kernels to flake. Nicolas Patry 2024-08-13 10:46:00 +0200
  • 6f4bb4f26f nix: incremental build of the launcher (#2410) Daniël de Kok 2024-08-13 10:44:15 +0200
  • 266dd54bfd nix: incremental build of the launcher Daniël de Kok 2024-08-13 06:18:22 +0000
  • 519e5ac05b fix: adds causal to attention params to check when using flash attn v1 drbh 2024-08-13 00:56:15 +0000
  • e3f0f85b70 Pad token handling for Llama3.1 (#199) Sun Choi 2024-08-12 15:00:41 -0700
  • 8a7749b8fb fix: include create_exllama_buffers and set_device for exllama (#2407) drbh 2024-08-12 17:59:37 -0400
  • c09f5bc930 Merge pull request #187 from yuanwu2017/v2.0.4 regisss 2024-08-12 23:59:03 +0200
  • d8a60f70bc fix: include create_exllama_buffers and set_device for exllama drbh 2024-08-12 20:48:21 +0000
  • 9a7830bd28 Pr 2395 ci run (#2406) drbh 2024-08-12 14:38:59 -0400
  • d56ccb5220 feat: add message and chat template test drbh 2024-08-12 17:04:52 +0000
  • 05e9a83153 Merge commit 'refs/pull/2395/head' of github.com:huggingface/text-generation-inference into pr-2395-ci-run drbh 2024-08-12 17:01:11 +0000
  • fb481787a7 feat: add message and chat template test drbh 2024-08-12 16:57:17 +0000
  • d529fd7b0b Merge commit 'refs/pull/2395/head' of github.com:huggingface/text-generation-inference into pr-2395-ci-run drbh 2024-08-12 16:38:59 +0000