Commit Graph

  • 1c917c0349 fix: run lint and update docs drbh 2024-09-02 19:29:27 +0000
  • 1fb9d406e7 feat: return reasonable generation and add integration test drbh 2024-09-02 19:20:49 +0000
  • dff1b9f795 fix: prefer llama base model and improve rotary logic drbh 2024-09-02 16:19:45 +0000
  • 853bc514f2 feat: support phi3.5 moe model loading drbh 2024-08-30 15:14:59 +0000
  • 5a08c33696 Add support for GPTQ-quantized MoE models using MoE Marlin Daniël de Kok 2024-09-24 11:24:09 +0000
  • e790cfc0e4 Update architecture.md (#2577) Ikram Ul Haq 2024-09-30 09:56:20 +0300
  • afc7ded84f Remove compute capability lazy cell (#2580) Daniël de Kok 2024-09-30 08:48:47 +0200
  • 2401fdc889 cleaned dockerfile Mohit Sharma 2024-09-30 03:40:00 +0000
  • 513ba5a0b4 feat(tgi_common) continue more utility functions cuda_ipc_allreduce Morgan Funtowicz 2024-09-29 12:33:31 +0000
  • 0b80a928eb Don't spam journalctl on Linux Daniël de Kok 2024-09-29 12:43:55 +0200
  • b806a856c7 Stream to make the builder image smaller Daniël de Kok 2024-09-29 12:35:56 +0200
  • fb81ffce02 feat(common) adding device utilities EC2 Default User 2024-09-28 21:10:18 +0000
  • 755361b932 Example of building the Docker image using Nix inside Docker Daniël de Kok 2024-09-28 21:10:33 +0200
  • 1028996fb3 flashinfer: pass window size and dtype (#2574) Daniël de Kok 2024-09-28 18:41:41 +0200
  • 3b28cf9067 improve dockerfile Mohit Sharma 2024-09-28 15:54:45 +0000
  • 6d9515578f break when there's nothing to read Wang, Yi A 2024-09-27 18:35:39 -0700
  • 77a36d45eb nix: experimental support for building a Docker image Daniël de Kok 2024-08-29 08:16:37 +0000
  • 5f9120da9c feat(tgi_common): add initial set of common functions for reuse Morgan Funtowicz 2024-09-27 18:53:56 +0200
  • 7cb49f6f4f float16 dep Mohit Sharma 2024-09-27 15:53:44 +0000
  • b2cd1b66ed fix imports after rebase Mohit Sharma 2024-09-27 15:52:43 +0000
  • 473d9a892d Merge remote-tracking branch 'upstream/main' into rocm_6.2_updates Mohit Sharma 2024-09-27 15:36:12 +0000
  • cdda9dcb28 Remove compute capability lock Daniël de Kok 2024-09-27 14:22:46 +0000
  • 5b6b74e21d Improve support for GPUs with capability < 8 (#2575) Daniël de Kok 2024-09-27 16:19:42 +0200
  • 346dfe398a remove import Mohit Sharma 2024-09-27 12:59:35 +0000
  • a24c2cc5e9 updated default value Mohit Sharma 2024-09-27 12:39:12 +0000
  • ac2dccd174 improved error messag Mohit Sharma 2024-09-27 12:34:04 +0000
  • 816d4b67b2 fix import Mohit Sharma 2024-09-27 12:32:17 +0000
  • 3eb68a371d Capability as usizes Daniël de Kok 2024-09-27 11:42:17 +0000
  • a29636ee0a Move disabling prefix caching into the block of exceptions Daniël de Kok 2024-09-27 11:29:36 +0000
  • 47c81d2924 Merge remote-tracking branch 'upstream/main' into fix_rocm_fa Mohit Sharma 2024-09-27 10:34:16 +0000
  • 829144d15a addressed review comments Mohit Sharma 2024-09-27 10:28:37 +0000
  • 31a6065fac Add some utility functions in tgiccl for now Morgan Funtowicz 2024-09-26 23:31:07 +0200
  • 105f384461 Merge branch 'cuda_ipc_allreduce' of github.com:huggingface/text-generation-inference into cuda_ipc_allreduce Morgan Funtowicz 2024-09-26 23:52:40 +0200
  • 45e060e857 feat: propagate max_concurrent_requests to queue state entries instead of hardcoded 128 in backends/v3 Venkat Raman 2024-09-26 19:51:10 +0200
  • 77ddc8309d feat: propagate max_concurrent_requests to queue state entries instead of hardcoded 128 in backend/v2 Venkat Raman 2024-09-26 18:02:42 +0200
  • 36f418bd04 Update architecture.md Ikram Ul Haq 2024-09-26 18:40:41 +0300
  • 8c0f9312f3 nix: add flash-attn-v1 to the server environment Daniël de Kok 2024-09-26 13:34:01 +0000
  • bee5ee1f03 Improve support for GPUs with capability < 8 Daniël de Kok 2024-09-26 13:35:27 +0000
  • cdca095012 flashinfer: pass window size and dtype Daniël de Kok 2024-09-26 12:59:51 +0000
  • 0aa66d693a Fix build with --features google (#2566) Alvaro Bartolome 2024-09-26 11:41:38 +0200
  • ecbf34b280 Add cargo test --features google Alvaro Bartolome 2024-09-26 11:19:32 +0200
  • 0b7df77178 Add LoRA adapters support for Gemma2 (#2567) Alvaro Bartolome 2024-09-26 10:54:08 +0200
  • bab529c916 Make Gaudi adapt to the tgi 2.3.0 yuanwu 2024-09-26 01:53:52 +0000
  • de38bf2664 Updating Cargo lock mllama Nicolas Patry 2024-09-25 20:51:29 +0200
  • 28f369cc99 Ugrade transformers 4.45 Nicolas Patry 2024-09-25 20:46:44 +0200
  • 31a4c24f74 Mllama Nicolas Patry 2024-09-25 20:41:40 +0200
  • 44cdb00bbb Updating config, removing TODO Nicolas Patry 2024-09-25 10:48:03 +0200
  • 047e2e8163 Fix idefics. Nicolas Patry 2024-09-23 16:13:22 +0200
  • f1f9079ec6 Cleaner condition. Nicolas Patry 2024-09-23 15:50:43 +0200
  • 8abdd08ef4 Working state ? (Broke idefics1 temporarily). Nicolas Patry 2024-09-23 15:25:26 +0200
  • 39d2073e93 Preprocessing. Nicolas Patry 2024-09-18 17:59:13 +0200
  • 907906466a Working loading state. Nicolas Patry 2024-09-18 17:01:36 +0200
  • fb28d374e1 Make black formatting happy Alvaro Bartolome 2024-09-25 19:10:10 +0200
  • e424752fa3 Enable the AutoGPTQ (#217) yuanwu2017 2024-09-26 00:55:02 +0800
  • 3b7e010a4c Add LoRA adapters support for Gemma2 Alvaro Bartolome 2024-09-25 18:12:12 +0200
  • b49d88a970 Fix cargo build --features google Alvaro Bartolome 2024-09-25 17:15:23 +0200
  • f1c6eacd75 feat(tgiccl): initial commit for custom tgiccl backend Morgan Funtowicz 2024-09-25 12:27:28 +0000
  • 14fdc4ae5e Add some missing modification of 2.3.0 because of conflict yuanwu 2024-09-25 07:49:49 +0000
  • 514a5a737d Preparing for release. (#2540) Nicolas Patry 2024-09-20 17:42:04 +0200
  • bd9675c8c7 fix: wrap python basic logs in debug assertion in launcher (#2539) OlivierDehaene 2024-09-20 16:59:31 +0200
  • 3519398a14 hotfix: ipex fails since cuda moe kernel is not supported (#2532) Wang, Yi 2024-09-20 16:02:55 +0800
  • b6ef2bfc1b doc: clarify that --quantize is not needed for pre-quantized models (#2536) Daniël de Kok 2024-09-19 22:17:15 +0200
  • c1a99e2f15 Update to moe-kenels 0.3.1 (#2535) Daniël de Kok 2024-09-19 22:16:32 +0200
  • 2d470c8282 Stream options. (#2533) Nicolas Patry 2024-09-19 20:50:37 +0200
  • 29a93b78ba Move to moe-kernels package and switch to common MoE layer (#2511) Daniël de Kok 2024-09-17 18:08:58 +0200
  • 88b72c8eb3 fix: metrics unbounded memory (#2528) OlivierDehaene 2024-09-17 18:01:28 +0200
  • 0ecbd61099 nix: pure Rust check/fmt/clippy/test (#2525) Daniël de Kok 2024-09-17 12:14:30 +0200
  • 0110b83aff Adding a test for FD. (#2516) Nicolas Patry 2024-09-16 17:00:54 +0200
  • e8c329372b Add tests for Mixtral (#2520) Daniël de Kok 2024-09-16 12:39:18 +0200
  • afe5cae8fc Use ratatui not (deprecated) tui (#2521) Alex Strick van Linschoten 2024-09-13 18:45:28 +0200
  • cbfe9e5185 hotfix : enable intel ipex cpu and xpu in python3.11 (#2517) Wang, Yi 2024-09-12 23:23:49 +0800
  • 5fc0e0c589 fix: pass missing revision arg for lora adapter when loading multiple… (#2510) drbh 2024-09-12 17:04:52 +0200
  • 7d897188d5 Add nix test. (#2513) Nicolas Patry 2024-09-12 14:54:56 +0200
  • 7be7ab7015 nix: support Python tokenizer conversion in the router (#2515) Daniël de Kok 2024-09-12 10:44:01 +0200
  • f32fa568b6 Fix truffle (#2514) Nicolas Patry 2024-09-11 22:45:19 +0200
  • c6b568b892 Fix tokenization yi (#2507) Nicolas Patry 2024-09-11 22:41:56 +0200
  • 510d1c76c8 Prefix test - Different kind of load test to trigger prefix test bugs. (#2490) Nicolas Patry 2024-09-11 18:10:40 +0200
  • b67a0cd37b Add Directory Check to Prevent Redundant Cloning in Build Process (#2486) Vallepu Vamsi Krishna 2024-09-07 16:49:43 +0530
  • eb54d956ef Fixing more correctly the invalid drop of the batch. (#2498) Nicolas Patry 2024-09-06 17:35:49 +0200
  • 7c2ed55b2e Add links to Adyen blogpost (#2500) Martin Iglesias Goyanes 2024-09-06 17:00:54 +0200
  • 0198db125e hotfix: add syrupy to the right subproject (#2499) Daniël de Kok 2024-09-06 12:47:06 +0200
  • 67f44cce0d radix trie: add assertions (#2491) Daniël de Kok 2024-09-06 11:55:23 +0200
  • 8ba790a14e Fix incompatibility with latest syrupy and update in Poetry (#2497) Daniël de Kok 2024-09-06 11:00:52 +0200
  • 1e14a94721 nix: add pyright/ruff for proper LSP in the impure devshell (#2496) Daniël de Kok 2024-09-06 10:19:04 +0200
  • 938a7f3c3a hotfix: fix regression of attention api change in intel platform (#2439) Wang, Yi 2024-09-05 23:41:39 +0800
  • d8610a6219 Add two handy gitignores for Nix environments (#2484) Daniël de Kok 2024-09-05 17:06:54 +0200
  • 556a87030b Adding links to Adyen blogpost. (#2492) Nicolas Patry 2024-09-05 16:11:52 +0200
  • c7b495f97d hotfix: avoid non-prefilled block use when using prefix caching (#2489) Daniël de Kok 2024-09-05 15:09:29 +0200
  • 34a6399a50 feat: support lora revisions and qkv_proj weights (#2482) drbh 2024-09-02 13:09:06 -0400
  • be5cb0cf7f fix: enable chat requests in vertex endpoint (#2481) drbh 2024-09-02 10:00:52 -0400
  • 3e17cb7866 nix: add punica-kernels (#2477) Daniël de Kok 2024-09-02 11:31:36 +0200
  • 07c70e7840 nix: improve impure devshell (#2478) Daniël de Kok 2024-09-02 09:27:10 +0200
  • a313355d2b Tied embeddings in MLP speculator. (#2473) Nicolas Patry 2024-08-29 17:44:54 +0200
  • 61b2f493a8 update doc with intel cpu part (#2420) Wang, Yi 2024-08-29 23:42:02 +0800
  • 990478b285 feat: add /v1/models endpoint (#2433) drbh 2024-08-29 10:32:38 -0400
  • 4e1ca8d7bd Lots of improvements (Still 2 allocators) (#2449) Nicolas Patry 2024-08-29 16:29:01 +0200
  • 622c9c367a nix: build Torch against MKL and various other improvements (#2469) Daniël de Kok 2024-08-29 16:25:25 +0200
  • 08834e0cfd fix: improve regex expression (#2468) drbh 2024-08-28 13:44:44 -0400
  • e80b2c21dc fix: bump minijinja version and add test for llama 3.1 tools (#2463) drbh 2024-08-27 13:31:08 -0400
  • 6793b720ba Fixing CI. (#2462) Nicolas Patry 2024-08-27 15:33:02 +0200