Commit Graph

  • d912f0bf55
    Update documentation to most recent stable version of TGI. (#2625) vb 2024-10-10 19:30:25 +0530
  • e36dfaa8de
    feat: allow tool calling to respond without a tool (#2614) drbh 2024-10-10 09:28:25 -0400
  • f30e2c7320 Make moe-kernels and marlin-kernels mandatory in CUDA installs Daniël de Kok 2024-10-10 08:28:55 +0000
  • 307c9ea371 fix: adjust test expected content David Holtz 2024-10-09 23:21:43 +0000
  • 99b1cf5948 fix: rerun linter add-test-for-warmup-and-kvcache David Holtz 2024-10-09 20:10:31 +0000
  • c396c54231 fix: adjust test to only run on cuda David Holtz 2024-10-09 20:02:29 +0000
  • 541c476492 fix: cleanup ruff lints David Holtz 2024-10-09 19:27:23 +0000
  • be7aa9c583
    Update quicktour.md Omar Sanseviero 2024-10-09 21:05:45 +0200
  • 247033b45f fix: address ruff unused vars lint David Holtz 2024-10-09 18:58:35 +0000
  • 130f9d16b5 fix: rerun black lint add-rotary-embed-tests David Holtz 2024-10-09 18:44:41 +0000
  • 1ddde382bd fix: only run test when cuda is available David Holtz 2024-10-09 18:43:53 +0000
  • 93028b113a fix: adjust chat input test for no_tool David Holtz 2024-10-09 18:38:46 +0000
  • 301a18c2e5 fix: limit some tests to only run when cuda available David Holtz 2024-10-09 18:35:29 +0000
  • b48eca405a fix: prefer no_tool over notify_error to improve response David Holtz 2024-10-09 18:13:34 +0000
  • fa140a2eeb fix: always send event on error, avoid unwraps, refactor and improve tests David Holtz 2024-10-09 17:20:59 +0000
  • 2db8f6004a
    Apparently container can be gone already. Nicolas Patry 2024-10-09 18:11:53 +0200
  • 338aadd067
    Snapshot rename Nicolas Patry 2024-10-09 16:52:20 +0200
  • 1e03ea96d0
    Let's try non sharded gemma. Nicolas Patry 2024-10-09 16:37:10 +0200
  • 15e178e3ad
    Intel CI ? Nicolas Patry 2024-10-09 16:08:03 +0200
  • 43f39f6894
    AMD CI (#2589) Nicolas Patry 2024-10-09 17:50:49 +0200
  • a8108bc0da feat: add basic test for the warmup step and memory allocation of the kv cache David Holtz 2024-10-09 13:49:11 +0000
  • d2361d7fd8
    Temp deactivate intel, activate nvidia ? Nicolas Patry 2024-10-09 12:42:34 +0200
  • 822131f6dc
    Fix nvidia ? Nicolas Patry 2024-10-09 12:20:03 +0200
  • 46ccffd246
    Flash llama on intel CPU ? Nicolas Patry 2024-10-09 12:08:30 +0200
  • 8321a6c8e5
    Try no devices ? Nicolas Patry 2024-10-09 11:57:41 +0200
  • b18ed0f443
    ? Nicolas Patry 2024-10-09 11:42:38 +0200
  • 9ed0c85fe1
    nix: add black and isort to the closure (#2619) Daniël de Kok 2024-10-09 11:08:02 +0200
  • 415d29fa2a nix: add black and isort to the closure Daniël de Kok 2024-10-08 07:39:11 +0000
  • 8d324dbb48 Update to most recent stable version of TGI. Vaibhav Srivastav 2024-10-09 11:13:22 +0530
  • 040c5b5970 feat: add test for positional rotary embeddings System administrator 2024-10-08 19:10:38 +0000
  • 8ad20daf33
    CI (2599): Update ToolType input schema (#2601) drbh 2024-10-08 12:35:48 -0400
  • 7d2aa27161 fix: return event in all cases System administrator 2024-10-08 14:34:28 +0000
  • 17c66892b5 fix: improve comparison via ruff lint David Holtz 2024-10-07 13:41:51 +0000
  • 2abcc8ea0b fix: expect content in test David Holtz 2024-10-07 13:15:47 +0000
  • 6def99d61b feat: process token stream before returning to client David Holtz 2024-10-07 01:58:53 +0000
  • 735bcf6745 add XPU env to optimize perf of TP Wang, Yi A 2024-10-08 09:57:36 -0400
  • c2a8819edf Merge branch 'main' into sliding_window Wang, Yi A 2024-10-08 10:00:48 -0400
  • 78ca1414b7 set kv cache dtype Wang, Yi A 2024-10-08 08:00:06 -0400
  • 92fa7ac7e9 Merge branch 'main' into gpt_awq_4 Wang, Yi A 2024-10-08 07:55:38 -0400
  • e618ce3ada Fix: make moe_kernels imports conditional bugfix/moe-kernels-imports Daniël de Kok 2024-10-08 11:05:28 +0000
  • 6db3bcb700
    nix: move back to the tgi-nix main branch (#2620) Daniël de Kok 2024-10-08 12:55:05 +0200
  • 3f59a3d610 Test Marlin MoE with desc_act=true Daniël de Kok 2024-10-08 10:51:18 +0000
  • 3e8d722733
    Fixing ? Nicolas Patry 2024-10-08 12:04:09 +0200
  • 595f5d5b3b nix: move back to the tgi-nix main branch Daniël de Kok 2024-10-08 09:58:14 +0000
  • 64142489b6
    Add support for fused MoE Marlin for AWQ (#2616) Daniël de Kok 2024-10-08 11:56:41 +0200
  • 8ab0d60cf8
    Fix docker volume Nicolas Patry 2024-10-08 11:29:44 +0200
  • a1c5f38c87 Add integration test for AWQ MoE Daniël de Kok 2024-10-08 08:00:23 +0000
  • 2839d04c71
    Docker volume split. Nicolas Patry 2024-10-08 09:48:34 +0200
  • 8b295aa498
    Upgrade minor rust version (Fixes rust build compilation cache) (#2617) Nicolas Patry 2024-10-08 09:42:50 +0200
  • 53f9b18086
    Black Nicolas Patry 2024-10-08 09:28:07 +0200
  • df962ca864 Add support for fused MoE Marlin for AWQ Daniël de Kok 2024-10-07 15:57:03 +0000
  • a75c9a21e8
    Upgrade minor rust version (Fixes rust build compilation cache) Nicolas Patry 2024-10-08 09:20:01 +0200
  • 57f9685dc3
    enable mllama in intel platform (#2610) Wang, Yi 2024-10-08 03:15:09 +0800
  • 0da4df4b96
    Fix FP8 KV-cache condition (#2611) Florian Zimmermeister 2024-10-07 09:34:19 +0200
  • 1d384f8c98
    Merge branch 'main' into sliding_window Wang, Yi 2024-10-05 23:04:25 +0800
  • 74489227e0
    Add Google Cloud in docs/source/references/api_reference.md add-google-cloud-provider Alvaro Bartolome 2024-10-05 16:54:17 +0200
  • 8ec1b998a4
    Trailing slash ? Nicolas Patry 2024-10-05 15:26:04 +0200
  • 47c01cb048
    Fix AWS Sagemaker indentation, typo and header level Alvaro Bartolome 2024-10-05 13:26:25 +0200
  • e03a7167ee
    Update kv_cache.py Florian Zimmermeister 2024-10-05 09:12:12 +0200
  • 11d7af730b add cloning in Dockerfile add_tunable_prefill Mohit Sharma 2024-10-04 17:41:02 +0000
  • 862651a90d ensure lfs files are downloaded Mohit Sharma 2024-10-04 17:21:01 +0000
  • fc00efb2e7 Update tuned file with bf16 tuned ops Mohit Sharma 2024-10-04 17:12:31 +0000
  • 066d3b1fe8 Update tuned file Mohit Sharma 2024-10-04 17:08:28 +0000
  • 2358c2bb54
    Add basic FP8 KV cache support (#2603) Daniël de Kok 2024-10-04 17:51:48 +0200
  • 68103079f4
    nix: example of local package overrides during development (#2607) Daniël de Kok 2024-10-04 16:52:42 +0200
  • 4e3c272e44
    Common cache ? Nicolas Patry 2024-10-04 16:29:20 +0200
  • ed4f9915d0 enable mllama in intel platform Wang, Yi A 2024-10-04 06:32:50 -0700
  • ed5c2fb127 Fix Cargo.toml Daniël de Kok 2024-10-04 13:25:08 +0000
  • 78d6c27d89 Add basic FP8 KV cache support Daniël de Kok 2024-10-04 13:20:16 +0000
  • 1f5eb995ff nix: example of local package overrides during development Daniël de Kok 2024-10-04 08:10:52 +0000
  • 3011639ff7
    Revert "Unroll notify error into generate response" (#2605) drbh 2024-10-03 17:56:40 -0400
  • 3f07ddb469 feat: support llama 3.1 tooling and remove grammar schema enable-non-grammar-constrained-tools David Holtz 2024-10-03 20:48:49 +0000
  • e03f9b9312
    Revert "Unroll notify error into generate response (#2597)" drbh 2024-10-03 10:25:28 -0400
  • a094729386
    V2.3.1 v2.3.1 git_v2.3.1 Nicolas Patry 2024-10-03 14:49:40 +0200
  • f6e2f05b16
    New release 2.3.1 (#2604) Nicolas Patry 2024-10-03 14:43:49 +0200
  • 2784671e7d
    Update doc number Nicolas Patry 2024-10-03 14:28:21 +0200
  • 19427ca10b fix: allow tool choice to be null David Holtz 2024-10-03 12:25:31 +0000
  • 1870050cfa
    New release 2.3.1 Nicolas Patry 2024-10-03 14:20:03 +0200
  • d78d9c0baa
    yaml Nicolas Patry 2024-10-03 14:06:18 +0200
  • ff8e8852be
    Testing Nicolas Patry 2024-10-03 14:02:07 +0200
  • 78776cdd25 add tuned config Mohit Sharma 2024-10-03 11:59:14 +0000
  • 06f9186d80
    Fallback to single shard CI test. Nicolas Patry 2024-10-03 12:53:20 +0200
  • 6ae04672b6
    fix: Add variable results file location Hugo Larcher 2024-10-03 11:34:38 +0200
  • 7081b8fb4d
    On 2 GPUs Nicolas Patry 2024-10-02 18:30:36 +0200
  • ccc4fa24f8
    Wrong kwarg. Nicolas Patry 2024-10-02 18:26:24 +0200
  • e8f1c16001
    Repeated arguments Nicolas Patry 2024-10-02 18:24:57 +0200
  • 9866798dfd
    No token Nicolas Patry 2024-10-02 18:23:17 +0200
  • 51f07200ce
    Without pytest. Nicolas Patry 2024-10-02 18:21:05 +0200
  • c803a53cdd
    Mount volume + port forward Nicolas Patry 2024-10-02 18:16:31 +0200
  • 18fefd4d68
    Giving HIP devices. Nicolas Patry 2024-10-02 18:14:22 +0200
  • 3b8cf5f4ef
    Start downloading with token Nicolas Patry 2024-10-02 18:11:44 +0200
  • ff5da177e1
    Show pip freeze. Nicolas Patry 2024-10-02 18:10:09 +0200
  • f68ed01587
    Sending args. Nicolas Patry 2024-10-02 18:09:41 +0200
  • 0936c79995
    Raw tgi Nicolas Patry 2024-10-02 18:07:13 +0200
  • 29eb7601a9
    Only 1 test, no container name Nicolas Patry 2024-10-02 18:03:26 +0200
  • 3ae4e090eb
    Remove stdin messing Nicolas Patry 2024-10-02 18:00:27 +0200
  • 7bead9ef0f
    Attempt #1 Nicolas Patry 2024-10-02 17:57:23 +0200
  • cbc7d09da6
    Start with devices Nicolas Patry 2024-10-02 17:49:05 +0200
  • ae92fd0f38
    from env Nicolas Patry 2024-10-02 17:46:08 +0200
  • 2638bc0a2a
    No pytest. Nicolas Patry 2024-10-02 17:44:08 +0200