Commit Graph

  • fef7c02bde Update README.md Erik Kaunismäki 2024-07-22 11:55:33 +0200
  • 20bcaea54f add some more information in CMakeLists.txt to correctly find and install nvrtc wrapper Morgan Funtowicz 2024-07-22 09:33:38 +0000
  • 4e4207224e Hotfix: fix of use of unquantized weights in Mixtral GQA loading (#2269) icyboy™ 2024-07-22 17:31:00 +0800
  • 84153702d2 add some more information in CMakeLists.txt to correctly install executorWorker Morgan Funtowicz 2024-07-22 08:43:10 +0000
  • dc70cf9de5 Hotfix: fix of use of unquantized weights in Mixtral GQA loading roc 2024-07-22 15:21:30 +0800
  • 6111e9ecd5 Merge branch 'huggingface:main' into main icyboy™ 2024-07-22 15:18:57 +0800
  • 190368e137 fix llava_next regression in latest main Wang, Yi A 2024-07-21 22:30:33 -0700
  • c5904132c7 Merge branch 'main' into multi-modal Wang, Yi A 2024-07-21 22:29:48 -0700
  • f3435bab8c fix(server): fix deepseekv2 loading (#2266) OlivierDehaene 2024-07-21 16:48:04 +0000
  • 76dcc624cf fix(server): fix deepseekv2 loading OlivierDehaene 2024-07-21 18:47:00 +0200
  • 53ec0b790b feat(fp8): use fbgemm kernels and load fp8 weights directly (#2248) OlivierDehaene 2024-07-20 17:02:04 +0000
  • e5c1d6d611 Add FP8 release test (#2261) Daniël de Kok 2024-07-20 12:26:06 +0200
  • 10bec164a9 increase timeout OlivierDehaene 2024-07-20 10:09:03 +0200
  • c9e8b68426 missing get_weights implementation OlivierDehaene 2024-07-20 09:56:46 +0200
  • b9410c3edf force new nccl on install OlivierDehaene 2024-07-20 09:47:03 +0200
  • 879ea45df7 fix quantization config parsing OlivierDehaene 2024-07-20 09:30:21 +0200
  • 5789139c68 fix auto conversion OlivierDehaene 2024-07-20 09:16:42 +0200
  • 6a93a24f3f refactored weights loader OlivierDehaene 2024-07-20 09:02:02 +0200
  • 081d16cab5 add default dtype OlivierDehaene 2024-07-19 19:38:01 +0200
  • 10cd8ab4a6 avoid circular import and fix dockerfile OlivierDehaene 2024-07-19 18:56:41 +0200
  • 985df12c46 build fbgemm OlivierDehaene 2024-07-19 18:26:50 +0200
  • 80087783a5 fix makefile OlivierDehaene 2024-07-18 16:34:43 +0200
  • a84373c918 update outlines OlivierDehaene 2024-07-18 16:13:32 +0200
  • ee4174b6c7 allow loading fp8 weights directly OlivierDehaene 2024-07-18 15:05:04 +0200
  • 27084bbfd3 feat(fp8): add support for fbgemm OlivierDehaene 2024-07-18 12:40:57 +0200
  • 11123a8e99 re-push to internal registry (#2242) Adrien 2024-07-20 07:06:40 +0200
  • d5464d2f80 add initial Dockerfile for TRTLLM backend Morgan Funtowicz 2024-07-19 22:08:12 +0000
  • 5845269759 another one Adrien 2024-07-19 21:00:13 +0200
  • e39bac2cd1 last reverts Adrien 2024-07-19 20:59:48 +0200
  • 79ae03c503 revert tests Adrien 2024-07-19 20:58:51 +0200
  • e52be9bba2 Add support for Deepseek V2 (#2224) Daniël de Kok 2024-07-19 17:23:20 +0200
  • 68a9685f1b fix: adjust default tool choice (#2244) drbh 2024-07-19 11:12:02 -0400
  • 39dbfbda43 Add FP8 release test Daniël de Kok 2024-07-19 14:36:48 +0000
  • 40f5dc3ed6 add usage stats to toctree (#2260) Erik Kaunismäki 2024-07-19 16:34:04 +0200
  • 8b08a8c0e8 quick fix erikkaum 2024-07-19 16:25:55 +0200
  • 4c19593a90 usage stats and crash reports (#2220) Erik Kaunismäki 2024-07-19 16:17:56 +0200
  • 3f37a66774 Hotfix: pass through model revision in VlmCausalLM (#2258) Daniël de Kok 2024-07-19 15:59:00 +0200
  • eb9e109b9c Merge pull request #1 from huggingface/feature/no_repeat_ngram_size Nate Brake 2024-07-19 09:36:17 -0400
  • c3e65f575a Hotfix: pass through model revision in VlmCausalLM Daniël de Kok 2024-07-19 12:51:57 +0000
  • 3b41e93a09 Hotfix: fix MPT after recent refactor (#2257) Daniël de Kok 2024-07-19 14:42:35 +0200
  • 18db78f295 Hotfix: various GPT-based model fixes (#2256) Daniël de Kok 2024-07-19 14:42:19 +0200
  • f8a1546c09 Hotfix: various GPT-based model fixes Daniël de Kok 2024-07-19 10:25:18 +0000
  • 6300bab8b4 make sure executor_worker is provided Morgan Funtowicz 2024-07-19 11:56:43 +0000
  • 5757c163c5 Hotfix: fix MPT after recent refactor Daniël de Kok 2024-07-19 11:46:49 +0000
  • 80adb5be16 Hotfix: fix of use of unquantized weights in Gemma GQA loading (#2255) Daniël de Kok 2024-07-19 12:55:59 +0200
  • dab813dec2 Hotfix: fix of use of unquantized weights in Gemma GQA loading Daniël de Kok 2024-07-19 10:07:07 +0000
  • 2579c927b0 cargo fmt erikkaum 2024-07-19 11:59:27 +0200
  • 0d03dc1d81 error reason can't be in nested json erikkaum 2024-07-19 11:35:23 +0200
  • db7c519ee2 Support passing head_dim through config Shaltiel Shmidman 2024-07-19 12:16:54 +0300
  • 836a2e2a2b Add support for Deepseek V2 Daniël de Kok 2024-07-12 12:57:08 +0200
  • 01d6e0ba32 Update router/src/usage_stats.rs Erik Kaunismäki 2024-07-19 10:47:38 +0200
  • 2dfd111e63 should make docs check pass erikkaum 2024-07-19 10:46:29 +0200
  • 453130cfc3 w Adrien 2024-07-19 10:06:03 +0200
  • ba291dad9f Improve the handling of quantized weights (#2250) Daniël de Kok 2024-07-19 09:37:39 +0200
  • 662fdeb6ee debug Adrien 2024-07-19 08:54:12 +0200
  • 97723d1458 add logging in case of decoding error Morgan Funtowicz 2024-07-18 22:19:25 +0000
  • 9ea7f9e950 remove logging Morgan Funtowicz 2024-07-18 22:08:46 +0000
  • e82dc30e8a expose information about potential error happening while decoding Morgan Funtowicz 2024-07-18 22:07:59 +0000
  • a19d318947 define a shared struct to hold the result of a decoding step Morgan Funtowicz 2024-07-18 21:33:04 +0000
  • a036574a86 add some more validation about grammar not supported Morgan Funtowicz 2024-07-18 20:57:23 +0000
  • b643a436f3 forward tgi parameters rep/freq penalty Morgan Funtowicz 2024-07-18 20:56:58 +0000
  • 82fc879e17 feat: refactor lora linear and remove adapter layers [refactor-lora-linear] drbh 2024-07-18 19:58:55 +0000
  • c728cb7015 feat: add ToolChoice to docs drbh 2024-07-18 18:04:58 +0000
  • 062f91ad60 fix: remove dev tests drbh 2024-07-18 17:48:51 +0000
  • 21dc6776b1 feat: improve tool choice syntax and response parsing/errors drbh 2024-07-18 17:41:17 +0000
  • fdd722e16e wip Adrien 2024-07-18 18:23:54 +0200
  • 7007b587ba wip Adrien 2024-07-18 18:22:41 +0200
  • 9c7abeba32 ww Adrien 2024-07-18 18:16:13 +0200
  • d0c4a5f6ad wip Adrien 2024-07-18 18:12:29 +0200
  • bac5fda4bc wip Adrien 2024-07-18 18:05:27 +0200
  • cf16172a85 Exclude non-MLP layers when using FP8 quantization with Llama Daniël de Kok 2024-07-18 15:15:57 +0000
  • e29fc9e32a satisfy compiler erikkaum 2024-07-18 18:04:00 +0200
  • 1ed8856f61 ww Adrien 2024-07-18 18:00:14 +0200
  • 95847c6587 expose the internal missing start/queue timestamp Morgan Funtowicz 2024-07-18 15:57:33 +0000
  • cb639c3247 wip Adrien 2024-07-18 17:50:35 +0200
  • a1b69a8cc5 Completing development guide [development-guide] Hugo Larcher 2024-07-18 17:38:18 +0200
  • a93b2b5083 Improve the handling of quantized weights Daniël de Kok 2024-07-18 14:41:08 +0000
  • 2264773127 should Adrien 2024-07-18 16:33:38 +0200
  • fd021e5461 refactor Stream impl for Generation to factorise code Morgan Funtowicz 2024-07-18 14:21:43 +0000
  • 1d1b1efa01 fix(server): fix cohere (#2249) OlivierDehaene 2024-07-18 14:00:13 +0000
  • e952e4cfd3 fix(server): fix cohere OlivierDehaene 2024-07-18 15:59:36 +0200
  • a76aed4f72 add debug Adrien 2024-07-18 15:36:53 +0200
  • b56c43ec30 remove unneeded scope variable for now Morgan Funtowicz 2024-07-18 12:57:10 +0000
  • e77eff6fa1 wip debug Adrien 2024-07-18 14:20:37 +0200
  • 47d0863680 update according to review comment Wang, Yi A 2024-07-18 01:20:43 -0700
  • 91a8972e18 fix: Fix to allow report for a full failed test Hugo Larcher 2024-07-18 10:03:05 +0200
  • d8e3a27648 fix crash in multi-modal Wang, Yi A 2024-07-17 22:21:11 -0700
  • 35f8a88db5 fix: adjust default tool choice drbh 2024-07-18 00:57:02 +0000
  • 0212b1774a correctly forward back the log probabilities Morgan Funtowicz 2024-07-17 22:33:10 +0000
  • bcb96feea6 update invalid doc in cpp file Morgan Funtowicz 2024-07-17 22:23:22 +0000
  • 69674a3a2d add all the necessary plumbery to return the generated content Morgan Funtowicz 2024-07-17 22:12:49 +0000
  • ce715c76f8 remove unnecessary log Morgan Funtowicz 2024-07-17 22:09:50 +0000
  • e983ee5bb8 make sure the context is not dropped in the middle of the async decoding. Morgan Funtowicz 2024-07-17 21:56:50 +0000
  • 6589b5222e wip Adrien 2024-07-17 22:35:52 +0200
  • aa0b19ea8d wip Adrien 2024-07-17 22:33:04 +0200
  • 65dc1b9fbe debug Adrien 2024-07-17 22:21:10 +0200
  • 7d081e1b36 debug Adrien 2024-07-17 22:19:19 +0200
  • abb843d697 fix name Adrien 2024-07-17 21:52:55 +0200
  • 2a5a2e9923 re-push to internal registry Adrien 2024-07-17 21:04:22 +0200
  • cf70a3036b Merge remote-tracking branch 'origin/feat/add-load-test' into feat/add-load-test Hugo Larcher 2024-07-17 16:47:07 +0200