Commit Graph

  • 7e810e7628 fix: update client exports and adjust after rebase drbh 2024-07-19 17:18:33 +0000
  • 80ab61c013 feat: add ruff and resolve issue drbh 2024-07-19 16:02:20 +0000
  • 5d85a958c9 fix: refactor adapter weight loading and mapping (#2193) drbh 2024-07-24 15:32:14 -0400
  • 631ca319f3 Update the idefics2 snapshot. Nicolas Patry 2024-07-24 19:11:08 +0200
  • 93d2b9fe9c Split up layers.marlin into several files (#2292) Daniël de Kok 2024-07-24 16:33:26 +0200
  • 9f9997b5d4 convert strings to lowercase for case insensitive comparison KevinDuffy94 2024-07-24 10:32:48 -0400
  • 19e63ffccc Fix comment KevinDuffy94 2024-07-24 10:27:27 -0400
  • 75b55efcc7 server quantize: store quantizer config in standard format Daniël de Kok 2024-07-24 13:27:20 +0000
  • 82c7f951f2 fix: comment typo drbh 2024-07-24 13:26:44 +0000
  • 1f3b2aeee4 fix: improve get_model_with_lora_adapters naming drbh 2024-07-24 13:25:24 +0000
  • 5f2e1f0d7e fix: fix missing model id i rocm warmup Islam Almersawi 2024-07-24 17:22:44 +0400
  • fbb683fce7 fix aliases matvey-kolbasov-hs 2024-07-24 16:05:38 +0300
  • eabcb2967a logging matvey-kolbasov-hs 2024-07-24 15:37:16 +0300
  • 48315e2608 clean up a bit Morgan Funtowicz 2024-07-24 09:52:38 +0000
  • 9c60c9ca43 add missing dependant libraries for linking Morgan Funtowicz 2024-07-24 09:29:24 +0000
  • f73f57ca21 tied embeddings for qwe2 matvey-kolbasov-hs 2024-07-24 11:59:56 +0300
  • 8642250602 fix of use of unquantized weights in cohere GQA loading, also enable … (#2291) Wang, Yi 2024-07-24 16:44:02 +0800
  • 272e6f987f Split up layers.marlin into several files Daniël de Kok 2024-07-24 08:22:54 +0000
  • 5ad39dd3c3 fix crash in multi-modal (#2245) Wang, Yi 2024-07-24 16:39:08 +0800
  • 09bcca6a97 update build.rs to link to cuda 12.5 Morgan Funtowicz 2024-07-24 07:50:26 +0000
  • 0c651ac7be fix of use of unquantized weights in cohere GQA loading, also enable the model in intel platform Wang, Yi A 2024-07-23 23:07:21 -0700
  • 02b0eaaba0 MODEL_ID propagation fix root 2024-07-24 03:35:53 +0000
  • e4fc0ebcbe update TensorRT install script to latest Morgan Funtowicz 2024-07-23 22:23:30 +0000
  • 03935f6705 update TensorRT-LLM to latest version Morgan Funtowicz 2024-07-23 22:13:02 +0000
  • ef1876346c refactor the compute capabilities detection along with num gpus Morgan Funtowicz 2024-07-23 22:12:42 +0000
  • a895029424 hotfix: update nccl OlivierDehaene 2024-07-23 23:31:28 +0200
  • 344427b6ab feat(router): drop permit after batching feat/max_queue_size OlivierDehaene 2024-07-23 14:45:30 +0200
  • e7e3aa6cac chore: update to torch 2.4 (#2259) OlivierDehaene 2024-07-23 20:39:43 +0000
  • 9491c155bb fix OlivierDehaene 2024-07-23 21:40:42 +0200
  • 0c7910f7bc remove un-necessary patch OlivierDehaene 2024-07-23 19:40:10 +0200
  • 0e527ae106 chore: update to torch 2.4 OlivierDehaene 2024-07-19 15:15:18 +0200
  • d3fc28ebe7 no-repeat-ngram is processor not warper Nathan Brake 2024-07-23 12:44:50 -0400
  • db7e043ded New version. v2.2.0 git_v2.2.0 Nicolas Patry 2024-07-23 17:25:18 +0200
  • bc9593a5b1 hotfix: pin numpy (#2289) Daniël de Kok 2024-07-23 17:53:19 +0200
  • c370636cba hotfix: pin numpy Daniël de Kok 2024-07-23 15:52:28 +0000
  • 56a695cdb9 New version. Nicolas Patry 2024-07-23 17:25:18 +0200
  • 4ab4173767 Add support for Llama 3 rotary embeddings (#2286) Daniël de Kok 2024-07-23 17:18:54 +0200
  • e665bea857 Update transformers to 4.43 Daniël de Kok 2024-07-23 15:15:01 +0000
  • 2c3b078911 Add support for Llama 3 rotary embeddings Daniël de Kok 2024-07-23 14:34:56 +0000
  • 5d121a9705 Preparing for release. (#2285) Nicolas Patry 2024-07-23 16:20:17 +0200
  • fa470bc851 Fixing token within the docker image for the launcher. Nicolas Patry 2024-07-23 14:03:01 +0000
  • dc05d7ba23 Updating docs. Nicolas Patry 2024-07-23 13:42:32 +0000
  • 4d980942de Preparing for release. Nicolas Patry 2024-07-23 13:26:30 +0000
  • 0c95f7a942 Debug softcap flash decoding activation debug/gemma2 Daniël de Kok 2024-07-23 13:12:19 +0000
  • 3961e32390 [WIP] Add support for Mistral-Nemo by supporting head_dim through config (#2254) shaltielshmid 2024-07-23 16:00:07 +0300
  • 32cc60f329 Shorter diff. Nicolas Patry 2024-07-23 12:59:35 +0000
  • ab62312d8c Using head_dim as a fallback is necessary since it's a non standard key in mistralConfig (as defined in transformers). Nicolas Patry 2024-07-23 12:56:37 +0000
  • 9935720c87 Add support for repacking AWQ weights for GPTQ-Marlin (#2278) Daniël de Kok 2024-07-23 13:08:20 +0200
  • f0a5cb6c4e Merge branch 'main' into add-mistral-nemo Shaltiel Shmidman 2024-07-23 12:44:56 +0300
  • 712729bc78 Enable Marlin for supported AWQ configurations by default Daniël de Kok 2024-07-23 09:31:36 +0000
  • dee649c60c Chore: Fix naming issues regarding head_size, there can only be one. fix_mistral2 Nicolas Patry 2024-07-23 11:26:53 +0200
  • 5fca30ee15 fix(l4): fix fp8 logic on l4 (#2277) OlivierDehaene 2024-07-23 09:24:29 +0000
  • abc32537ea Fixing mistral nemo. (#2276) Nicolas Patry 2024-07-23 11:16:03 +0200
  • aa2cf4e8ee Using g6 instead of g5. Nicolas Patry 2024-07-23 11:07:35 +0200
  • 33cb2cefed quick fix erikkaum 2024-07-23 11:04:04 +0200
  • 025f80dfd4 use marlin even on 89 OlivierDehaene 2024-07-23 10:35:32 +0200
  • 3c39ab5ac8 fix typo Morgan Funtowicz 2024-07-23 08:11:36 +0000
  • 4c657ca158 make docker linter happy with same capitalization rule Morgan Funtowicz 2024-07-23 07:42:31 +0000
  • d9decb4c2c move to TensorRT-LLM v0.11.0 Morgan Funtowicz 2024-07-23 07:35:00 +0000
  • ff151b738b refactored docker image Morgan Funtowicz 2024-07-23 07:34:40 +0000
  • 3db1be412c commenting out Python part for TensorRT installation Morgan Funtowicz 2024-07-23 07:27:34 +0000
  • 10448ea8c9 added tgi to name of metric Edwinhr716 2024-07-22 20:38:07 +0000
  • 4700465192 use proper name for ci (#2274) Adrien 2024-07-22 21:50:53 +0200
  • 805e584b92 update tgi entrypoint Morgan Funtowicz 2024-07-22 19:13:01 +0000
  • 85baa5da89 adding max_token_capacity_metric Edwinhr716 2024-07-22 18:21:34 +0000
  • 32794b1caa Add support for repacking AWQ weights for GPTQ-Marlin Daniël de Kok 2024-07-22 17:39:38 +0000
  • 473f968a01 also quant weights with single scale OlivierDehaene 2024-07-22 18:49:10 +0200
  • 3d0c7b85fe fix(l4): fix fp8 logic on l4 OlivierDehaene 2024-07-22 18:45:26 +0200
  • 4d3936ea32 Fixing mistral nemo. Nicolas Patry 2024-07-22 16:36:19 +0000
  • 6aeb669072 Softcapping for gemma2. (#2273) Nicolas Patry 2024-07-22 18:27:10 +0200
  • 5266f15ae1 0.0 is the null value in the C++ API. Nicolas Patry 2024-07-22 15:59:09 +0000
  • 4844ff790a fix(server): fix fp8 weight loading (#2268) OlivierDehaene 2024-07-22 15:51:32 +0000
  • d0a34a95f2 adding missing ld_library_path for cuda stubs in Dockerfile Morgan Funtowicz 2024-07-22 15:16:39 +0000
  • 3fd2bb70c3 fix missing / before tgi lib path Morgan Funtowicz 2024-07-22 14:57:03 +0000
  • a32ef3b875 correctly setup linking search path for runtime layer Morgan Funtowicz 2024-07-22 14:42:43 +0000
  • c813d64a90 missing group Adrien 2024-07-22 16:34:09 +0200
  • d2009e2262 use proper name for ci Adrien 2024-07-22 16:31:17 +0200
  • fd06ca6e7e add missing pkgconfig folder for MPI in Dockerfile Morgan Funtowicz 2024-07-22 14:19:51 +0000
  • 40330c73f0 align all the linker search dependency Morgan Funtowicz 2024-07-22 14:14:57 +0000
  • 6d8e3659a9 revert default dtype OlivierDehaene 2024-07-22 16:13:53 +0200
  • c4b78bd214 No access to transformers config, only config_dict here. Nicolas Patry 2024-07-22 13:54:17 +0000
  • 5829b7821e Less clutter. Nicolas Patry 2024-07-22 13:49:24 +0000
  • 59022c22b4 fix: impove adapter merge comments and remove unused conditional drbh 2024-07-18 18:20:51 +0000
  • d27131bfa8 fix: improve logging and rebase syntax issue drbh 2024-07-15 20:40:39 +0000
  • 5ec88a1b51 feat: improve weight loading and add tests drbh 2024-07-15 14:32:06 +0000
  • 8c3530f705 fix: adjust launcher for local lora adapters drbh 2024-07-09 02:47:47 +0000
  • 4b569341e6 feat: enable lora load from directory drbh 2024-07-09 02:38:50 +0000
  • 70dc958fb8 fix: refactor adapter weight loading and mapping drbh 2024-07-05 15:00:36 +0000
  • 620416f13f Softcapping for gemma2. Nicolas Patry 2024-07-22 13:06:03 +0000
  • 0d68619efa update snap OlivierDehaene 2024-07-22 15:03:42 +0200
  • 74f1f6a702 fixed scales loading OlivierDehaene 2024-07-22 13:56:12 +0200
  • 119918cc0a fix(server): fix fp8 weight loading OlivierDehaene 2024-07-21 20:56:54 +0200
  • 6aebf44f47 fix(ci): test new instances (#2272) Adrien 2024-07-22 14:41:30 +0200
  • 1c8b78ae99 improve build ci Adrien 2024-07-22 14:34:45 +0200
  • 56e7f5d779 test new instances Adrien 2024-07-22 14:09:19 +0200
  • 6a9e925ec1 fix bad copy/past missing nvinfer linkage direction Morgan Funtowicz 2024-07-22 11:43:10 +0000
  • 3597beefe2 leverage pkg-config to probe libraries paths and reuse new install structure from cmake Morgan Funtowicz 2024-07-22 11:39:11 +0000
  • 2aac2ff2cd do the same name definition stuff for tensorrt_llm_executor_static Morgan Funtowicz 2024-07-22 11:32:54 +0000
  • da079df4cd simplify prebuilt trtllm libraries name definition Morgan Funtowicz 2024-07-22 11:32:31 +0000
  • 07441f5a7a legacy warning on text_generation client (#2271) Erik Kaunismäki 2024-07-22 12:00:17 +0200