Commit Graph

  • bcfcd4740a Fixing prom leak by upgrading. (#2129) Nicolas Patry 2024-06-27 08:08:43 +0200
  • 84f8f72f2a Fixing prom leak by upgrading. Nicolas Patry 2024-06-27 06:05:01 +0000
  • 29a1137409 feat: use model name as adapter id in chat endpoints drbh 2024-06-26 23:21:39 +0000
  • e4ca7e965c Adding build.rs to download and build TRTLLM c++ lib Morgan Funtowicz 2024-06-26 23:22:07 +0200
  • da9456d00a Initial TRTLLM backend structure Morgan Funtowicz 2024-06-26 23:21:43 +0200
  • 60a96a9ae3 do not use private registry in cleanup cache step Felix Marty 2024-06-26 13:57:05 +0000
  • 2bcc87bb02 add dummy backend feat/backend_abstraction OlivierDehaene 2024-06-26 15:39:28 +0200
  • c45551cfc4 Using new cache. ci2 Nicolas Patry 2024-06-26 15:21:03 +0200
  • 0dcf31a749 Fixing gemma2. temp_work Nicolas Patry 2024-06-26 13:02:56 +0000
  • 230f2a415a refacto OlivierDehaene 2024-06-26 14:12:01 +0200
  • 93e0a7de8b refacto OlivierDehaene 2024-06-26 14:00:03 +0200
  • b562680be4 wip OlivierDehaene 2024-06-26 13:13:32 +0200
  • 4067fc8211 login to registry Felix Marty 2024-06-26 10:58:52 +0000
  • 2330052aa2 debug Felix Marty 2024-06-26 10:43:57 +0000
  • 504754861f wip OlivierDehaene 2024-06-26 12:08:56 +0200
  • 227f78f3fe Merge branch 'main' into ci_amd3 fxmarty 2024-06-26 12:08:42 +0200
  • b44097a61b fix cache cleanup Felix Marty 2024-06-26 10:02:45 +0000
  • bf84d5559f fix AttributeError: 'MixtralLayer' object has no attribute 'mlp' icyboy™ 2024-06-26 17:25:13 +0800
  • 7947c347b7 exl2 phi does not use packed QKV/gate-up projections bugfix/phi-exl2 Daniël de Kok 2024-06-26 10:38:08 +0200
  • be2d38032a fix: simplify kserve endpoint and fix imports (#2119) drbh 2024-06-25 19:30:10 -0400
  • 326cbd849e fix: simplify kserve endpoint and fix imports drbh 2024-06-25 21:16:27 +0000
  • f1f98e369f Add support for Marlin 2:4 sparsity (#2102) Daniël de Kok 2024-06-25 21:09:42 +0200
  • 14980df2df Support AWQ quantization with bias (#2117) Daniël de Kok 2024-06-25 21:09:00 +0200
  • 04e1af94d7 Enable multiple LoRa adapters (#2010) drbh 2024-06-25 14:46:27 -0400
  • 59575fe62a Merge branch 'main' into lora-internal drbh 2024-06-25 12:23:04 -0400
  • a2d821c4a3 fix: exit early if no adapter_data drbh 2024-06-25 16:20:33 +0000
  • bf4db77103 updated doc Mohit Sharma 2024-06-25 16:15:03 +0000
  • a2a97b05d6 Fix CI . (#2118) Nicolas Patry 2024-06-25 17:53:36 +0200
  • 5e38d3534c update launcher Mohit Sharma 2024-06-25 15:45:04 +0000
  • 15b351b4a9 updated doc Mohit Sharma 2024-06-25 15:35:49 +0000
  • aed7d351fa Fix clippy. Nicolas Patry 2024-06-25 15:28:16 +0000
  • 4cd5d82de4 Support AWQ quantization with bias Daniël de Kok 2024-06-25 16:55:43 +0200
  • 1e6e7db02e add AMMO example Mohit Sharma 2024-06-25 14:58:45 +0000
  • fc9c3153e5 Add pytest release marker (#2114) Daniël de Kok 2024-06-25 16:53:20 +0200
  • e563983d90 fix cpu and xpu issue (#2116) Wang, Yi 2024-06-25 22:47:06 +0800
  • 3da4ecf8cb fix cpu and xpu issue Wang, Yi A 2024-06-25 06:13:22 -0700
  • a1695ce48b set LD_PRELOAD fxmarty 2024-06-25 14:28:18 +0000
  • a7909e6f94 add torch dtype Mohit Sharma 2024-06-25 14:12:29 +0000
  • 2706cca756 Mark many models as release to speed up CI Daniël de Kok 2024-06-25 13:02:51 +0200
  • 9e2fdf57c0 Removing IPEX_AVAIL. (#2115) Nicolas Patry 2024-06-25 13:20:57 +0200
  • 54267a322d HF_TOKEN Nicolas Patry 2024-06-25 11:14:13 +0000
  • e9883b2037 Fixing HF_TOKEN. Nicolas Patry 2024-06-25 11:04:35 +0000
  • 70fc0b1604 Unrelated change. Nicolas Patry 2024-06-25 11:02:00 +0000
  • 6683e8419a Forgot a few places. Nicolas Patry 2024-06-25 10:43:38 +0000
  • 1ca91a2ff5 Removing IPEX_AVAIL. Nicolas Patry 2024-06-25 10:38:33 +0000
  • 3f3b7ffd67 feat: add simple tests for weights (#2092) drbh 2024-06-25 06:22:59 -0400
  • b64c70c9e7 Cpu tgi (#1936) Wang, Yi 2024-06-25 18:21:29 +0800
  • 32a5ea3282 Add pytest release marker Daniël de Kok 2024-06-25 11:53:55 +0200
  • 04298e5799 add back credentials Felix Marty 2024-06-25 09:22:49 +0000
  • b06dda9224 Add support for Marlin 2:4 sparsity Daniël de Kok 2024-06-21 14:09:59 +0200
  • dc53846456 Merge branch 'main' into ci_amd3 fxmarty 2024-06-25 11:20:00 +0200
  • b69f078041 fix ChatCompletion and ChatCompletionChunk object string not compatible with standard openai api (#2089) sunxichen 2024-06-25 16:59:50 +0800
  • 83634dc122 use xpu-smi to dump used memory (#2047) Wang, Yi 2024-06-25 16:15:46 +0800
  • 5b2155b0f8 corrected Pydantic warning. (#2095) Jeff 2024-06-25 04:10:32 -0400
  • 4b8d150ceb Update clients/python/text_generation/types.py Nicolas Patry 2024-06-25 10:10:26 +0200
  • 1869ee2f57 Add OTLP Service Name Environment Variable (#2076) KevinDuffy94 2024-06-25 08:33:01 +0100
  • 3447c722fd Support HF_TOKEN environment variable (#2066) Lucain 2024-06-25 09:23:12 +0200
  • d346027416 Load test. Nicolas Patry 2024-06-25 09:21:14 +0200
  • f4714a8f98 remove example Mohit Sharma 2024-06-25 07:08:37 +0000
  • 0d496baaa4 Merge branch 'main' into lora-internal drbh 2024-06-24 18:43:52 -0400
  • f94f2b3e6d fix: refactor and move shard_lora_weights logic drbh 2024-06-24 22:41:28 +0000
  • c927cffbf7 fix: add noop in TensorParallelAdapterRowLinear too drbh 2024-06-24 22:06:04 +0000
  • 3c9b28eaec fix: refactors and helpful comments drbh 2024-06-24 22:01:41 +0000
  • 16f619c5d1 Support HF_TOKEN environement variable Wauplin 2024-06-13 15:50:11 +0200
  • 09a41f2c43 do not skip workflow on cuda, fix no space left no device Felix Marty 2024-06-24 11:54:09 +0000
  • f16f0ad92b do not login to internal registry Felix Marty 2024-06-24 08:04:12 +0000
  • 13bbf6cc5c does ci pass without tailscale? Felix Marty 2024-06-21 16:46:38 +0000
  • ee62872d66 test tailscale independently Felix Marty 2024-06-21 16:28:04 +0000
  • 1bb1a344d7 retry Felix Marty 2024-06-21 16:11:38 +0000
  • bc2b9b20e2 trigger ci Felix Marty 2024-06-21 14:17:02 +0000
  • 3464d60d4b The handshake operation timed out & hanging Felix Marty 2024-06-21 13:29:32 +0000
  • 284894303a remove require_backend decorators on handles, for some reasons fails in github actions Felix Marty 2024-06-21 12:31:08 +0000
  • 7e0f4f25c7 renamed file Felix Marty 2024-06-21 09:56:09 +0000
  • 393234de9b hopefully fix ci Felix Marty 2024-06-21 09:55:58 +0000
  • 67999773f3 fix workflow Felix Marty 2024-06-20 09:28:10 +0000
  • 5fb8c275c3 fix style & typo Felix Marty 2024-06-20 09:03:00 +0000
  • e62ac4d63a trigger Felix Marty 2024-06-20 08:19:08 +0000
  • df7bb11793 dial tcp: lookup registry-1.docker.io: i/o timeout fxmarty 2024-06-17 10:20:05 +0000
  • 40b342a12e fix space fxmarty 2024-06-17 10:01:17 +0000
  • 3de8f3647b fix decorators fxmarty 2024-06-14 07:45:58 +0000
  • 4616c62914 style fxmarty 2024-06-13 10:57:31 +0000
  • 5b6b257756 fix gpt2 tests - some weights were not contiguous Felix Marty 2024-06-13 08:09:52 +0000
  • 9e50c117bc fix idefics2 tests Felix Marty 2024-06-13 07:09:48 +0000
  • 1846c1c210 fix tests fxmarty 2024-06-11 13:40:35 +0000
  • 1e10597d0c update fxmarty 2024-06-11 11:25:14 +0000
  • 406885638b skip exl2 tests on rocm fxmarty 2024-06-11 09:29:08 +0000
  • 5a4b798f98 fix gptq tests, LLMM1 matrix bound fxmarty 2024-06-11 07:27:14 +0000
  • 49db30a137 disable marlin tests on rocm/xpu fxmarty 2024-06-10 13:06:11 +0000
  • 405765b18c Fix cargo-chef prepare (#2101) ur4t 2024-06-25 00:16:36 +0800
  • 480d3b3304 New runner. Manual squash. (#2110) Nicolas Patry 2024-06-24 18:08:34 +0200
  • 3ae62304ab revert makefile Mohit Sharma 2024-06-24 15:22:10 +0000
  • 034686b178 update heading Mohit Sharma 2024-06-24 15:20:45 +0000
  • e81c4cf863 update launcher Mohit Sharma 2024-06-24 15:09:17 +0000
  • 3cc2f4e9fa update doc Mohit Sharma 2024-06-24 14:50:16 +0000
  • bf5910fd28 1.79 Nicolas Patry 2024-06-24 16:43:56 +0200
  • 844dc484ac Moving buildx install after tailscale ? Nicolas Patry 2024-06-24 16:38:54 +0200
  • 001ec09df3 rename doc Mohit Sharma 2024-06-24 14:38:30 +0000
  • 50806ffe4a update port Mohit Sharma 2024-06-24 14:37:29 +0000
  • 557e18e08c fix style Mohit Sharma 2024-06-24 14:30:26 +0000
  • 76dee7fa7d No network host ? Nicolas Patry 2024-06-24 16:07:24 +0200