Commit Graph

  • 1c4c4d6aed Micro optimization. Nicolas Patry 2024-06-06 13:51:40 +0200
  • c7e6ee5be5 marlin: improve build Daniël de Kok 2024-06-06 11:25:56 +0000
  • 4594e6faba Add support for Marlin-quantized models Daniël de Kok 2024-06-05 08:14:40 +0000
  • ecd1cf180d Add full commands for supported configs (#150) Karol Damaszke 2024-06-06 11:09:45 +0200
  • 1ac7a112fe Less cache misses ? Nicolas Patry 2024-06-06 10:34:25 +0200
  • cf0d459aaf Revert "Less cache misses on cargo build." Nicolas Patry 2024-06-06 10:33:55 +0200
  • 5aec4154c2 Less cache misses on cargo build. Nicolas Patry 2024-06-06 10:33:01 +0200
  • 0d3cc033ad update vllm commit & fix models using sliding window fxmarty 2024-06-06 07:51:33 +0000
  • 3d30096056 single quote. Nicolas Patry 2024-06-06 09:48:20 +0200
  • a00d00a6e8 If conditional wrong ? Nicolas Patry 2024-06-06 09:46:51 +0200
  • 97aad8930f Fix yaml? Nicolas Patry 2024-06-06 09:23:24 +0200
  • 586d2fbb8a Fusing and parallelizing builds. Nicolas Patry 2024-06-06 09:17:12 +0200
  • 926def0fa6 Merge branch 'main' into xpu_gqa Wang, Yi A 2024-06-05 22:59:43 -0700
  • 97e91e9d9f Using the common cache. Nicolas Patry 2024-06-05 23:04:22 +0200
  • d7ac081b62 No nvme it seems. Nicolas Patry 2024-06-05 19:44:50 +0200
  • 7b55fd72c3 Tailscale for everyone. Nicolas Patry 2024-06-05 19:31:08 +0200
  • 717bafe6ef Everyone gets a fix. Nicolas Patry 2024-06-05 18:55:08 +0200
  • 15d953afcb HF tailscale version. Nicolas Patry 2024-06-05 18:20:11 +0200
  • 1da2437957 fix: remove typos after rebase drbh 2024-06-05 16:20:02 +0000
  • 051b55f3cc Fix. Nicolas Patry 2024-06-05 18:16:05 +0200
  • fc051d4bb0 fix: refactor and encapsulate kserve feat in file drbh 2024-05-28 15:45:57 +0000
  • 7488b982fa fix: cleanup and improve api docs drbh 2024-05-27 03:03:41 +0000
  • 01bd1b2c26 fix: improve infer and simplify drbh 2024-05-23 16:21:42 +0000
  • 0f1c4b12ca fix: refactor and improve types drbh 2024-05-23 16:11:39 +0000
  • 501b7c4436 feat: implement infer endpoint wrapper around generate drbh 2024-05-23 01:00:55 +0000
  • cb69b09a77 feat: add kserve feature and basic routes drbh 2024-05-22 21:13:30 +0000
  • 13901368d4 Internal runner ? Nicolas Patry 2024-06-05 17:50:03 +0200
  • b5311e406c Working CI in AMD? Nicolas Patry 2024-06-05 16:37:46 +0200
  • 8276b8995a Only 1 GPU for now. Nicolas Patry 2024-06-05 16:23:18 +0200
  • c6325ff122 Prune only before starting yourself ? Nicolas Patry 2024-06-05 16:21:57 +0200
  • 9626f8c743 Prune containers to avoid conflict. Nicolas Patry 2024-06-05 16:18:35 +0200
  • efde2563d1 Fix DOCKER_DEVICES. Nicolas Patry 2024-06-05 16:09:07 +0200
  • 3539ea37e2 Making it work ? Nicolas Patry 2024-06-05 15:52:09 +0200
  • a55917fb43 Checking flash gpt2 for starters. Nicolas Patry 2024-06-05 14:58:02 +0200
  • 79ab3690aa CI. Nicolas Patry 2024-06-05 14:45:54 +0200
  • 265f1b2b93 amd-gpu-tgi. Nicolas Patry 2024-06-05 14:43:44 +0200
  • 0de2c559e2 CI for amd? Nicolas Patry 2024-06-05 14:42:26 +0200
  • 7634bed00b Trying to run docker manually. Nicolas Patry 2024-06-05 12:15:44 +0200
  • 6cdbcb4a89 Wat? Nicolas Patry 2024-06-05 11:58:58 +0200
  • b0728a8063 dEBUGGINGGGGGGGG. Nicolas Patry 2024-06-05 11:56:44 +0200
  • cac01940eb Push. Nicolas Patry 2024-06-05 11:42:45 +0200
  • c5ee3bc0cd Wait for SSH. Nicolas Patry 2024-06-05 11:23:25 +0200
  • fdb5b2a21a Trying to ssh into. Nicolas Patry 2024-06-05 11:22:00 +0200
  • 31d164d8e3 Integration test.s Nicolas Patry 2024-06-04 23:35:05 +0000
  • b6494b91c4 Integrations tests not in docker. Nicolas Patry 2024-06-04 23:22:36 +0000
  • c80ef89db6 Yamled. Nicolas Patry 2024-06-04 23:11:09 +0000
  • caf8fa0847 Integration tests for intel ? Nicolas Patry 2024-06-04 23:08:34 +0000
  • 89fc5c8d33 Is our own runner space limited? Nicolas Patry 2024-06-04 22:57:38 +0000
  • 1182dcaaa9 Do we even need a GPU to build ? Nicolas Patry 2024-06-04 22:30:08 +0000
  • 3bd8e8d461 ? Nicolas Patry 2024-06-04 22:08:08 +0000
  • 02a6c8b408 Ez fix. Nicolas Patry 2024-06-04 22:04:35 +0000
  • f6035f2866 Parallel AMD build. Nicolas Patry 2024-06-04 22:03:28 +0000
  • 2a48a10043 Update __version__ on __init__.py to 0.7.0 (#2017) Andrés Marafioti 2024-06-05 14:51:07 +0200
  • 3f4bcf978c Fix GPTQWeight import (#2020) Daniël de Kok 2024-06-05 14:49:15 +0200
  • 1b53c70523 rocm ck flash attn api fix seungrokjung 2024-06-05 12:45:53 +0000
  • 0a94fad79f Fixing rocm. (#2021) Nicolas Patry 2024-06-05 14:41:34 +0200
  • 908973ee0e Fixing rocm. Nicolas Patry 2024-06-05 14:38:06 +0200
  • cf8fdef9d3 feat: adjust to load weights support-phi3-small drbh 2024-06-05 11:48:21 +0000
  • 05c1c27fbd Fix GPTQWeight import Daniël de Kok 2024-06-05 11:32:50 +0000
  • 8aece3bd68 feat: move allocation logic to rust (#1835) OlivierDehaene 2024-06-05 12:18:38 +0200
  • 4cddea94ee Update __version__ Andrés Marafioti 2024-06-05 12:11:34 +0200
  • bb37321b9f allow to fix paged attention num blocks set-num-blocks fxmarty 2024-06-05 10:05:04 +0000
  • 258ace7cd3 fix OlivierDehaene 2024-06-05 11:28:33 +0200
  • e2c8307bdd Add support for Marlin-quantized models Daniël de Kok 2024-06-05 08:14:40 +0000
  • 9ffe1f1e67 Do not initialize scratch space when there are no ExLlamaV2 layers (#2015) Daniël de Kok 2024-06-05 10:45:47 +0200
  • 0777749dd3 Do not initialize scratch space when there are no ExLlamav2 layers Daniël de Kok 2024-06-05 08:28:29 +0000
  • eb6a02a0f1 fix dockerfiles OlivierDehaene 2024-06-05 10:20:44 +0200
  • 751e99cccd update ipex xpu version to support group query attention Wang, Yi A 2024-06-04 18:16:42 -0700
  • 824edf28d7 Hotfixing make install. (#2008) Nicolas Patry 2024-06-04 23:34:03 +0200
  • fda29856cd Fix. Nicolas Patry 2024-06-04 21:29:29 +0000
  • 698f7cd474 Hotfixing make install. Nicolas Patry 2024-06-04 18:41:09 +0000
  • 8390e251d9 Making make install work better by default. (#2004) Nicolas Patry 2024-06-04 19:38:46 +0200
  • d14eaacaca Support GPTQ models with column-packed up/gate tensor (#2006) Daniël de Kok 2024-06-04 19:37:49 +0200
  • aad8bb32c0 Better error messages on missing or outdated protoc. Nicolas Patry 2024-06-04 16:24:46 +0000
  • 9410a79df4 rebased OlivierDehaene 2024-06-04 15:05:41 +0200
  • 097f7e9b88 Putting back build steps for rocm. Nicolas Patry 2024-06-04 15:53:07 +0000
  • bc925070d3 Put back install integration tests. Nicolas Patry 2024-06-04 15:29:25 +0000
  • b5f7f98dd8 Support GPTQ models with column-packed up/gate tensor Daniël de Kok 2024-06-04 15:16:15 +0000
  • 8a060ae8c0 New location of flash-attn 2.5.8 Nicolas Patry 2024-06-04 14:07:43 +0000
  • 757223b352 feat: add SchedulerV3 (#1996) OlivierDehaene 2024-06-04 15:56:56 +0200
  • 7c4927482b Put back flash-v2 build step. Nicolas Patry 2024-06-04 13:39:26 +0000
  • 76fef7b1d2 Don't install flahs on the CPU tests. Nicolas Patry 2024-06-04 13:31:43 +0000
  • d841a4900a ?? Nicolas Patry 2024-06-04 13:02:58 +0000
  • 48ff273560 Make flash work on cpu target ? Nicolas Patry 2024-06-04 12:53:50 +0000
  • f1cd046f6b Put back build step. Nicolas Patry 2024-06-04 12:41:21 +0000
  • fec0167a12 fix: update triton implementation reference (#2002) Emmanuel Ferdman 2024-06-04 15:26:35 +0300
  • 9b52f0e2dc Fix Phi-2 with tp>1 (#2003) Daniël de Kok 2024-06-04 14:26:07 +0200
  • 954c9cacd1 split build workflow rocm-ci-build fxmarty 2024-06-04 13:32:26 +0200
  • 4fd7c64793 Making easier local install. Nicolas Patry 2024-06-04 10:15:30 +0000
  • f9c354d120 Fix Phi-2 with tp>1 Daniël de Kok 2024-06-04 08:25:33 +0000
  • bd4be58942 fix: update triton implementation reference Emmanuel Ferdman 2024-06-04 11:10:21 +0300
  • 98ad2efaa0 revert OlivierDehaene 2024-06-04 09:06:16 +0200
  • 470732761a Env in a step. ci-xpu2 Nicolas Patry 2024-06-03 18:15:53 +0200
  • 86261555ce Double quote. Nicolas Patry 2024-06-03 18:14:02 +0200
  • be0291240c No wrapper. Nicolas Patry 2024-06-03 18:12:10 +0200
  • 73f1822030 Non sccache. Nicolas Patry 2024-06-03 18:11:18 +0200
  • 8a820d882d rebase OlivierDehaene 2024-06-03 18:09:27 +0200
  • 1ba97e1aba Line intervertion. Nicolas Patry 2024-06-03 18:05:06 +0200
  • 877620423d Reattempt. Nicolas Patry 2024-06-03 18:04:01 +0200
  • 0fd65dc954 Fix? Nicolas Patry 2024-06-03 17:41:33 +0200