Commit Graph

  • 3ab578b416 [docs] Fix link to Install CLI (#1526) Pedro Cuenca 2024-02-02 14:05:30 +0100
  • 3d21618814 [docs] Fix link to Install CLI Pedro Cuenca 2024-02-02 13:49:18 +0100
  • 4c6c39e491 Upgrading axum=0.7 Nicolas Patry 2024-02-02 12:15:32 +0100
  • bcdb02e41a typing IlyasMoutawwakil 2024-02-01 19:37:02 +0000
  • 02912ad273 feat: add ie update to message docs drbh 2024-02-01 13:36:24 -0500
  • fb59c56215 adapt awq weights to exllama/gptq kernels IlyasMoutawwakil 2024-02-01 18:35:41 +0000
  • 8665ab07ac revert changes IlyasMoutawwakil 2024-02-01 18:35:04 +0000
  • c10c4a023a Github magic? add_readme_dashboard Nicolas Patry 2024-02-01 16:13:37 +0000
  • 62613371b9 Layoyut. Nicolas Patry 2024-02-01 16:09:48 +0000
  • cf10a72a89 ? Nicolas Patry 2024-02-01 16:08:49 +0000
  • ed6af32aee Monioring your deployements. Nicolas Patry 2024-02-01 16:07:59 +0000
  • 15267553c0 Different layout ? Nicolas Patry 2024-02-01 16:05:10 +0000
  • 331652f556 Different inclusion. Nicolas Patry 2024-02-01 15:58:33 +0000
  • 3016997ac5 Adding a dashboard in the README to showcase prod metrics. Nicolas Patry 2024-02-01 15:52:11 +0000
  • 1d15901d24 Update docs2. Nicolas Patry 2024-02-01 15:51:15 +0000
  • 5267fac845 Forgot to update docs. Nicolas Patry 2024-02-01 15:36:21 +0000
  • 3e26028245 Marking the flag as really not the fastest and BETA. Nicolas Patry 2024-02-01 15:30:02 +0000
  • 7b38ca4d7b typo removal. Nicolas Patry 2024-01-25 15:31:39 +0100
  • d90383d96d Updating docs. Nicolas Patry 2024-01-25 14:48:54 +0100
  • 081457f290 Dummy but working version. Nicolas Patry 2024-01-25 13:03:37 +0100
  • a15ceb200c Initial fp8. Nicolas Patry 2024-01-25 11:10:22 +0000
  • 0e97af456a Updating tokenizers. (#1517) Nicolas Patry 2024-02-01 16:26:48 +0100
  • ee1cf51ce7 fix: tokenizer config should use local model path when possible (#1518) drbh 2024-02-01 09:39:32 -0500
  • 1e03b61b5c Revert "Modify default for max_new_tokens in python client (#1336)" revert Nicolas Patry 2024-02-01 14:36:10 +0000
  • 9ad6b570ae fix: simplify logic and remove vars drbh 2024-02-01 14:24:12 +0000
  • 57e27bcbb2 Update router/src/main.rs drbh 2024-02-01 09:09:21 -0500
  • da3f8a4598 Update router/src/main.rs drbh 2024-02-01 09:05:42 -0500
  • 6e08f5b265 fix: tokenizer config should use local model path when possible drbh 2024-02-01 13:54:33 +0000
  • 0b5b858779 fix missing g_idx and eventual overflow in triton kernel IlyasMoutawwakil 2024-02-01 13:30:43 +0000
  • 8acbcb31d5 add triton fallback to awq IlyasMoutawwakil 2024-02-01 13:30:13 +0000
  • 54cbe5f120 Updating tokenizers. Nicolas Patry 2024-02-01 14:13:16 +0100
  • b71557d956 add request id in logs debug-request-id Adrien 2024-02-01 13:47:04 +0100
  • 9ad7b6a1a1 Hotfix the / health - route. (#1515) Nicolas Patry 2024-02-01 13:29:04 +0100
  • eb47e042e2 Hotfix. Nicolas Patry 2024-02-01 13:28:26 +0100
  • 5766c55b7a post process exllama model IlyasMoutawwakil 2024-02-01 12:48:17 +0100
  • 12c1f54525 awq fallback to exllama IlyasMoutawwakil 2024-02-01 12:06:02 +0100
  • 6cb6020e4d fix exllama overflows IlyasMoutawwakil 2024-02-01 12:05:36 +0100
  • 94d243b3d7 Freshen up the README. update_readme Nicolas Patry 2024-02-01 10:23:37 +0100
  • 13c62be467 GPTNeoX: Use static rotary embedding (#1498) Dean Wyatte 2024-02-01 01:34:11 -0700
  • 2d674624a3 fix: start to add caching of previous states drbh 2024-02-01 05:00:51 +0000
  • 2ae36a97fd fix: improve messages api docs content and formatting (#1506) drbh 2024-01-31 11:26:22 -0500
  • 0595bf3e9a feat: eetq gemv optimization when batch_size <= 4 (#1502) dtlzhuangz 2024-01-31 19:05:49 +0800
  • a3c45da0a4 Improvments within mamba. mamba2 Nicolas Patry 2024-01-31 10:28:58 +0000
  • 5b6f9259c1 feat: add optimization and first pass of integration test drbh 2024-01-30 18:53:28 +0000
  • 871e5e7338 fix rotary dim fix_neox_rotary_emb Dean Wyatte 2024-01-30 03:53:08 +0000
  • 966f3ba35c feat: use fused kernel in forward pass drbh 2024-01-29 21:54:23 +0000
  • c2681b2bea feat: build custom selective-scan kernels drbh 2024-01-29 20:09:38 +0000
  • 2d56f106a6 Modify default for max_new_tokens in python client (#1336) freitng 2024-01-29 17:02:57 +0100
  • 970f7142b5 fix: improve messages api docs content and formatting drbh 2024-01-29 14:18:19 +0000
  • a9ea60684b Create the compute type at launch time (if not provided in the env). (#1505) Nicolas Patry 2024-01-29 12:30:50 +0100
  • 6ff36feac9 Clippy. Nicolas Patry 2024-01-29 12:21:49 +0100
  • a7a98c0253 Fmt Nicolas Patry 2024-01-29 12:11:57 +0100
  • e19d6f3589 Easier to parse. Nicolas Patry 2024-01-29 12:02:42 +0100
  • 497d1518be Setting the compute_type at launchtime. Nicolas Patry 2024-01-29 11:59:04 +0100
  • 0424dabb01 Sending compute type from the environment instead of hardcoded string (#1504) Nicolas Patry 2024-01-29 11:20:08 +0100
  • f91fbe9d26 Sending compute type from the environment instead of hardcoded string Nicolas Patry 2024-01-29 11:15:58 +0100
  • 01b2f357b9 feat: eetq gemv optimization when batch_size <= 4 zhuangzhong 2024-01-29 15:06:11 +0800
  • f2a00be169 use static rotary embedding Dean Wyatte 2024-01-26 22:22:55 +0000
  • 069895b985 Fixing top_n_tokens. (#1497) Nicolas Patry 2024-01-26 20:13:47 +0100
  • 9d3190179e Fixing tests Nicolas Patry 2024-01-26 18:36:51 +0000
  • c2d4a3b5c7 v1.4.0 (#1494) v1.4.0 OlivierDehaene 2024-01-26 19:04:57 +0100
  • a5600c23af Fix seq2seq. Nicolas Patry 2024-01-26 17:34:38 +0000
  • 0452d590d0 Fixing other types of models + tests + Damn you python scoping. Nicolas Patry 2024-01-26 17:16:26 +0000
  • 6e629add98 Fixing top_n_tokens. Nicolas Patry 2024-01-26 16:38:03 +0000
  • 4d6132a233 remove delete doc comment OlivierDehaene 2024-01-26 18:04:07 +0100
  • d0ddc80c31 fmt OlivierDehaene 2024-01-26 16:31:48 +0100
  • bc04a059c9 v1.4.0 OlivierDehaene 2024-01-26 16:31:33 +0100
  • d9758851be feat: add tokenizer-config-path to launcher args (#1495) drbh 2024-01-26 12:01:33 -0500
  • e9e771c3db chore: remove extra space (cargo fmt) drbh 2024-01-26 11:28:04 -0500
  • fc86dba781 fix: ensure latest requirements exported bump-poetry-and-requirements drbh 2024-01-26 11:26:34 -0500
  • 400a3c68c6 feat: bump lock and requirement files drbh 2024-01-26 11:23:49 -0500
  • 6d34bfcef5 fix: replace accidental deletion and fix docs again drbh 2024-01-26 11:11:35 -0500
  • b6cfb1dd75 fix: remove trailing space (cargo fmt) drbh 2024-01-26 11:04:53 -0500
  • 66292828e6 fix: correct comment typo drbh 2024-01-26 10:57:53 -0500
  • f7a8454d43 fix: update docs and move arg position in file drbh 2024-01-26 10:55:35 -0500
  • f660fcf715 feat: add tokenizer-config-path to launcher args drbh 2024-01-26 10:47:17 -0500
  • 650fea1834 GPTQ support on ROCm (#1489) fxmarty 2024-01-26 16:27:44 +0100
  • 4ee87f41ab update doc OlivierDehaene 2024-01-26 16:27:27 +0100
  • 7d2bc40c42 update doc OlivierDehaene 2024-01-26 16:11:24 +0100
  • 051c9c465f Update Dockerfile_amd fxmarty 2024-01-26 11:20:20 +0100
  • fcee7035c0 update doc Felix Marty 2024-01-25 18:19:15 +0000
  • 059a2d9fa6 clean Felix Marty 2024-01-25 18:14:35 +0000
  • 359dd46474 cleaning bis Felix Marty 2024-01-25 18:06:58 +0000
  • 2909047d2e cleanup Felix Marty 2024-01-25 18:03:43 +0000
  • da002794b2 fix Felix Marty 2024-01-25 16:15:28 +0000
  • 145c2d6d6e more logs Felix Marty 2024-01-25 16:11:00 +0000
  • 3c93b31959 fix Felix Marty 2024-01-25 12:28:56 +0000
  • d8f33e3c2b update torch Felix Marty 2024-01-25 12:05:35 +0000
  • 33111a0e7f wip Felix Marty 2024-01-25 11:59:02 +0000
  • ebecc06161 Update the docs to include newer models. (#1492) Nicolas Patry 2024-01-26 16:07:31 +0100
  • 52df1f5a37 Fmt Nicolas Patry 2024-01-26 13:20:46 +0000
  • 50a20a83d7 fix: launcher doc typos (#1462) Andrés Restrepo 2024-01-26 08:10:07 -0500
  • 4c7315dde5 Trying to fix that flaky test. (#1491) Nicolas Patry 2024-01-26 14:06:27 +0100
  • ac49972752 Add sealion mpt support (#1477) Nicolas Patry 2024-01-26 14:05:02 +0100
  • b95732180d Reinstate exl2 with tp (#1490) Nicolas Patry 2024-01-26 14:00:29 +0100
  • beca6c92f1 Updating the openapi docs. Nicolas Patry 2024-01-26 13:18:05 +0100
  • 29fa60ec3e Trying to fix that flaky test. Nicolas Patry 2024-01-26 11:36:41 +0000
  • a7c475ac3b Updating gptq (exl2 slightly different output) Nicolas Patry 2024-01-26 11:34:00 +0000
  • 29a4baea59 Fixing sealion support. Nicolas Patry 2024-01-26 11:04:18 +0000
  • 7de9141164 Adding a comment. Nicolas Patry 2024-01-26 10:30:01 +0000