Commit Graph

  • d13215da8f fix(server): fix deepseekv2 loading (#2266) OlivierDehaene 2024-07-21 16:48:04 +0000
  • 85f10ec5c9 feat(fp8): use fbgemm kernels and load fp8 weights directly (#2248) OlivierDehaene 2024-07-20 17:02:04 +0000
  • 50149c3800 Add FP8 release test (#2261) Daniël de Kok 2024-07-20 12:26:06 +0200
  • c1638a56f1 Add support for Deepseek V2 (#2224) Daniël de Kok 2024-07-19 17:23:20 +0200
  • 898a892082 fix: adjust default tool choice (#2244) drbh 2024-07-19 11:12:02 -0400
  • 8afc17396d add usage stats to toctree (#2260) Erik Kaunismäki 2024-07-19 16:34:04 +0200
  • 66f3de583e usage stats and crash reports (#2220) Erik Kaunismäki 2024-07-19 16:17:56 +0200
  • e658d95c23 Hotfix: pass through model revision in VlmCausalLM (#2258) Daniël de Kok 2024-07-19 15:59:00 +0200
  • 990ea793c0 Hotfix: fix MPT after recent refactor (#2257) Daniël de Kok 2024-07-19 14:42:35 +0200
  • ba0dfb6fb1 Hotfix: various GPT-based model fixes (#2256) Daniël de Kok 2024-07-19 14:42:19 +0200
  • 394f8c7d2b Hotfix: fix of use of unquantized weights in Gemma GQA loading (#2255) Daniël de Kok 2024-07-19 12:55:59 +0200
  • 2dd680b799 Improve the handling of quantized weights (#2250) Daniël de Kok 2024-07-19 09:37:39 +0200
  • 118ee57f82 fix(server): fix cohere (#2249) OlivierDehaene 2024-07-18 14:00:13 +0000
  • e0710ccbeb Remove stray quantize argument in get_weights_col_packed_qkv (#2237) Daniël de Kok 2024-07-16 09:30:57 +0200
  • 7177da0df6 server quantize: expose groupsize option (#2225) Daniël de Kok 2024-07-16 08:36:05 +0200
  • e955f7b536 Add support for AWQ-quantized Idefics2 (#2233) Daniël de Kok 2024-07-16 07:58:25 +0200
  • 8a223eb6ac fix: Remove bitsandbytes installation when running cpu-only install (#2216) Hugo Larcher 2024-07-15 15:34:20 +0200
  • 271ebb7e20 fix custom cache dir (#2226) Erik Kaunismäki 2024-07-15 15:17:13 +0200
  • 619eeded47 feat: simple mistral lora integration tests (#2180) drbh 2024-07-15 09:16:15 -0400
  • ee56266044 Use symmetric quantization in the quantize subcommand (#2120) Daniël de Kok 2024-07-12 12:20:12 +0200
  • dedeb3cfa0 Modifying base in yarn embedding (#2212) SeongBeomLEE 2024-07-12 17:04:51 +0900
  • 5029e7215c fix: append DONE message to chat stream (#2221) drbh 2024-07-11 10:42:58 -0400
  • 85c3c5d64f Add support for FP8 on compute capability >=8.0, <8.9 (#2213) Daniël de Kok 2024-07-11 16:03:26 +0200
  • 2a6c3caf1d Move quantized weight handling out of the Weights class (#2194) Daniël de Kok 2024-07-09 20:04:03 +0200
  • cc4fceb21d Updating the self check (#2209) Nicolas Patry 2024-07-09 17:23:48 +0200
  • 591f9f70eb Adding sanity check to openapi docs. Nicolas Patry 2024-07-09 11:13:48 +0200
  • eaaea91e2b Fix nccl regression on PyTorch 2.3 upgrade (#2099) fxmarty 2024-07-08 17:52:10 +0200
  • 48f1196da8 feat: use model name as adapter id in chat endpoints (#2128) drbh 2024-07-08 10:06:49 -0400
  • 74edda9c23 update to metrics 0.23.0 or could work with metrics-exporter-promethe… (#2190) Wang, Yi 2024-07-08 22:03:59 +0800
  • 4a54e41920 fix: python deserialization (#2178) Javier Martinez 2024-07-08 15:59:16 +0200
  • 8dd9b2b135 add doc for intel gpus (#2181) Wang, Yi 2024-07-08 21:57:06 +0800
  • 540e710c3f Falcon/DBRX: get correct number of key-value heads (#2205) Daniël de Kok 2024-07-08 13:22:38 +0200
  • 17594916ed Fix incorrect cache allocation with multi-query (#2203) Daniël de Kok 2024-07-08 11:19:48 +0200
  • f11fd699b6 hotfix: Fix number of KV heads (#2202) Daniël de Kok 2024-07-08 09:52:12 +0200
  • 8e3d1e6c3f fix dbrx & opt model prefix bug (#2201) icyboy™ 2024-07-08 15:01:14 +0800
  • 508e308088 Consistently take prefix in model constructors (#2191) Daniël de Kok 2024-07-05 16:07:48 +0200
  • 54c194dfa6 GPTQ CI improvements (#2151) Daniël de Kok 2024-07-05 14:12:16 +0200
  • 1e7ce69f20 Fix Starcoder2 after refactor (#2189) Daniël de Kok 2024-07-05 12:22:45 +0200
  • e481a9bb9b Hotfixing after refactor. Nicolas Patry 2024-07-05 09:25:29 +0000
  • 1b434e8019 Refactor dead code - Removing all flash_xxx.py files. (#2166) Nicolas Patry 2024-07-05 10:29:56 +0200
  • 7efcb5e0ed remove LORA_ADAPTERS_PATH (#2563) Nicholas Broad 2024-09-24 16:20:15 -0700
  • 11782d367d remove LORA_ADAPTERS_PATH Nicholas Broad 2024-09-24 15:29:54 -0700
  • dd8691b7c5 More tensor cores. (#2558) Nicolas Patry 2024-09-24 23:57:26 +0200
  • c032280b17 Cleanup Vertex + Chat (#2553) Nicolas Patry 2024-09-24 23:37:17 +0200
  • bb8c38f5fe Gemma is modified by this. Nicolas Patry 2024-09-24 22:51:45 +0200
  • d77a31cd95 Fixing the logic. Nicolas Patry 2024-09-24 14:42:01 +0200
  • 56c630a425 More tensor cores. Nicolas Patry 2024-09-24 13:51:36 +0200
  • 75c8c54ac9 Hotfixing main. (#2562) Nicolas Patry 2024-09-24 23:00:43 +0200
  • e2c92a0a07 Hotfixing main. Nicolas Patry 2024-09-24 22:59:28 +0200
  • c6231ac4c7 feat: enable pytorch xpu support for non-attention models Dmitry Rogozhkin 2024-09-19 16:47:55 -0700
  • ebe33e7dbc Fixing the pre-commit after rebase. Nicolas Patry 2024-09-24 22:24:44 +0200
  • 02b25e524d Update Cargo lock. Nicolas Patry 2024-09-24 22:14:36 +0200
  • e4397991d2 Revert everything. Nicolas Patry 2024-09-24 20:23:02 +0200
  • 6a07d1e83c Trying smething. Nicolas Patry 2024-09-24 20:18:28 +0200
  • 48b7841a68 Trying some other install. Nicolas Patry 2024-09-24 20:14:10 +0200
  • 0b029b3c24 Dummy change. Nicolas Patry 2024-09-24 20:12:12 +0200
  • 6c6f2b5575 Wat? Nicolas Patry 2024-09-24 20:10:29 +0200
  • 7d219fc2bd Updating Cargo ? Nicolas Patry 2024-09-24 20:05:29 +0200
  • 4a29ae2b66 Not unstable. Nicolas Patry 2024-09-24 20:01:55 +0200
  • 846fcc3447 Let's debug that. Nicolas Patry 2024-09-24 17:41:34 +0200
  • 259ba29a90 Fixup doc. Nicolas Patry 2024-09-24 17:35:42 +0200
  • d46d3c65ea Changing back this logprobs default. Nicolas Patry 2024-09-24 17:21:07 +0200
  • 6744df5873 Fix docs. Nicolas Patry 2024-09-24 11:38:59 +0200
  • be00fb7fc0 Parameters are optional Nicolas Patry 2024-09-24 11:37:58 +0200
  • 507ecae147 logprobs defaults to false. Nicolas Patry 2024-09-23 22:21:43 +0200
  • 7cc18e85b5 Cleanup Vertex + Chat Nicolas Patry 2024-09-23 22:00:59 +0200
  • e6d29656b5 Adding note for private models in quick-tour document (#2548) Aritra Roy Gosthipaty 2024-09-24 18:36:53 +0530
  • 8024ded58f Simplify crossterm imports (#2545) Orhun Parmaksız 2024-09-24 15:57:20 +0300
  • 03263f5e88 Update the link to the Ratatui organization (#2546) Orhun Parmaksız 2024-09-24 15:51:48 +0300
  • 3f14cd1420 Add DenseMoELayer and wire it up in Mixtral/Deepseek V2 (#2537) Daniël de Kok 2024-09-24 14:27:06 +0200
  • c29dc89c18 Add support for scalar FP8 weight scales (#2550) Daniël de Kok 2024-09-24 13:57:40 +0200
  • afe3fed1a4 Merge branch 'fix_rocm_fa' into rocm_6.2_fixes tuna rocm_6.2_fixes Mohit Sharma 2024-09-24 10:53:50 +0000
  • 64e981fdcf fix issue for sliding window models Mohit Sharma 2024-09-24 10:53:19 +0000
  • 0ff6ff60ad Hotfixing main (#2556) Nicolas Patry 2024-09-24 11:51:14 +0200
  • 94c1b56c44 Hotfixing main Nicolas Patry 2024-09-24 11:26:19 +0200
  • 74d3ce106e Micro cleanup. (#2555) Nicolas Patry 2024-09-24 11:19:24 +0200
  • 67f5051e3d Remove stray debug print Daniël de Kok 2024-09-24 08:31:02 +0000
  • 7c05b0ba54 Support LLM compressor FP8 checkpoints on H100 Daniël de Kok 2024-09-24 08:27:52 +0000
  • d31a6f75cc Remove duplicated RUN in Dockerfile (#2547) Alvaro Bartolome 2024-09-24 10:19:13 +0200
  • c68f72e790 Micro cleanup. Nicolas Patry 2024-09-24 09:54:31 +0200
  • ccaf9ff507 Add support for scalar FP8 weight scales Daniël de Kok 2024-09-23 15:46:41 +0000
  • 10e6f29295 chore: Add old V2 backend (#2551) OlivierDehaene 2024-09-24 08:38:17 +0200
  • 835ad0a923 Adding "longrope" for Phi-3 (#2172) (#2179) Aaron Mihalik 2024-07-05 03:46:41 -0400
  • 2e09ebecf6 Preparing patch release. (#2186) Nicolas Patry 2024-07-04 10:55:33 +0200
  • 74ddd1265a Version 2.1.1 Nicolas Patry 2024-07-04 12:39:07 +0200
  • e93c830e66 Fixing missing object field for regular completions. (#2175) Nicolas Patry 2024-07-03 12:56:27 +0200
  • 64989f9439 Fixing the dockerfile warnings. (#2173) Nicolas Patry 2024-07-03 12:48:45 +0200
  • 878491cd5b Revert "Fixing missing object field for regular completions." Nicolas Patry 2024-07-03 10:41:39 +0000
  • b6c8984658 Fixing missing object field for regular completions. Nicolas Patry 2024-07-03 10:40:22 +0000
  • 233e46409a feat: improve update_docs for openapi schema (#2169) drbh 2024-07-03 03:53:35 -0400
  • d580215a24 Hotfixing qwen2 and starcoder2 (which also get clamping). (#2167) Nicolas Patry 2024-07-02 14:26:47 +0200
  • bc5a792dc8 Fixing rocm. (#2164) Nicolas Patry 2024-07-02 12:01:08 +0200
  • e913f3ad2d fix: use the base layers weight in mistral rocm (#2155) drbh 2024-07-02 05:56:25 -0400
  • 71b0189cd5 fix FlashDecoding change's regression in intel platform (#2161) Wang, Yi 2024-07-02 17:56:07 +0800
  • 9b3d3a3690 Fixing graph capture for flash decoding. (#2163) Nicolas Patry 2024-07-02 11:43:07 +0200
  • b80bd724e1 Move to FlashDecoding instead of PagedAttention kernel. (#1940) Nicolas Patry 2024-07-01 23:28:00 +0200
  • 2b9339c65b Fixing baichuan override. (#2158) Nicolas Patry 2024-07-01 23:25:54 +0200
  • 381c5c02a6 fix: prefer serde structs over custom functions (#2127) drbh 2024-07-01 09:08:05 -0400
  • 6265956bc4 refine get xpu free memory/enable Qwen2/gemma2/gemma/phi in intel platform (#2132) Wang, Yi 2024-07-01 20:32:54 +0800
  • 5b977c3141 fix AttributeError: 'MixtralLayer' object has no attribute 'mlp' (#2123) icyboy™ 2024-07-01 20:17:22 +0800