Commit Graph

2 Commits

Author SHA1 Message Date
Daniël de Kok
775e5f4c64 MoE Marlin: support desc_act for groupsize != -1 (#2590)
This change uses the updated Marlin MoE kernel from vLLM to support
MoE with activation sorting and groups.
2024-10-25 09:12:03 +00:00
Daniël de Kok
288bcb0027 Add support for GPTQ-quantized MoE models using MoE Marlin (#2557)
This change adds support for MoE models that use GPTQ quantization.
Currently only models with the following properties are supported:

- No `desc_act` with tensor parallelism, unless `group_size=-1`.
- No asymmetric quantization.
- No AWQ.
2024-10-25 09:07:52 +00:00
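
The restrictions listed in 288bcb0027 can be read as a pre-flight check on a model's quantization settings. The sketch below is illustrative only and is not the code from the commit: the `QuantConfig` class, its field names, and the `unsupported_reasons` helper are hypothetical, and `world_size > 1` is assumed to stand for "tensor parallelism is in use". It reflects the limits as stated for that commit; the later commit 775e5f4c64 lifts the `desc_act` restriction for `group_size != -1`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QuantConfig:
    """Hypothetical settings container; field names are assumptions."""
    method: str          # quantization method, e.g. "gptq" or "awq"
    desc_act: bool       # activation-order (act-order) quantization
    group_size: int      # -1 as written in the commit message
    sym: bool            # True for symmetric quantization
    world_size: int = 1  # assumed: > 1 means tensor parallelism


def unsupported_reasons(cfg: QuantConfig) -> List[str]:
    """Return the listed restrictions that cfg violates (empty if none)."""
    reasons = []
    if cfg.method == "awq":
        reasons.append("AWQ is not supported")
    if not cfg.sym:
        reasons.append("asymmetric quantization is not supported")
    # desc_act is only a problem when combined with tensor parallelism,
    # and even then group_size=-1 is allowed.
    if cfg.desc_act and cfg.world_size > 1 and cfg.group_size != -1:
        reasons.append("desc_act with tensor parallelism requires group_size=-1")
    return reasons
```

An empty list means none of the three listed restrictions applies; it does not by itself guarantee that a model loads.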