Daniël de Kok
775e5f4c64
MoE Marlin: support desc_act for groupsize != -1 ( #2590 )
...
This change uses the updated Marlin MoE kernel from vLLM to support
MoE with activation sorting and groups.
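To illustrate what "activation sorting and groups" refers to, here is a minimal sketch in plain Python. It assumes the usual GPTQ convention that, with desc_act enabled, a per-row group index (often called `g_idx`) maps each input row to its quantization group in a non-monotonic order. The function names are hypothetical and this is not the vLLM kernel; it only shows why sorting rows by group while permuting the activations the same way leaves the result unchanged.

```python
# Illustrative sketch only: hypothetical helper names, not the vLLM/Marlin API.

def sort_by_group(g_idx):
    """Return (perm, sorted_g_idx) so rows of the same group become contiguous.

    `sorted` is stable, so rows keep their relative order within a group.
    """
    perm = sorted(range(len(g_idx)), key=lambda i: g_idx[i])
    return perm, [g_idx[i] for i in perm]


def permute(values, perm):
    """Reorder `values` according to the permutation `perm`."""
    return [values[i] for i in perm]


def dot(a, b):
    """Plain dot product over two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))


# With desc_act, consecutive rows may belong to different groups.
g_idx = [1, 0, 1, 0]
perm, sorted_g_idx = sort_by_group(g_idx)

# Permuting weights and activations with the same permutation
# does not change the dot product.
weights = [2, 3, 5, 7]
activations = [1, 4, 6, 8]
original = dot(weights, activations)
reordered = dot(permute(weights, perm), permute(activations, perm))
```

After sorting, each group occupies a contiguous block of rows, so a kernel can apply one set of scales per block instead of looking up the group for every row.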
2024-10-25 09:12:03 +00:00
Daniël de Kok
468e5c6874
Handle GPTQ-Marlin loading in GPTQMarlinWeightsLoader ( #2300 )
...
The `GPTQWeightsLoader` was structured like this in pseudocode:
    if marlin:
        Set up tensors in a way that GPTQ-Marlin expects
    else:
        Set up tensors in a way that ExLlama/GPTQ/AWQ expect
However, the GPTQ-Marlin implementation details should really be in the
`marlin` module. So move the former part out to a separate
`GPTQMarlinWeightsLoader`.
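The shape of this refactor can be sketched as follows. This is a simplified illustration, not the actual code: `QuantConfig`, `layout`, and `get_loader` are hypothetical names introduced here, the non-Marlin loader is called `GPTQWeightsLoader` for the purpose of the example, and the real loaders set up tensors rather than returning strings.

```python
# Illustrative sketch only: hypothetical helpers, tensor setup replaced by strings.
from dataclasses import dataclass


@dataclass
class QuantConfig:
    """Hypothetical stand-in for a checkpoint's quantization settings."""
    bits: int
    groupsize: int
    desc_act: bool


class GPTQWeightsLoader:
    """After the split: only knows the ExLlama/GPTQ/AWQ layout."""

    def __init__(self, config: QuantConfig):
        self.config = config

    def layout(self) -> str:
        return "exllama/gptq/awq"


class GPTQMarlinWeightsLoader:
    """After the split: GPTQ-Marlin details live in their own loader."""

    def __init__(self, config: QuantConfig):
        self.config = config

    def layout(self) -> str:
        return "gptq-marlin"


def get_loader(config: QuantConfig, use_marlin: bool):
    """Choose the loader once, instead of branching inside a single loader."""
    if use_marlin:
        return GPTQMarlinWeightsLoader(config)
    return GPTQWeightsLoader(config)


config = QuantConfig(bits=4, groupsize=128, desc_act=True)
marlin_loader = get_loader(config, use_marlin=True)
default_loader = get_loader(config, use_marlin=False)
```

The `if marlin` branch moves from inside the loader to the point where the loader is selected, so each class carries only the details of its own format.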
2024-09-25 05:55:39 +00:00
Daniël de Kok
23a3927eb6
Install Marlin from standalone package ( #2320 )
2024-09-25 05:50:17 +00:00
drbh
a87791d7c9
feat: add ruff and resolve issue ( #2262 )
...
* feat: add ruff and resolve issue
* fix: update client exports and adjust after rebase
* fix: adjust syntax to avoid circular import
* fix: adjust client ruff settings
* fix: lint and refactor import check and avoid model enum as global names
* fix: improve fbgemm_gpu check and lints
* fix: update lints
* fix: prefer comparing model enum over str
* fix: adjust lints and ignore specific rules
* fix: avoid unneeded quantize check
2024-09-25 05:46:24 +00:00
Daniël de Kok
457791f511
Split up layers.marlin into several files ( #2292 )
...
The marlin.py file was getting large, so split it up.
2024-09-25 05:39:58 +00:00