Name                Last commit                                                           Last commit date
attention/          Softcapping for gemma2. (#2273)                                       2024-09-25 05:31:08 +00:00
awq/                Support AWQ quantization with bias (#2117)                            2024-09-24 03:55:04 +00:00
gptq/               Some small fixes for the Torch 2.4.0 update (#2304)                   2024-09-25 05:40:25 +00:00
marlin/             Split up layers.marlin into several files (#2292)                     2024-09-25 05:39:58 +00:00
__init__.py         Enable multiple LoRa adapters (#2010)                                 2024-09-24 03:55:04 +00:00
bnb.py              feat(fp8): use fbgemm kernels and load fp8 weights directly (#2248)   2024-09-25 05:30:41 +00:00
conv.py             Refactor layers. (#1866)                                              2024-07-17 05:36:58 +00:00
eetq.py             feat(fp8): use fbgemm kernels and load fp8 weights directly (#2248)   2024-09-25 05:30:41 +00:00
exl2.py             Add support for Deepseek V2 (#2224)                                   2024-09-25 05:27:40 +00:00
fp8.py              fix(l4): fix fp8 logic on l4 (#2277)                                  2024-09-25 05:31:30 +00:00
layernorm.py        Removing IPEX_AVAIL. (#2115)                                          2024-09-24 03:52:23 +00:00
linear.py           Improve the handling of quantized weights (#2250)                     2024-09-25 05:27:40 +00:00
lora.py             fix: refactor adapter weight loading and mapping (#2193)              2024-09-25 05:39:58 +00:00
medusa.py           fix: use path inside of speculator config (#1935)                     2024-07-17 05:36:58 +00:00
mlp.py              MLPSpeculator. (#1865)                                                2024-07-17 05:36:58 +00:00
rotary.py           Add support for Llama 3 rotary embeddings (#2286)                     2024-09-25 05:38:48 +00:00
speculative.py      MLPSpeculator. (#1865)                                                2024-07-17 05:36:58 +00:00
tensor_parallel.py  Improve the handling of quantized weights (#2250)                     2024-09-25 05:27:40 +00:00