| Name | Latest commit | Committed |
| --- | --- | --- |
| `attention/` | Softcapping for gemma2. (#2273) | 2024-07-22 18:27:10 +02:00 |
| `awq/` | Support AWQ quantization with bias (#2117) | 2024-06-25 21:09:00 +02:00 |
| `gptq/` | Some small fixes for the Torch 2.4.0 update (#2304) | 2024-07-25 13:34:44 +02:00 |
| `marlin/` | Split up layers.marlin into several files (#2292) | 2024-07-24 16:33:26 +02:00 |
| `__init__.py` | Enable multiple LoRa adapters (#2010) | 2024-06-25 14:46:27 -04:00 |
| `bnb.py` | feat(fp8): use fbgemm kernels and load fp8 weights directly (#2248) | 2024-07-20 19:02:04 +02:00 |
| `conv.py` | Refactor layers. (#1866) | 2024-05-13 12:44:30 +02:00 |
| `eetq.py` | feat(fp8): use fbgemm kernels and load fp8 weights directly (#2248) | 2024-07-20 19:02:04 +02:00 |
| `exl2.py` | Add support for Deepseek V2 (#2224) | 2024-07-19 17:23:20 +02:00 |
| `fp8.py` | fix(l4): fix fp8 logic on l4 (#2277) | 2024-07-23 11:24:29 +02:00 |
| `layernorm.py` | Removing IPEX_AVAIL. (#2115) | 2024-06-25 13:20:57 +02:00 |
| `linear.py` | Improve the handling of quantized weights (#2250) | 2024-07-19 09:37:39 +02:00 |
| `lora.py` | fix: refactor adapter weight loading and mapping (#2193) | 2024-07-24 15:32:14 -04:00 |
| `medusa.py` | fix: use path inside of speculator config (#1935) | 2024-05-22 20:46:29 +02:00 |
| `mlp.py` | MLPSpeculator. (#1865) | 2024-05-14 12:33:18 +02:00 |
| `rotary.py` | Add support for Llama 3 rotary embeddings (#2286) | 2024-07-23 17:18:54 +02:00 |
| `speculative.py` | MLPSpeculator. (#1865) | 2024-05-14 12:33:18 +02:00 |
| `tensor_parallel.py` | Improve the handling of quantized weights (#2250) | 2024-07-19 09:37:39 +02:00 |