| Name | Last commit message | Last commit date |
|------|---------------------|------------------|
| merges | Enable multiple LoRa adapters (#2010) | 2024-09-24 03:55:04 +00:00 |
| __init__.py | Aligin the source code with main branch 2.0.4 | 2024-09-24 03:06:55 +00:00 |
| adapter.py | Enable multiple LoRa adapters (#2010) | 2024-09-24 03:55:04 +00:00 |
| chunks.py | server: use chunked inputs | 2024-09-24 03:42:29 +00:00 |
| convert.py | Force weights_only (before fully breaking pickle files anyway). (#1710) | 2024-04-25 15:10:53 +03:00 |
| dist.py | feat(fp8): use fbgemm kernels and load fp8 weights directly (#2248) | 2024-09-25 05:30:41 +00:00 |
| hub.py | Enable multiple LoRa adapters (#2010) | 2024-09-24 03:55:04 +00:00 |
| import_utils.py | refine get xpu free memory/enable Qwen2/gemma2/gemma/phi in intel platform (#2132) | 2024-09-24 03:57:32 +00:00 |
| log.py | feat(fp8): use fbgemm kernels and load fp8 weights directly (#2248) | 2024-09-25 05:30:41 +00:00 |
| logits_process.py | Aligin the source code with main branch 2.0.4 | 2024-09-24 03:06:55 +00:00 |
| peft.py | Enable multiple LoRa adapters (#2010) | 2024-09-24 03:55:04 +00:00 |
| quantization.py | feat(fp8): use fbgemm kernels and load fp8 weights directly (#2248) | 2024-09-25 05:30:41 +00:00 |
| segments.py | Enable multiple LoRa adapters (#2010) | 2024-09-24 03:55:04 +00:00 |
| sgmv.py | Enable multiple LoRa adapters (#2010) | 2024-09-24 03:55:04 +00:00 |
| speculate.py | chore: formatting | 2024-04-18 16:26:00 +03:00 |
| tokens.py | Aligin the source code with main branch 2.0.4 | 2024-09-24 03:06:55 +00:00 |
| watermark.py | Aligin the source code with main branch 2.0.4 | 2024-09-24 03:06:55 +00:00 |
| weights.py | fix(server): fix fp8 weight loading (#2268) | 2024-09-25 05:31:08 +00:00 |