Mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-04-21 23:12:07 +00:00)
Add support for Phi-3-medium

The main difference between the medium and mini models is that medium uses grouped-query attention (GQA) with a packed QKV matrix. This change adds support for GQA with packed matrices to `Weights.get_weights_col_packed` and uses it for Phi-3. It also allows us to remove the custom GQA implementation from the dbrx attention loading.
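To make the packed-QKV point concrete, here is a minimal sketch of how per-rank row ranges could be computed when Q, K, and V are stacked in one matrix and the K/V blocks are smaller than the Q block, as they are under GQA. It is not the code in `weights.py`: the function name `packed_qkv_shard_rows` is hypothetical, and the sketch assumes a `[Q; K; V]` layout along the output dimension with each block split evenly across ranks.

```python
def packed_qkv_shard_rows(num_heads, num_kv_heads, head_dim, rank, world_size):
    """Return the (start, stop) row ranges of a packed [Q; K; V] matrix for one rank.

    Hypothetical helper for illustration only. Assumes Q, K and V are stacked
    along the output dimension and each block is sharded evenly across ranks.
    """
    if num_heads % world_size or num_kv_heads % world_size:
        raise ValueError("head counts must be divisible by world_size")

    # Under GQA the K and V blocks hold fewer heads than the Q block,
    # so the packed matrix cannot be split into three equal parts.
    block_sizes = [
        num_heads * head_dim,     # Q
        num_kv_heads * head_dim,  # K
        num_kv_heads * head_dim,  # V
    ]

    ranges = []
    offset = 0
    for size in block_sizes:
        shard = size // world_size
        start = offset + rank * shard
        ranges.append((start, start + shard))
        offset += size
    return ranges


# Illustrative sizes: 8 query heads, 2 KV heads, head_dim 4, 2 ranks.
print(packed_qkv_shard_rows(8, 2, 4, rank=0, world_size=2))
# [(0, 16), (32, 36), (40, 44)]
print(packed_qkv_shard_rows(8, 2, 4, rank=1, world_size=2))
# [(16, 32), (36, 40), (44, 48)]
```

With equal-sized blocks (plain multi-head attention, `num_kv_heads == num_heads`) the same function reduces to an even three-way split, which is why a single packed-loading path can serve both cases.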
| Name |
|---|
| __init__.py |
| chunks.py |
| convert.py |
| dist.py |
| hub.py |
| import_utils.py |
| log.py |
| logits_process.py |
| peft.py |
| speculate.py |
| tokens.py |
| watermark.py |
| weights.py |