mirror of
https://github.com/huggingface/text-generation-inference.git
synced 2025-04-22 15:32:08 +00:00
This PR adds basic modeling for phi-2

run
```bash
text-generation-server \
    serve \
    microsoft/phi-2 \
    --revision 834565c23f9b28b96ccbeabe614dd906b6db551a
```

test
```bash
curl -s localhost:3000/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json' | jq .
```

notes
- recently (~1 day ago) the Phi weights and model were updated to accommodate adding [GQA/MQA attention to the model.](https://github.com/huggingface/transformers/pull/28163) This impl expects the original model format, so a fixed revision is required at the moment.
- this PR only includes a basic implementation of the model and can later be extended to support Flash and Sharded versions as well as to make use of better optimizations

File | ||
---|---|---|
__init__.py | ||
bloom_modeling.py | ||
flash_llama_modeling.py | ||
flash_mistral_modeling.py | ||
flash_mixtral_modeling.py | ||
flash_neox_modeling.py | ||
flash_phi_modeling.py | ||
flash_rw_modeling.py | ||
flash_santacoder_modeling.py | ||
idefics_config.py | ||
idefics_image_processing.py | ||
idefics_modeling.py | ||
idefics_perceiver.py | ||
idefics_processing.py | ||
idefics_vision.py | ||
mpt_modeling.py | ||
neox_modeling.py | ||
opt_modeling.py | ||
phi_modeling.py | ||
t5_modeling.py |
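
The `curl` test from the PR description can also be sketched in Python using only the standard library. This is a minimal sketch, not part of the PR: the helper names `build_generate_request` and `generate` are hypothetical, and the default base URL `http://localhost:3000` is assumed from the `curl` example.

```python
import json
import urllib.request


def build_generate_request(prompt, max_new_tokens=20, base_url="http://localhost:3000"):
    """Build (but do not send) a POST request mirroring the curl test.

    The helper name and the default base_url are assumptions for illustration.
    """
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt, max_new_tokens=20, base_url="http://localhost:3000"):
    """Send the request to a running server and return the parsed JSON body."""
    request = build_generate_request(prompt, max_new_tokens, base_url)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage (requires the server from the "run" step to be listening):
# print(json.dumps(generate("What is Deep Learning?"), indent=2))
```

Building the request separately from sending it lets the payload be inspected without a running server.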