Mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-04-22 15:32:08 +00:00)
This PR adds basic modeling support for phi-2.

Run:

```bash
text-generation-server \
    serve \
    microsoft/phi-2 \
    --revision 834565c23f9b28b96ccbeabe614dd906b6db551a
```

Test:

```bash
curl -s localhost:3000/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json' | jq .
```

Notes:

- Recently (~1 day ago) the Phi weights and model were updated to accommodate adding [GQA/MQA attention to the model](https://github.com/huggingface/transformers/pull/28163). This implementation expects the original model format, so a fixed revision is required at the moment.
- This PR only includes a basic implementation of the model; it can later be extended to support Flash and Sharded versions, as well as to make use of better optimizations.
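The curl test above can also be issued from Python. A minimal sketch using only the standard library, assuming the server is listening on `localhost:3000` as in the serve command (the `build_generate_request` helper name is illustrative, not part of the project):

```python
import json
from urllib import request

def build_generate_request(prompt, max_new_tokens=20,
                           url="http://localhost:3000/generate"):
    # Build the same JSON body the curl example sends to /generate.
    payload = {"inputs": prompt,
               "parameters": {"max_new_tokens": max_new_tokens}}
    data = json.dumps(payload).encode("utf-8")
    return request.Request(url, data=data,
                           headers={"Content-Type": "application/json"},
                           method="POST")

# Sending the request requires a running server:
# with request.urlopen(build_generate_request("What is Deep Learning?")) as resp:
#     print(json.loads(resp.read()))
```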