Mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-04-25 20:12:07 +00:00)
This PR adds basic modeling for phi-2.

Run:

```bash
text-generation-server \
    serve \
    microsoft/phi-2 \
    --revision 834565c23f9b28b96ccbeabe614dd906b6db551a
```

Test:

```bash
curl -s localhost:3000/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json' | jq .
```

Notes:

- Recently (~1 day ago) the Phi weights and model were updated to accommodate adding [GQA/MQA attention to the model](https://github.com/huggingface/transformers/pull/28163). This implementation expects the original model format, so a fixed revision is required at the moment.
- This PR only includes a basic implementation of the model; it can later be extended to support Flash and sharded versions, as well as to make use of better optimizations.
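For reference, the `curl` test above can also be expressed as a small Python client. This is a minimal sketch, not part of the PR: the endpoint path, headers, and payload shape are taken from the `curl` command, while the function names and default URL are illustrative assumptions.

```python
import json
import urllib.request


def build_payload(prompt: str, max_new_tokens: int = 20) -> bytes:
    """Build the JSON body used by the /generate endpoint in the curl example."""
    return json.dumps(
        {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    ).encode("utf-8")


def generate(prompt: str, url: str = "http://localhost:3000/generate") -> dict:
    """POST a prompt to a locally running text-generation-server and return the parsed JSON reply."""
    req = urllib.request.Request(
        url,
        data=build_payload(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # Requires the server started with the command above to be running.
    print(generate("What is Deep Learning?"))
```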