mirror of
https://github.com/huggingface/text-generation-inference.git
synced 2025-04-22 15:32:08 +00:00
This PR adds basic modeling for phi-2 run ```bash text-generation-server \ serve \ microsoft/phi-2 \ --revision 834565c23f9b28b96ccbeabe614dd906b6db551a ``` test ```bash curl -s localhost:3000/generate \ -X POST \ -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \ -H 'Content-Type: application/json' | jq . ``` notes - recently (~1 day ago) the Phi weights and model were updated to accommodate adding [GQA/MQA attention to the model.](https://github.com/huggingface/transformers/pull/28163) This impl expects the original model format so a fixed revision is required at the moment. - this PR only includes a basic implementation of the model and can later be extended for support Flash and Sharded versions as well as make use of better optimization |
||
---|---|---|
.. | ||
custom_kernels | ||
exllama_kernels | ||
exllamav2_kernels | ||
tests | ||
text_generation_server | ||
.gitignore | ||
Makefile | ||
Makefile-awq | ||
Makefile-eetq | ||
Makefile-flash-att | ||
Makefile-flash-att-v2 | ||
Makefile-vllm | ||
poetry.lock | ||
pyproject.toml | ||
README.md | ||
requirements_common.txt | ||
requirements_cuda.txt | ||
requirements_rocm.txt | ||
requirements.txt |
Text Generation Inference Python gRPC Server
A Python gRPC server for Text Generation Inference
Install
make install
Run
make run-dev