Mirror of https://github.com/huggingface/text-generation-inference.git, synced 2025-04-27 04:52:07 +00:00
What does this PR do? During the safetensors conversion, duplicate weights are removed. However, which of the duplicates gets removed differs per checkpoint. In some, like `h2oai/h2ogpt-oig-oasst1-falcon-40b`, the weight `transformer.word_embeddings.weight` gets removed. In others, `lm_head.weight` gets removed. Long story long, we need to support both. Originally, …
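The commit message describes a loader that must accept either surviving copy of a tied weight. The sketch below shows one way to do that lookup with a name fallback. It is a hypothetical illustration, not the project's actual code: the function `resolve_tied_weight` and the constant `TIED_NAMES` are invented for this example, it assumes the two tied names are `transformer.word_embeddings.weight` and `lm_head.weight`, and it uses a plain mapping in place of a real checkpoint (in practice the names might come from something like `safetensors.safe_open(...).keys()`).

```python
from typing import Mapping

# Hypothetical: the two names under which a tied embedding / LM-head weight
# may be stored once the safetensors conversion has dropped the duplicate.
TIED_NAMES = ("transformer.word_embeddings.weight", "lm_head.weight")


def resolve_tied_weight(tensors: Mapping[str, object], wanted: str) -> object:
    """Return the tensor stored under `wanted`, or under its tied alias.

    Which duplicate survives conversion varies per checkpoint, so when the
    requested name is missing, try the other name in the tied pair.
    """
    if wanted in tensors:
        return tensors[wanted]
    if wanted in TIED_NAMES:
        for alias in TIED_NAMES:
            if alias != wanted and alias in tensors:
                return tensors[alias]
    raise KeyError(f"weight {wanted!r} not found under any tied name")


# Checkpoint that kept only the embedding copy, and one that kept only lm_head.
kept_embeddings = {"transformer.word_embeddings.weight": "E"}
kept_lm_head = {"lm_head.weight": "H"}
print(resolve_tied_weight(kept_embeddings, "lm_head.weight"))
print(resolve_tied_weight(kept_lm_head, "transformer.word_embeddings.weight"))
```

Falling back by name keeps the model code unchanged: it always asks for the name it expects, and the lookup decides which stored copy to return.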
custom_kernels
exllama_kernels
tests
text_generation_server
.gitignore
Makefile
Makefile-flash-att
Makefile-flash-att-v2
Makefile-vllm
poetry.lock
pyproject.toml
README.md
requirements.txt
Text Generation Inference Python gRPC Server
A Python gRPC server for Text Generation Inference
Install
make install
Run
make run-dev