Mirror of https://github.com/huggingface/text-generation-inference.git
Synced 2025-05-13 05:22:09 +00:00
This PR fixes parallel grammar requests. Currently, grammar states are not concatenated correctly when a new request is added to the batch, which results in incorrect generation. This PR updates the `concatenate` function to correctly include the previous states. Fixes: #1601
test_flash_llama_grammar_json.json
test_flash_llama_grammar_load.json
test_flash_llama_grammar_regex.json
test_flash_llama_grammar_single_load_instance.json
test_flash_llama_grammar.json
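
The commit message describes a bug in how per-request grammar states are merged when a new request joins a running batch. The sketch below illustrates that idea only; it is not the repository's code. `GrammarBatch`, its fields, and the integer state encoding are hypothetical names chosen for illustration, and only the name `concatenate` comes from the commit message.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GrammarBatch:
    """Hypothetical container: one grammar and one FSM state per request."""

    grammars: List[str] = field(default_factory=list)
    fsm_states: List[int] = field(default_factory=list)

    @classmethod
    def concatenate(cls, batches: List["GrammarBatch"]) -> "GrammarBatch":
        """Merge batches, carrying over each request's current FSM state."""
        grammars: List[str] = []
        fsm_states: List[int] = []
        for batch in batches:
            grammars.extend(batch.grammars)
            # Keep the states already reached by in-flight requests.
            # Resetting them to 0 here would restart every grammar
            # mid-generation, which is the kind of error the PR describes.
            fsm_states.extend(batch.fsm_states)
        return cls(grammars=grammars, fsm_states=fsm_states)


# A request that is already three steps into its grammar...
running = GrammarBatch(grammars=["json"], fsm_states=[3])
# ...is joined by a new request that starts at state 0.
incoming = GrammarBatch(grammars=["regex"], fsm_states=[0])

merged = GrammarBatch.concatenate([running, incoming])
print(merged.fsm_states)  # [3, 0]
```

In this sketch the running request keeps state 3 after the merge, so its constrained generation would continue from where it left off while the new request starts from the initial state.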