There's currently a discrepancy in the tokenization between the router and the python server code. The latter includes special tokens but the former does not. This results in a token count mismatch for seq2seq models such as mt0, where the tokenizer emits an EOS token at the end. That in turn produces unexpected/incorrect output, particularly when batch concatenation is involved, because the python code uses the input length passed from the router for each row. As far as I can tell, it is better to include this token in the encoder `input_ids`, so it seems best to adjust on the router side.
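
A minimal sketch of the mismatch, assuming the router tokenizes with the `tokenizers` crate (the model id and input string below are illustrative only, not taken from the actual router code):

```rust
use tokenizers::tokenizer::{Result, Tokenizer};

fn main() -> Result<()> {
    // Example seq2seq tokenizer whose post-processor appends an EOS token.
    let tokenizer = Tokenizer::from_pretrained("bigscience/mt0-small", None)?;
    let input = "Translate to English: Je t'aime.";

    // Without special tokens: roughly what the router currently counts.
    let without = tokenizer.encode(input, false)?;
    // With special tokens: what the python server sees (includes the trailing EOS).
    let with = tokenizer.encode(input, true)?;

    println!(
        "router count = {}, server count = {}",
        without.get_ids().len(),
        with.get_ids().len()
    );
    // The proposed adjustment is to pass `add_special_tokens = true` on the
    // router side so the input length it reports matches the server's tokenization.
    Ok(())
}
```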