Mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-06-19 15:52:08 +00:00)
fix: Include special tokens when tokenizing in front-end
There is currently a discrepancy in tokenization between the router and the Python server: the latter includes special tokens, but the former does not. This produces a token-count mismatch for seq2seq models such as mt0, whose tokenizer appends an EOS token to the input. The mismatch in turn leads to unexpected/incorrect output, particularly when batch concatenation is involved, because the Python code uses the input length passed from the router for each row. As far as I can tell, the EOS token belongs in the encoder input_ids, so the cleanest fix is to adjust the router to also include special tokens when tokenizing.
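As a minimal illustration of the mismatch, the sketch below compares the two encode calls using the tokenizers crate. The model id, the http feature requirement, and the example text are assumptions for demonstration only, not part of this change:

use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Illustrative model id; any seq2seq tokenizer that appends EOS shows
    // the effect. Fetching from the Hub requires the crate's `http` feature.
    let tokenizer = Tokenizer::from_pretrained("bigscience/mt0-small", None)?;

    let text = "Translate to French: Hello";

    // What the router counted before this fix: special tokens excluded.
    let without_special = tokenizer.encode(text, false)?;
    // What the Python server counts (and the router counts after this fix):
    // special tokens included, e.g. the trailing </s> for mt0.
    let with_special = tokenizer.encode(text, true)?;

    // For mt0-style tokenizers the second count is one token larger,
    // which is exactly the mismatch described above.
    println!("without special tokens: {}", without_special.len());
    println!("with special tokens:    {}", with_special.len());
    Ok(())
}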
This commit is contained in:
parent 611e21cb13
commit 03a62635b2
@@ -131,7 +131,7 @@ fn validation_worker(
         }

         // Get the number of tokens in the input
-        match tokenizer.encode(request.inputs.clone(), false) {
+        match tokenizer.encode(request.inputs.clone(), true) {
             Ok(inputs) => {
                 let input_length = inputs.len();
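For context, here is a sketch of how the corrected count plausibly feeds the validation step. Everything beyond the encode(..., true) call visible in the hunk, including the function name, the length check, and the error type, is an assumption for illustration:

use tokenizers::Tokenizer;

// Sketch only, not the actual file contents: names other than the
// `tokenizer.encode(..., true)` call are assumptions.
fn validate_input_length(
    tokenizer: &Tokenizer,
    inputs: &str,
    max_input_length: usize,
) -> Result<usize, String> {
    match tokenizer.encode(inputs, true) {
        Ok(encoding) => {
            // EOS (and any other special tokens) are now counted here,
            // matching the Python server's own tokenization.
            let input_length = encoding.len();
            if input_length > max_input_length {
                return Err(format!(
                    "input is {input_length} tokens, maximum is {max_input_length}"
                ));
            }
            Ok(input_length)
        }
        Err(err) => Err(err.to_string()),
    }
}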