text-generation-inference/backends/v3
drbh 6f2a468a64 Pr 2352 ci branch (#2382)
* Fix unsigned integer underflow

Passing --max-batch-size to the launcher actually had no effect
because after a few requests the max_size passed to State::next_batch
would underflow becoming a largo positive number.

In the scheduler, as soon as the cached batch size reached the
max_batch_size the max_size passed to next_batch becomes 0.
Since the only check in that funcion is
```
if Some(batch_requests.len()) == max_size {
    break;
}
```
and it's called after the `batch_requests.len()` has
become 1, it doesn't do anything to prevent more than 0
requests from being batched.

Now we have cached batch in the server that is large than
max_batch_size and `max_size - batch_size as usize`
underflows.

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>

* fix: update v3 scheduler and ensure max_batch_size > 0

---------

Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
2024-09-25 06:01:59 +00:00
..
src Pr 2352 ci branch (#2382) 2024-09-25 06:01:59 +00:00
build.rs Rebase TRT-llm (#2331) 2024-09-25 05:55:39 +00:00
Cargo.toml Rebase TRT-llm (#2331) 2024-09-25 05:55:39 +00:00