* Attempt at automatic max batch prefill.
* Taking into account number of shards.
* Adding more cards.
* Adding A100 + H100
* Adding a few more cards.
* Logprobs cost too much.
* h100 better name, and keep factor of 2
* Damn inflated sparse tflops.
* Typo in h100.
* Updated the flops calculation (checked with fvcore).
* chunking by default.
* Fix prefix caching for chat completion since we removed logprobs.
* More tests.
* Dropping all the prefill logprobs.
* Add a flag that enables users to get logprobs back.
* Repairing prompt token counting.
* Fixing a few tests.
* Remove some scaffolding.
* Attempting to reduces the issues (workarounds for now).
This PR fixes parallel grammar requests, currently grammar states are
not concatenated correctly when a new request is added to the batch and
this results in incorrect generation. This PR updates the `concatenate`
function to correctly include the previous states.
fixes: #1601