mirror of https://github.com/huggingface/text-generation-inference.git
synced 2025-04-22 15:32:08 +00:00
Update the doc.

parent b75bd5b720
commit f66c9f340b
@@ -168,7 +168,7 @@ Options:
 ## MAX_BATCH_PREFILL_TOKENS
 ```shell
   --max-batch-prefill-tokens <MAX_BATCH_PREFILL_TOKENS>
-          Limits the number of tokens for the prefill operation. Since this operation take the most memory and is compute bound, it is interesting to limit the number of requests that can be sent. Default to `max_input_length + 50` to give a bit of room
+          Limits the number of tokens for the prefill operation. Since this operation take the most memory and is compute bound, it is interesting to limit the number of requests that can be sent. Default to `max_input_tokens + 50` to give a bit of room

          [env: MAX_BATCH_PREFILL_TOKENS=]
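For context, the option documented in this hunk is a `text-generation-launcher` flag. A minimal launch sketch is below; the model id and token values are illustrative, and the explicit `--max-batch-prefill-tokens` value just mirrors the documented default of `max_input_tokens + 50`:

```shell
# Hedged example: cap prefill tokens for text-generation-launcher.
# Model id and numbers are assumptions, not from the commit itself.
text-generation-launcher \
  --model-id mistralai/Mistral-7B-Instruct-v0.2 \
  --max-input-tokens 4096 \
  --max-batch-prefill-tokens 4146   # = 4096 + 50, the documented default

# The same limit can be set through the environment variable instead:
MAX_BATCH_PREFILL_TOKENS=4146 text-generation-launcher \
  --model-id mistralai/Mistral-7B-Instruct-v0.2
```

If the flag is omitted entirely, the launcher falls back to the `max_input_tokens + 50` default that this commit's wording change documents.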