Mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-04-22 15:32:08 +00:00)
Update default doc.
parent bd01d448d7
commit a4c86e8678
@@ -149,7 +149,7 @@ Options:
 ## MAX_TOTAL_TOKENS
 ```shell
       --max-total-tokens <MAX_TOTAL_TOKENS>
-          This is the most important value to set as it defines the "memory budget" of running clients requests. Clients will send input sequences and ask to generate `max_new_tokens` on top. with a value of `1512` users can send either a prompt of `1000` and ask for `512` new tokens, or send a prompt of `1` and ask for `1511` max_new_tokens. The larger this value, the larger amount each request will be in your RAM and the less effective batching can be. Default to min(max_position_embeddings - 1, 16384)
+          This is the most important value to set as it defines the "memory budget" of running clients requests. Clients will send input sequences and ask to generate `max_new_tokens` on top. with a value of `1512` users can send either a prompt of `1000` and ask for `512` new tokens, or send a prompt of `1` and ask for `1511` max_new_tokens. The larger this value, the larger amount each request will be in your RAM and the less effective batching can be. Default to min(max_position_embeddings, 16384)

           [env: MAX_TOTAL_TOKENS=]

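For context, the "memory budget" described in the doc text above is simple addition: a request fits only if its prompt length plus the requested `max_new_tokens` stays within `max_total_tokens`. The following is a minimal sketch of that check using the numbers from the doc (`1512`, `1000`/`512`, `1`/`1511`); the function name `fits_budget` is hypothetical and not part of the launcher's actual validation code.

```rust
/// Hypothetical illustration of the documented "memory budget":
/// a request is admissible only if prompt tokens plus requested
/// new tokens fit within `max_total_tokens`.
fn fits_budget(prompt_tokens: usize, max_new_tokens: usize, max_total_tokens: usize) -> bool {
    prompt_tokens + max_new_tokens <= max_total_tokens
}

fn main() {
    let max_total_tokens = 1512;
    // A 1000-token prompt asking for 512 new tokens fits exactly.
    assert!(fits_budget(1000, 512, max_total_tokens));
    // A 1-token prompt can ask for up to 1511 new tokens.
    assert!(fits_budget(1, 1511, max_total_tokens));
    // Anything beyond the budget does not fit.
    assert!(!fits_budget(1000, 513, max_total_tokens));
}
```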
@@ -230,7 +230,7 @@ struct Args {
     /// `1511` max_new_tokens.
     /// The larger this value, the larger amount each request will be in your RAM
     /// and the less effective batching can be.
-    /// Default to min(max_position_embeddings - 1, 16384)
+    /// Default to min(max_position_embeddings, 16384)
     #[clap(long, env)]
     max_total_tokens: Option<usize>,

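Since `max_total_tokens` is an `Option<usize>`, the documented default only applies when neither the `--max-total-tokens` flag nor the `MAX_TOTAL_TOKENS` env var is set. Below is a hedged sketch, not the launcher's actual code, of how such a fallback to min(max_position_embeddings, 16384) could be resolved; the function name `resolve_max_total_tokens` is hypothetical.

```rust
/// Hypothetical sketch of resolving the documented default:
/// use the CLI/env value if given, otherwise fall back to
/// min(max_position_embeddings, 16384).
fn resolve_max_total_tokens(cli_value: Option<usize>, max_position_embeddings: usize) -> usize {
    cli_value.unwrap_or_else(|| max_position_embeddings.min(16384))
}

fn main() {
    // An explicit CLI/env value wins.
    assert_eq!(resolve_max_total_tokens(Some(4096), 32_768), 4096);
    // Otherwise the model's context length caps the default...
    assert_eq!(resolve_max_total_tokens(None, 8192), 8192);
    // ...but never above 16384.
    assert_eq!(resolve_max_total_tokens(None, 131_072), 16384);
}
```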