diff --git a/docs/source/basic_tutorials/launcher.md b/docs/source/basic_tutorials/launcher.md
index 95f5705f..f9b76ed1 100644
--- a/docs/source/basic_tutorials/launcher.md
+++ b/docs/source/basic_tutorials/launcher.md
@@ -149,7 +149,7 @@ Options:
 ## MAX_TOTAL_TOKENS
 ```shell
       --max-total-tokens
-          This is the most important value to set as it defines the "memory budget" of running clients requests. Clients will send input sequences and ask to generate `max_new_tokens` on top. with a value of `1512` users can send either a prompt of `1000` and ask for `512` new tokens, or send a prompt of `1` and ask for `1511` max_new_tokens. The larger this value, the larger amount each request will be in your RAM and the less effective batching can be. Default to min(max_position_embeddings - 1, 16384)
+          This is the most important value to set as it defines the "memory budget" of running clients requests. Clients will send input sequences and ask to generate `max_new_tokens` on top. with a value of `1512` users can send either a prompt of `1000` and ask for `512` new tokens, or send a prompt of `1` and ask for `1511` max_new_tokens. The larger this value, the larger amount each request will be in your RAM and the less effective batching can be. Default to min(max_position_embeddings, 16384)
 
           [env: MAX_TOTAL_TOKENS=]
 
diff --git a/launcher/src/main.rs b/launcher/src/main.rs
index 2ba34732..580f7476 100644
--- a/launcher/src/main.rs
+++ b/launcher/src/main.rs
@@ -230,7 +230,7 @@ struct Args {
     /// `1511` max_new_tokens.
     /// The larger this value, the larger amount each request will be in your RAM
     /// and the less effective batching can be.
-    /// Default to min(max_position_embeddings - 1, 16384)
+    /// Default to min(max_position_embeddings, 16384)
     #[clap(long, env)]
     max_total_tokens: Option<usize>,
 
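
The arithmetic behind the doc comment changed above can be sketched as follows. This is only an illustration, not the launcher's actual code: the function names `default_max_total_tokens` and `fits_budget` are hypothetical, and the inclusive comparison is an assumption inferred from the `1000` + `512` = `1512` example in the help text.

```rust
/// Hypothetical helper mirroring the documented default after this change:
/// min(max_position_embeddings, 16384).
fn default_max_total_tokens(max_position_embeddings: usize) -> usize {
    max_position_embeddings.min(16384)
}

/// Hypothetical check: a request fits the "memory budget" when its prompt
/// length plus the requested `max_new_tokens` does not exceed
/// `max_total_tokens`. The inclusive bound is assumed from the help text,
/// where a budget of 1512 admits a prompt of 1000 with 512 new tokens.
fn fits_budget(input_tokens: usize, max_new_tokens: usize, max_total_tokens: usize) -> bool {
    // checked_add avoids overflow on absurdly large requests.
    input_tokens
        .checked_add(max_new_tokens)
        .map_or(false, |total| total <= max_total_tokens)
}
```

Under these assumptions, `fits_budget(1000, 512, 1512)` and `fits_budget(1, 1511, 1512)` both hold, matching the two cases in the help text, while `fits_budget(1000, 513, 1512)` does not. For a model with `max_position_embeddings = 4096`, the sketched default is `4096`, whereas the previous wording (`max_position_embeddings - 1`) would have given `4095`.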