Mirror of https://github.com/huggingface/text-generation-inference.git, synced 2025-04-24 16:32:12 +00:00
Update docstring in launcher/src/main.rs instead

commit a1b3887846
parent 7a40844734
@@ -702,8 +702,8 @@ struct Args {
     /// Overall this number should be the largest possible amount that fits the
     /// remaining memory (after the model is loaded). Since the actual memory overhead
     /// depends on other parameters like if you're using quantization, flash attention
-    /// or the model implementation, text-generation-inference cannot infer this number
-    /// automatically.
+    /// or the model implementation, text-generation-inference infers this number automatically
+    /// if not provided ensuring that the value is as large as possible.
     #[clap(long, env)]
     max_batch_total_tokens: Option<u32>,
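For context, here is a minimal stand-alone sketch of how an `Option<u32>` clap field declared like the one in this diff behaves. This is illustrative only, not the launcher's actual code, and it assumes clap with the `derive` and `env` features enabled; the printed messages are hypothetical.

// Illustrative sketch only: shows the behavior of an `Option<u32>` clap field
// declared like `max_batch_total_tokens` in the diff above. Assumes clap with
// the `derive` and `env` features; the println! messages are hypothetical.
use clap::Parser;

#[derive(Parser, Debug)]
struct Args {
    /// Total token budget for a batch; inferred automatically when absent.
    #[clap(long, env)]
    max_batch_total_tokens: Option<u32>,
}

fn main() {
    let args = Args::parse();
    match args.max_batch_total_tokens {
        // Set explicitly via `--max-batch-total-tokens` or the
        // MAX_BATCH_TOTAL_TOKENS environment variable.
        Some(n) => println!("using explicit batch token budget: {n}"),
        // Left unset: per the updated docstring, text-generation-inference
        // infers the largest value that fits the memory remaining after the
        // model is loaded.
        None => println!("no budget provided; inferring from free memory"),
    }
}

With a declaration like this, passing `--max-batch-total-tokens 32000` (the number is arbitrary here) pins the budget explicitly, while omitting both the flag and the environment variable leaves the value to the automatic inference the new docstring describes.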