mirror of
https://github.com/huggingface/text-generation-inference.git
synced 2025-04-24 16:32:12 +00:00
2.0 KiB
2.0 KiB
Request Parameters for Text Generation Inference
Text Generation Inference support different parameters to control the generation, defining them in the `parameters`` attribute of the payload.
curl https://j4xhm53fxl9ussm8.us-east-1.aws.endpoints.huggingface.cloud \
-X POST \
-d '{"inputs":"Once upon a time,", "parameters": {"max_new_tokens": 256}}' \
-H "Authorization: Bearer <hf_token>" \
-H "Content-Type: application/json"
As of today, the following parameters are supported:
temperature
: Controls randomness in the model. Lower values will make the model more deterministic and higher values will make the model more random. Default value is 1.0.max_new_tokens
: The maximum number of tokens to generate. Default value is 20, max value is 512.repetition_penalty
: Controls the likelihood of repetition. Default isnull
.seed
: The seed to use for random generation. Default isnull
.stop
: A list of tokens to stop the generation. The generation will stop when one of the tokens is generated.top_k
: The number of highest probability vocabulary tokens to keep for top-k-filtering. Default value isnull
, which disables top-k-filtering.top_p
: The cumulative probability of parameter highest probability vocabulary tokens to keep for nucleus sampling, default tonull
do_sample
: Whether or not to use sampling; use greedy decoding otherwise. Default value isfalse
.best_of
: Generate best_of sequences and return the one if the highest token logprobs, default tonull
.details
: Whether or not to return details about the generation. Default value isfalse
.return_full_text
: Whether or not to return the full text or only the generated part. Default value isfalse
.truncate
: Whether or not to truncate the input to the maximum length of the model. Default value istrue
.typical_p
: The typical probability of a token. Default value isnull
.watermark
: The watermark to use for the generation. Default value isfalse
.