mirror of
https://github.com/huggingface/text-generation-inference.git
synced 2025-06-17 14:52:09 +00:00
When on-device sampling is enabled, we need to emulate the greedy behaviour using top-k=1, top-p=1, temperature=1.

client
gaudi
grpc-metadata
llamacpp
neuron
trtllm
v2
v3
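
The commit message above says greedy decoding is emulated with top-k=1, top-p=1, temperature=1 when sampling runs on-device. The following is a minimal, illustrative sketch of why those settings reduce sampling to an argmax. It is not code from this repository: `sample_token` and `greedy_token` are hypothetical names, and the filtering order (temperature, then top-k, then top-p) is an assumption made for the example.

```python
import math
import random


def sample_token(logits, top_k, top_p, temperature, rng):
    """Sample one token id using temperature, top-k and top-p filtering.

    Hypothetical helper for illustration only, not the backend's sampler.
    """
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the top_k most probable token ids, highest probability first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranked = ranked[:top_k]

    # Nucleus filtering: keep the shortest prefix whose mass reaches top_p.
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Draw from the surviving candidates, weighted by probability.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


def greedy_token(logits):
    """Plain greedy decoding: the index of the largest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


logits = [0.3, 2.1, -1.0, 1.7]
rng = random.Random(0)

# With top_k=1 only one candidate survives, so every draw is identical.
draws = {
    sample_token(logits, top_k=1, top_p=1.0, temperature=1.0, rng=rng)
    for _ in range(100)
}
```

With `top_k=1` the candidate set contains only the most probable token, so the random draw has a single possible outcome and `draws` holds exactly the token that `greedy_token` returns. Setting `top_p=1` and `temperature=1` leaves the distribution untouched, which keeps the emulation free of side effects from the other two filters.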