text-generation-inference/backends/neuron/server/text_generation_server
David Corvoysier 787e28bf59 fix(generator): emulate greedy in sampling parameters
When on-device sampling is enabled, we need to emulate the greedy
behaviour using top-k=1, top-p=1, temperature=1.
2025-06-06 15:31:05 +00:00
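The commit message above says greedy decoding is emulated through the sampling parameters top-k=1, top-p=1, temperature=1. The sketch below is a plain-Python illustration of why that works. It is not the Neuron backend's code; `sample_token` and `greedy_token` are hypothetical helpers written for this example. With top-k=1 only the highest-scoring token survives, and top-p=1 and temperature=1 are neutral values that leave that single candidate untouched, so the "sampled" token always equals the argmax.

```python
import math
import random
from typing import List, Optional


def sample_token(
    logits: List[float],
    top_k: int,
    top_p: float,
    temperature: float,
    rng: Optional[random.Random] = None,
) -> int:
    """Hypothetical sampler: temperature, then top-k, then top-p, then a random draw."""
    rng = rng or random.Random()
    # Temperature scaling (temperature=1.0 leaves the logits unchanged).
    scaled = [x / temperature for x in logits]
    # Top-k: keep the k highest-scoring token ids (stable sort, so ties keep index order).
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    kept = ranked[:top_k]
    # Softmax over the kept candidates, subtracting the max for numerical stability.
    m = max(scaled[i] for i in kept)
    weights = [math.exp(scaled[i] - m) for i in kept]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    cumulative = 0.0
    cutoff = len(kept)
    for n, p in enumerate(probs, start=1):
        cumulative += p
        if cumulative >= top_p:
            cutoff = n
            break
    kept, probs = kept[:cutoff], probs[:cutoff]
    # Random draw from whatever candidates remain.
    return rng.choices(kept, weights=probs, k=1)[0]


def greedy_token(logits: List[float]) -> int:
    """Plain greedy decoding: the id of the highest-scoring token (first one on ties)."""
    return max(range(len(logits)), key=lambda i: logits[i])


if __name__ == "__main__":
    logits = [0.3, 2.5, -1.0, 2.4]
    rng = random.Random(0)
    # With top_k=1, top_p=1.0, temperature=1.0 every draw matches greedy decoding.
    draws = {sample_token(logits, top_k=1, top_p=1.0, temperature=1.0, rng=rng) for _ in range(100)}
    print(draws, greedy_token(logits))
```

Any top-k=1 configuration is deterministic; setting the other two parameters to 1 simply keeps them from altering the distribution, which mirrors the parameter choice stated in the commit.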
cli.py fix: run linters and fix formatting (#3057) 2025-02-25 16:11:34 -05:00
generator.py fix(generator): emulate greedy in sampling parameters 2025-06-06 15:31:05 +00:00
interceptor.py fix: run linters and fix formatting (#3057) 2025-02-25 16:11:34 -05:00
model.py fix(nxd): adapt model retrieval to new APIs 2025-06-06 15:31:04 +00:00
server.py Add Neuron backend (#3033) 2025-02-24 09:10:05 +01:00