# Text-generation-inference - Neuron backend for AWS Trainium and Inferentia2
## Description
This is the TGI backend for the AWS Trainium and Inferentia family of chips.

This backend is composed of:
- the AWS Neuron SDK,
- the legacy v2 TGI launcher and router,
- a Neuron-specific inference server for text generation.
## Usage
Please refer to the official documentation.
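For a rough sense of what running the backend looks like, here is a sketch of a typical launch on an Inferentia2 host. The image tag, device mapping, and model id below are illustrative assumptions; the official documentation is the authoritative reference for the exact command.

```shell
# Sketch: launch TGI with the Neuron backend in Docker.
# The image tag, exposed Neuron device, and model id are assumptions
# for illustration; adjust them to your setup.
docker run -p 8080:80 \
    -v $(pwd)/data:/data \
    --device=/dev/neuron0 \
    -e HF_TOKEN=${HF_TOKEN} \
    ghcr.io/huggingface/text-generation-inference:latest-neuron \
    --model-id Qwen/Qwen2.5-0.5B-Instruct
```

Once the server is up, it exposes the standard TGI HTTP API, e.g.:

```shell
curl 127.0.0.1:8080/generate \
    -X POST \
    -H 'Content-Type: application/json' \
    -d '{"inputs": "What is Deep Learning?", "parameters": {"max_new_tokens": 20}}'
```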
## Build your own image
The simplest way to build TGI with the Neuron backend is to use the provided Makefile:

```shell
$ make -C backends/neuron image
```
Alternatively, you can build the image directly from the top directory, using a command similar to the one defined in the Makefile under the `image` target.
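For reference, such a command might look like the sketch below. The Dockerfile path and image tag are assumptions made for illustration; the `image` target in `backends/neuron/Makefile` defines the exact invocation.

```shell
# Sketch: build the Neuron image from the repository root.
# The Dockerfile path and tag are illustrative assumptions; consult the
# `image` target in backends/neuron/Makefile for the authoritative arguments.
docker build . \
    -f Dockerfile.neuron \
    -t text-generation-inference:latest-neuron
```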