* feat(neuron): use AWS Neuron SDK 2.21.1
* feat(neuron): bump optimum-neuron version
* feat(neuron): tag latest image for local tests
* test(neuron): simplify sampling test
* Fixing the tool calling convention.
* Update tehe doc.
* Fixing some corner cases.
* Fixing the tool call id.
* Fmt.
* Snapshot update with the new updated tool_call_id.
* More qwen2.
* change ChatCompletionChunk to align with "OpenAI Chat Completions streaming API"
Moving after tool_calls2
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
add in Buffering..
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
fix: handle usage outside of stream state and add tests
Simplifying everything quite a bit.
Remove the unused model_dump.
Clippy.
Clippy ?
Ruff.
Uppgrade the flake for latest transformers.
Upgrade after rebase.
Remove potential footgun.
Fix completion test.
* Clippy.
* Tweak for multi prompt.
* Ruff.
* Update the snapshot a bit.
---------
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* Making `tool_calls` a vector.
* Arguments output is a string.
* Update all the integration tests.
* Add the requirements.
* Upgrade other tests.
* Clippy.
* Update the old test.
* Making `tool_calls` a vector.
* Update doc.
* Fixing the nix overlay with updated version.
* Add openai dependency.
* Updating the old tests.
* Trying to reduce the logs in the case of errors.
* Less spammy logs too.
* Patch rust release.
* Trying to remove the rust-toolchain hardcoded in action.
* Upgrade rust toolchain.
* Put back the toolchain ?
* Fix neuron dockerfile.
* Move to the proper version of Rust.
* 1.85 since the GH action doesn't respect the override.
* Typo.
* Fixing the github action.
* Fixing docker llamacpp.
* Fixing the github action.
* Update clippy.
* feat: add support for HF_HUB_USER_AGENT_ORIGIN to add user-agent Origin field in Hub requests.
* fix: Rust version for Neuron
* fix: PR comments, use rust-toolchain.toml
* wip(gaudi): import server and dockerfile from tgi-gaudi fork
* feat(gaudi): new gaudi backend working
* fix: fix style
* fix prehooks issues
* fix(gaudi): refactor server and implement requested changes
* feat: add neuron backend
* feat(neuron): add server standalone installation
* feat(neuron): add server and integration tests
* fix(neuron): increase ulimit when building image
The base image used to compile the rust components seems to have a low
ulimit for opened files, which leads to errors during compilation.
* test(neuron): merge integration tests and fixtures
* test: add --neuron option
* review: do not use latest tag
* review: remove ureq pinned version
* review: --privileged should be the exception
* feat: add neuron case to build ci
* fix(neuron): export models from container in test fixtures
The neuron tests require models to have been previously exported and
cached on the hub. This is done automatically by the neuron.model
fixture the first time the tests are ran for a specific version.
This fixture used to export the models using optimum-neuron directly,
but this package is not necessarily present on the system.
Instead, it is now done through the neuron TGI itself, since it
contains all the tools required to export the models.
Note that since the CI runs docker in docker (dind) it does not seem
possible to share a volume between the CI container and the container
used to export the model.
For that reason, a specific image with a modified entrypoint is built
on-the-fly when a model export is required.
* refactor: remove sagemaker entry-point
The SageMaker image is built differently anyway.
* fix(neuron): avoid using Levenshtein
* test(neuron): use smaller llama model
* feat(neuron): avoid installing CUDA in image
* test(neuron): no error anymore when requesting too many tokens
* ci: doing a precompilation step (with a different token).
* test(neuron): avoid using image sha when exporting models
We now manually evaluate the apparent hash of the neuron backend by
combining the hash of the neuron backend directory and Dockerfile.
This new hash is used to identify exported neuron models instead of the
image sha.
This has two benefits:
- it changes less frequently (only hwen the neuron backend changes),
which means less neuron models being pushed to the hub,
- it can be evaluated locally, meaning that running the tests once
locally will export the models before the CI uses them.
* test(neuron): added a small script to prune test models
---------
Co-authored-by: drbh <david.richard.holtz@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
* make content field optional in chat request
* add tool_calls field to Message struct
* feat: add test and serialize tool messages
* fix: bump utopia, openapi doc version and improve test
* fix: rerun update docs
* fix: suppoer tool call id in template and remove unnecessary changes
* fix: ruff lint remove unused import
* fix: adjust message types in tests
---------
Co-authored-by: sailesh duddupudi <saileshradar@gmail.com>
The current code does not work and gives the following message:
UserWarning: You have not specified a value for the `type` parameter. Defaulting to the 'tuples' format for chatbot messages, but this is deprecated and will be removed in a future version of Gradio. Please set type='messages' instead, which uses openai-style dictionaries with 'role' and 'content' keys.
warnings.warn(
Traceback (most recent call last):
File "/Users/angt/hf/tgi/test-gradio.py", line 22, in <module>
gr.ChatInterface(
TypeError: ChatInterface.__init__() got an unexpected keyword argument 'retry_btn'
Signed-off-by: Adrien Gallouët <adrien@gallouet.fr>
* feat: Add the parsing of HF_HUB_USER_AGENT_ORIGIN environment variable to add info about the environment running TGI. That is useful to track usage in case of collaborations for example.
* fix: trufflehog