Commit Graph

1319 Commits

Author SHA1 Message Date
Adrien Gallouët
8a79cfd077
Bump llama.cpp
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-03-05 11:08:17 +00:00
Adrien Gallouët
8fe851209c
Support HF_HUB_USER_AGENT_ORIGIN
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-03-05 11:08:17 +00:00
Adrien Gallouët
aadd624933
Update Cargo.lock
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-03-05 11:08:17 +00:00
Adrien Gallouët
46feaf6296
Remove make-gguf.sh
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-03-05 11:08:17 +00:00
Adrien Gallouët
3849223340
Bump llama.cpp and switch to ggml-org
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-03-05 11:08:17 +00:00
Adrien Gallouët
0a55bd3db9
Quantize without llama-quantize
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-03-05 11:08:17 +00:00
Adrien Gallouët
6223b6e264
Fix build with Mach-O
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-03-05 11:08:17 +00:00
Adrien Gallouët
d41183a0b4
Save gguf in models/MODEL_ID/model.gguf
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-03-05 11:08:17 +00:00
Adrien Gallouët
961a133d4b
Update installed packages
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-03-05 11:08:17 +00:00
Adrien Gallouët
7388468e26
Update doc
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-03-05 11:08:17 +00:00
Adrien Gallouët
0d01a89f0f
Better error message
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-03-05 11:08:17 +00:00
Adrien Gallouët
2242d1a67c
Update doc
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-03-05 11:08:17 +00:00
Adrien Gallouët
30cd3cf510
Enable mmap, offload_kqv & flash_attention by default
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-03-05 11:08:17 +00:00
Adrien Gallouët
46bc8e6bc7
Bump llama.cpp
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-03-05 11:08:17 +00:00
Adrien Gallouët
2d4aa25b9c
Make --model-gguf optional
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-03-05 11:08:17 +00:00
Adrien Gallouët
bda39e42c2
Build faster
Signed-off-by: Adrien Gallouët <angt@huggingface.co>
2025-03-05 11:08:17 +00:00
Nicolas Patry
ec35976f82
Only add token when it is defined. (#3073)
* Only add token when it is defined.

* Update router/src/server.rs
2025-03-05 11:59:52 +01:00
David Corvoysier
cb42b3ad83
fix(neuron): explicitly install toolchain (#3072)
* fix(neuron): explicitly install toolchain

* ci(neuron): trigger CI when Dockerfile is modified
2025-03-05 11:46:58 +01:00
Nicolas Patry
491ed9e11d
Patch rust release. (#3069)
* Patch rust release.

* Trying to remove the rust-toolchain hardcoded in action.

* Upgrade rust toolchain.

* Put back the toolchain ?

* Fix neuron dockerfile.

* Move to the proper version of Rust.

* 1.85 since the GH action doesn't respect the override.

* Typo.

* Fixing the github action.

* Fixing docker llamacpp.

* Fixing the github action.

* Update clippy.
2025-03-04 18:07:33 +01:00
Sadra Barikbin
144d99c147
Fix a tiny typo in monitoring.md tutorial (#3056)
Update monitoring.md
2025-03-04 17:06:26 +01:00
Nicolas Patry
08bbfa16a1
Preparing for release. (#3060)
* Preparing for release.

* Upgrade doc.

* Fix docs auto-generated.

* Fix update doc along.
2025-03-04 16:47:10 +01:00
Hugo Larcher
d8ff7f2623
feat: add support for HF_HUB_USER_AGENT_ORIGIN to add user-agent Origin field in Hub requests. (#3061)
* feat: add support for HF_HUB_USER_AGENT_ORIGIN to add user-agent Origin field in Hub requests.

* fix: Rust version for Neuron

* fix: PR comments, use rust-toolchain.toml
2025-03-04 16:43:50 +01:00
Daniël de Kok
e88f6f6ee9
Add property-based testing for RadixAllocator (#3068) 2025-03-04 15:09:46 +01:00
Daniël de Kok
fa4e9511f8
Fix two edge cases in RadixTrie::find (#3067)
- Always return a node, not its parent.
- Do not recurse when a node does not represent a full prefix of the
  input.
2025-03-04 13:23:27 +01:00
Nicolas Patry
a914a21899
Revert "Patch rust release."
This reverts commit aad9c2b0bd.
2025-03-04 12:16:18 +00:00
Nicolas Patry
aad9c2b0bd
Patch rust release. 2025-03-04 12:14:58 +00:00
Nicolas Patry
1f35cc7a31
Updating patch rust release. 2025-03-04 12:13:58 +00:00
Baptiste Colle
683ff53fa3
Add Gaudi Backend (#3055)
* wip(gaudi): import server and dockerfile from tgi-gaudi fork

* feat(gaudi): new gaudi backend working

* fix: fix style

* fix prehooks issues

* fix(gaudi): refactor server and implement requested changes
2025-02-28 12:14:58 +01:00
David Corvoysier
5eec3a8bb6
Avoid running neuron integration tests twice (#3054)
* test(neuron): refactor to prepare batch export

* test(neuron): add helper to batch export models

Also rename fixture file fro clarity.

* ci(neuron): do not run tests twice

* ci(neuron): rename precompilation job

* test(neuron): remove redundant subdirectory

* test(neuron): remove erroneous line

* doc(neuron): update links to installation page

* feat(neuron): cleanup Dockerfile

CARGO_REGISTRIES_CRATES_IO_PROTOCOL=sparse is not required anymore.

* test(neuron): try to reduce download errors
2025-02-26 12:15:01 +01:00
drbh
b0069e0485
fix: run linters and fix formatting (#3057) 2025-02-25 16:11:34 -05:00
Wang, Yi
d7a24c03cf
some minor fix (#3048)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-02-25 12:07:55 +01:00
Nicolas Patry
cea9dbc971
You need to seek apparently. (#3049) 2025-02-24 14:58:23 +01:00
David Corvoysier
c00add9c03
Add Neuron backend (#3033)
* feat: add neuron backend

* feat(neuron): add server standalone installation

* feat(neuron): add server and integration tests

* fix(neuron): increase ulimit when building image

The base image used to compile the rust components seems to have a low
ulimit for opened files, which leads to errors during compilation.

* test(neuron): merge integration tests and fixtures

* test: add --neuron option

* review: do not use latest tag

* review: remove ureq pinned version

* review: --privileged should be the exception

* feat: add neuron case to build ci

* fix(neuron): export models from container in test fixtures

The neuron tests require models to have been previously exported and
cached on the hub. This is done automatically by the neuron.model
fixture the first time the tests are ran for a specific version.
This fixture used to export the models using optimum-neuron directly,
but this package is not necessarily present on the system.
Instead, it is now done through the neuron TGI itself, since it
contains all the tools required to export the models.
Note that since the CI runs docker in docker (dind) it does not seem
possible to share a volume between the CI container and the container
used to export the model.
For that reason, a specific image with a modified entrypoint is built
on-the-fly when a model export is required.

* refactor: remove sagemaker entry-point

The SageMaker image is built differently anyway.

* fix(neuron): avoid using Levenshtein

* test(neuron): use smaller llama model

* feat(neuron): avoid installing CUDA in image

* test(neuron): no error anymore when requesting too many tokens

* ci: doing a precompilation step (with a different token).

* test(neuron): avoid using image sha when exporting models

We now manually evaluate the apparent hash of the neuron backend by
combining the hash of the neuron backend directory and Dockerfile.
This new hash is used to identify exported neuron models instead of the
image sha.
This has two benefits:
- it changes less frequently (only hwen the neuron backend changes),
  which means less neuron models being pushed to the hub,
- it can be evaluated locally, meaning that running the tests once
  locally will export the models before the CI uses them.

* test(neuron): added a small script to prune test models

---------

Co-authored-by: drbh <david.richard.holtz@gmail.com>
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2025-02-24 09:10:05 +01:00
Daniël de Kok
97c5f7e685
Use rotary kernel from the Hub (#3041) 2025-02-21 13:55:31 +01:00
drbh
1cae3197c4
Improve tool call message processing (#3036)
* make content field optional in chat request

* add tool_calls field to Message struct

* feat: add test and serialize tool messages

* fix: bump utopia, openapi doc version and improve test

* fix: rerun update docs

* fix: suppoer tool call id in template and remove unnecessary changes

* fix: ruff lint remove unused import

* fix: adjust message types in tests

---------

Co-authored-by: sailesh duddupudi <saileshradar@gmail.com>
2025-02-21 10:30:29 +01:00
Adrien Gallouët
3498f6085e
Update Gradio ChatInterface configuration in consuming_tgi.md (#3042)
The current code does not work and gives the following message:

    UserWarning: You have not specified a value for the `type` parameter. Defaulting to the 'tuples' format for chatbot messages, but this is deprecated and will be removed in a future version of Gradio. Please set type='messages' instead, which uses openai-style dictionaries with 'role' and 'content' keys.
      warnings.warn(
    Traceback (most recent call last):
      File "/Users/angt/hf/tgi/test-gradio.py", line 22, in <module>
        gr.ChatInterface(
    TypeError: ChatInterface.__init__() got an unexpected keyword argument 'retry_btn'

Signed-off-by: Adrien Gallouët <adrien@gallouet.fr>
2025-02-21 10:11:28 +01:00
Nicolas Patry
142a49a80d
Simplify logs2. (#3045)
* Simplify logs2.

* Changing the scope from module to session to fix the event_loop issue.
2025-02-21 10:03:40 +01:00
Wang, Yi
06dfe9abfe
fix qwen2 vl crash in continous batching (#3004)
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-02-20 18:36:45 -05:00
Daniël de Kok
ed96ba6503
flashinfer 0.2.0.post1 -> post2 (#3040)
* flashinfer 0.2.0.post1 -> post2

* Fix ruff stuff.

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2025-02-20 12:34:20 +01:00
Wang, Yi
feaa2477b7
update ipex and torch to 2.6 for cpu (#3039)
ipex cpu 2.6 support topk_group in moe fusion ops

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2025-02-20 09:12:28 +01:00
Hugo Larcher
230aa25641
feat: Add the parsing of HF_HUB_USER_AGENT_ORIGIN environment variable for telemetry (#3027)
* feat: Add the parsing of HF_HUB_USER_AGENT_ORIGIN environment variable to add info about the environment running TGI. That is useful to track usage in case of collaborations for example.

* fix: trufflehog
2025-02-19 21:09:12 +01:00
Nicolas Patry
9c89d0070e
Having less logs in case of failure for checking CI more easily. (#3037)
* Having less logs in case of failure for checking CI more easily.

* Cleaning up the versions to uv for the client.

* Ignore entirely the API.
2025-02-19 17:01:33 +01:00
Nicolas Patry
fde3234cbc
Using public external registry (to use external runners for CI). (#3031)
* Using public external registry (to use external runners for CI).

* Fix build.

* Fixing the external registry.

* Fixing trtllm tests.
2025-02-19 14:53:14 +01:00
drbh
d6a0c67e2f
feat: add initial qwen2.5-vl model and test (#2971)
* feat: support qwen2.5 vl model

* fix: bump support models doc

* feat: check before rope type adjustment and small refactors

* fix: add transformer overlay for processor support

* fix: vendor processor and config from transformers

* fix: refactor/simplify conditionals
2025-02-19 12:38:20 +01:00
Cyril Vallez
a7448661f7
Improve Transformers support (#2970)
* Much better support

* add gpt neox

* bump transformers version

* bump version
2025-02-18 19:04:34 +01:00
Nicolas Patry
5543fdc765
It's find in some machine. using hf_hub::api::sync::Api to download c… (#3030)
It's find in some machine. using hf_hub::api::sync::Api to download config is not successful which will make warmup fail since attribute like max_position_embeddings could not be got. update hf-hub to the latest version could fix it

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Co-authored-by: Wang, Yi A <yi.a.wang@intel.com>
2025-02-18 12:19:51 +01:00
Nicolas Patry
b8a4928d0e
Pinning trufflehog. (#3032) 2025-02-18 12:03:41 +01:00
Alvaro Bartolome
8a1cfd6122
Add loop_controls feature to minijinja to handle {% break %} (#2998)
* Add `loop_controls` feature to `minijinja`

* Add `test_chat_template_loop_controls` to test `break`
2025-02-18 10:33:22 +01:00
celsowm
794ec58b75
Update README.md (#3024)
only way to avoid:
error: experimental Nix feature 'nix-command' is disabled; add '--extra-experimental-features nix-command' to enable it
2025-02-18 10:08:28 +01:00
Daniël de Kok
f0ed76583c
Use eetq kernel from the hub (#3029)
* Use eetq kernel from the hub

* Fixing the CI.

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2025-02-18 10:03:53 +01:00