text-generation-inference

mirror of https://github.com/huggingface/text-generation-inference.git synced 2025-06-14 13:22:07 +00:00

Author	SHA1	Message	Date
Adrien Gallouët	8a79cfd077	Bump llama.cpp Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-03-05 11:08:17 +00:00
Adrien Gallouët	8fe851209c	Support HF_HUB_USER_AGENT_ORIGIN Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-03-05 11:08:17 +00:00
Adrien Gallouët	aadd624933	Update Cargo.lock Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-03-05 11:08:17 +00:00
Adrien Gallouët	46feaf6296	Remove make-gguf.sh Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-03-05 11:08:17 +00:00
Adrien Gallouët	3849223340	Bump llama.cpp and switch to ggml-org Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-03-05 11:08:17 +00:00
Adrien Gallouët	0a55bd3db9	Quantize without llama-quantize Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-03-05 11:08:17 +00:00
Adrien Gallouët	6223b6e264	Fix build with Mach-O Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-03-05 11:08:17 +00:00
Adrien Gallouët	d41183a0b4	Save gguf in models/MODEL_ID/model.gguf Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-03-05 11:08:17 +00:00
Adrien Gallouët	961a133d4b	Update installed packages Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-03-05 11:08:17 +00:00
Adrien Gallouët	7388468e26	Update doc Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-03-05 11:08:17 +00:00
Adrien Gallouët	0d01a89f0f	Better error message Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-03-05 11:08:17 +00:00
Adrien Gallouët	2242d1a67c	Update doc Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-03-05 11:08:17 +00:00
Adrien Gallouët	30cd3cf510	Enable mmap, offload_kqv & flash_attention by default Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-03-05 11:08:17 +00:00
Adrien Gallouët	46bc8e6bc7	Bump llama.cpp Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-03-05 11:08:17 +00:00
Adrien Gallouët	2d4aa25b9c	Make --model-gguf optional Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-03-05 11:08:17 +00:00
Adrien Gallouët	bda39e42c2	Build faster Signed-off-by: Adrien Gallouët <angt@huggingface.co>	2025-03-05 11:08:17 +00:00
Nicolas Patry	ec35976f82	Only add token when it is defined. (#3073 ) * Only add token when it is defined. * Update router/src/server.rs	2025-03-05 11:59:52 +01:00
David Corvoysier	cb42b3ad83	fix(neuron): explicitly install toolchain (#3072 ) * fix(neuron): explicitly install toolchain * ci(neuron): trigger CI when Dockerfile is modified	2025-03-05 11:46:58 +01:00
Nicolas Patry	491ed9e11d	Patch rust release. (#3069 ) * Patch rust release. * Trying to remove the rust-toolchain hardcoded in action. * Upgrade rust toolchain. * Put back the toolchain ? * Fix neuron dockerfile. * Move to the proper version of Rust. * 1.85 since the GH action doesn't respect the override. * Typo. * Fixing the github action. * Fixing docker llamacpp. * Fixing the github action. * Update clippy.	2025-03-04 18:07:33 +01:00
Sadra Barikbin	144d99c147	Fix a tiny typo in `monitoring.md` tutorial (#3056 ) Update monitoring.md	2025-03-04 17:06:26 +01:00
Nicolas Patry	08bbfa16a1	Preparing for release. (#3060 ) * Preparing for release. * Upgrade doc. * Fix docs auto-generated. * Fix update doc along.	2025-03-04 16:47:10 +01:00
Hugo Larcher	d8ff7f2623	feat: add support for HF_HUB_USER_AGENT_ORIGIN to add user-agent Origin field in Hub requests. (#3061 ) * feat: add support for HF_HUB_USER_AGENT_ORIGIN to add user-agent Origin field in Hub requests. * fix: Rust version for Neuron * fix: PR comments, use rust-toolchain.toml	2025-03-04 16:43:50 +01:00
Daniël de Kok	e88f6f6ee9	Add property-based testing for `RadixAllocator` (#3068 )	2025-03-04 15:09:46 +01:00
Daniël de Kok	fa4e9511f8	Fix two edge cases in `RadixTrie::find` (#3067 ) - Always return a node, not its parent. - Do not recurse when a node does not represent a full prefix of the input.	2025-03-04 13:23:27 +01:00
Nicolas Patry	a914a21899	Revert "Patch rust release." This reverts commit `aad9c2b0bd`.	2025-03-04 12:16:18 +00:00
Nicolas Patry	aad9c2b0bd	Patch rust release.	2025-03-04 12:14:58 +00:00
Nicolas Patry	1f35cc7a31	Updating patch rust release.	2025-03-04 12:13:58 +00:00
Baptiste Colle	683ff53fa3	Add Gaudi Backend (#3055 ) * wip(gaudi): import server and dockerfile from tgi-gaudi fork * feat(gaudi): new gaudi backend working * fix: fix style * fix prehooks issues * fix(gaudi): refactor server and implement requested changes	2025-02-28 12:14:58 +01:00
David Corvoysier	5eec3a8bb6	Avoid running neuron integration tests twice (#3054 ) * test(neuron): refactor to prepare batch export * test(neuron): add helper to batch export models Also rename fixture file for clarity. * ci(neuron): do not run tests twice * ci(neuron): rename precompilation job * test(neuron): remove redundant subdirectory * test(neuron): remove erroneous line * doc(neuron): update links to installation page * feat(neuron): cleanup Dockerfile CARGO_REGISTRIES_CRATES_IO_PROTOCOL=sparse is not required anymore. * test(neuron): try to reduce download errors	2025-02-26 12:15:01 +01:00
drbh	b0069e0485	fix: run linters and fix formatting (#3057 )	2025-02-25 16:11:34 -05:00
Wang, Yi	d7a24c03cf	some minor fix (#3048 ) Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>	2025-02-25 12:07:55 +01:00
Nicolas Patry	cea9dbc971	You need to seek apparently. (#3049 )	2025-02-24 14:58:23 +01:00
David Corvoysier	c00add9c03	Add Neuron backend (#3033 ) * feat: add neuron backend * feat(neuron): add server standalone installation * feat(neuron): add server and integration tests * fix(neuron): increase ulimit when building image The base image used to compile the rust components seems to have a low ulimit for opened files, which leads to errors during compilation. * test(neuron): merge integration tests and fixtures * test: add --neuron option * review: do not use latest tag * review: remove ureq pinned version * review: --privileged should be the exception * feat: add neuron case to build ci * fix(neuron): export models from container in test fixtures The neuron tests require models to have been previously exported and cached on the hub. This is done automatically by the neuron.model fixture the first time the tests are ran for a specific version. This fixture used to export the models using optimum-neuron directly, but this package is not necessarily present on the system. Instead, it is now done through the neuron TGI itself, since it contains all the tools required to export the models. Note that since the CI runs docker in docker (dind) it does not seem possible to share a volume between the CI container and the container used to export the model. For that reason, a specific image with a modified entrypoint is built on-the-fly when a model export is required. * refactor: remove sagemaker entry-point The SageMaker image is built differently anyway. * fix(neuron): avoid using Levenshtein * test(neuron): use smaller llama model * feat(neuron): avoid installing CUDA in image * test(neuron): no error anymore when requesting too many tokens * ci: doing a precompilation step (with a different token). * test(neuron): avoid using image sha when exporting models We now manually evaluate the apparent hash of the neuron backend by combining the hash of the neuron backend directory and Dockerfile. 
This new hash is used to identify exported neuron models instead of the image sha. This has two benefits: - it changes less frequently (only when the neuron backend changes), which means less neuron models being pushed to the hub, - it can be evaluated locally, meaning that running the tests once locally will export the models before the CI uses them. * test(neuron): added a small script to prune test models --------- Co-authored-by: drbh <david.richard.holtz@gmail.com> Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>	2025-02-24 09:10:05 +01:00
Daniël de Kok	97c5f7e685	Use `rotary` kernel from the Hub (#3041 )	2025-02-21 13:55:31 +01:00
drbh	1cae3197c4	Improve tool call message processing (#3036 ) * make content field optional in chat request * add tool_calls field to Message struct * feat: add test and serialize tool messages * fix: bump utopia, openapi doc version and improve test * fix: rerun update docs * fix: support tool call id in template and remove unnecessary changes * fix: ruff lint remove unused import * fix: adjust message types in tests --------- Co-authored-by: sailesh duddupudi <saileshradar@gmail.com>	2025-02-21 10:30:29 +01:00
Adrien Gallouët	3498f6085e	Update Gradio ChatInterface configuration in consuming_tgi.md (#3042 ) The current code does not work and gives the following message: UserWarning: You have not specified a value for the `type` parameter. Defaulting to the 'tuples' format for chatbot messages, but this is deprecated and will be removed in a future version of Gradio. Please set type='messages' instead, which uses openai-style dictionaries with 'role' and 'content' keys. warnings.warn( Traceback (most recent call last): File "/Users/angt/hf/tgi/test-gradio.py", line 22, in <module> gr.ChatInterface( TypeError: ChatInterface.__init__() got an unexpected keyword argument 'retry_btn' Signed-off-by: Adrien Gallouët <adrien@gallouet.fr>	2025-02-21 10:11:28 +01:00
Nicolas Patry	142a49a80d	Simplify logs2. (#3045 ) * Simplify logs2. * Changing the scope from module to session to fix the event_loop issue.	2025-02-21 10:03:40 +01:00
Wang, Yi	06dfe9abfe	fix qwen2 vl crash in continuous batching (#3004 ) Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>	2025-02-20 18:36:45 -05:00
Daniël de Kok	ed96ba6503	flashinfer 0.2.0.post1 -> post2 (#3040 ) * flashinfer 0.2.0.post1 -> post2 * Fix ruff stuff. --------- Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>	2025-02-20 12:34:20 +01:00
Wang, Yi	feaa2477b7	update ipex and torch to 2.6 for cpu (#3039 ) ipex cpu 2.6 support topk_group in moe fusion ops Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>	2025-02-20 09:12:28 +01:00
Hugo Larcher	230aa25641	feat: Add the parsing of HF_HUB_USER_AGENT_ORIGIN environment variable for telemetry (#3027 ) * feat: Add the parsing of HF_HUB_USER_AGENT_ORIGIN environment variable to add info about the environment running TGI. That is useful to track usage in case of collaborations for example. * fix: trufflehog	2025-02-19 21:09:12 +01:00
Nicolas Patry	9c89d0070e	Having less logs in case of failure for checking CI more easily. (#3037 ) * Having less logs in case of failure for checking CI more easily. * Cleaning up the versions to uv for the client. * Ignore entirely the API.	2025-02-19 17:01:33 +01:00
Nicolas Patry	fde3234cbc	Using public external registry (to use external runners for CI). (#3031 ) * Using public external registry (to use external runners for CI). * Fix build. * Fixing the external registry. * Fixing trtllm tests.	2025-02-19 14:53:14 +01:00
drbh	d6a0c67e2f	feat: add initial qwen2.5-vl model and test (#2971 ) * feat: support qwen2.5 vl model * fix: bump support models doc * feat: check before rope type adjustment and small refactors * fix: add transformer overlay for processor support * fix: vendor processor and config from transformers * fix: refactor/simplify conditionals	2025-02-19 12:38:20 +01:00
Cyril Vallez	a7448661f7	Improve Transformers support (#2970 ) * Much better support * add gpt neox * bump transformers version * bump version	2025-02-18 19:04:34 +01:00
Nicolas Patry	5543fdc765	It's find in some machine. using hf_hub::api::sync::Api to download c… (#3030 ) It's find in some machine. using hf_hub::api::sync::Api to download config is not successful which will make warmup fail since attribute like max_position_embeddings could not be got. update hf-hub to the latest version could fix it Signed-off-by: Wang, Yi A <yi.a.wang@intel.com> Co-authored-by: Wang, Yi A <yi.a.wang@intel.com>	2025-02-18 12:19:51 +01:00
Nicolas Patry	b8a4928d0e	Pinning trufflehog. (#3032 )	2025-02-18 12:03:41 +01:00
Alvaro Bartolome	8a1cfd6122	Add `loop_controls` feature to `minijinja` to handle `{% break %}` (#2998 ) * Add `loop_controls` feature to `minijinja` * Add `test_chat_template_loop_controls` to test `break`	2025-02-18 10:33:22 +01:00
celsowm	794ec58b75	Update README.md (#3024 ) only way to avoid: error: experimental Nix feature 'nix-command' is disabled; add '--extra-experimental-features nix-command' to enable it	2025-02-18 10:08:28 +01:00
Daniël de Kok	f0ed76583c	Use eetq kernel from the hub (#3029 ) * Use eetq kernel from the hub * Fixing the CI. --------- Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>	2025-02-18 10:03:53 +01:00


1319 Commits