Commit Graph

  • 3f29043d7e Patch rust release. Nicolas Patry 2025-03-04 12:14:58 +0000
  • a914a21899 Revert "Patch rust release." Nicolas Patry 2025-03-04 12:16:18 +0000
  • aad9c2b0bd Patch rust release. Nicolas Patry 2025-03-04 12:14:58 +0000
  • 1f35cc7a31 Updating patch rust release. Nicolas Patry 2025-03-04 12:13:58 +0000
  • a18c13f75d Fix two edge cases in RadixTrie::find Daniël de Kok 2025-03-03 12:59:26 +0000
  • 7d289b2663 fix: PR comments, use rust-toolchain.toml Hugo Larcher 2025-03-03 10:30:55 +0100
  • 683ff53fa3 Add Gaudi Backend (#3055) Baptiste Colle 2025-02-28 12:14:58 +0100
  • e66bbfff2e use DefaultWeightsLoader in skip modules jiqing-feng 2025-02-28 10:35:13 +0000
  • c35810d6f0 Fix the loading issue of 90B (#283) Yuan Wu 2025-02-28 18:20:55 +0800
  • f72547c9fb feat(metrics): remove ngrok mandatory feature for backendv3 crate [proxy_sse_engine_state] Morgan Funtowicz 2025-02-27 22:56:04 +0100
  • 712199c769 feat(metrics): dispatch internal engine state event from queuing/batching tasks Morgan Funtowicz 2025-02-27 22:43:20 +0100
  • 1a9c5dec76 feat(metrics): update Cargo.lock Morgan Funtowicz 2025-02-27 21:33:41 +0100
  • efb20054aa feat: consolidate streaming and event creation logic and add tests for streaming generations [pr-2954-ci-branch] drbh 2025-02-27 16:12:51 +0000
  • 8de41f63a8 feat(metrics): exposes the engine state as an endpoint Morgan Funtowicz 2025-02-27 16:58:02 +0100
  • 1d3a4ab851 Enable mllama (#272) Yuan Wu 2025-02-27 23:12:15 +0800
  • bb8f59632f feat(metrics): exposes queue size as tokens along with individual requests count Morgan Funtowicz 2025-02-27 14:32:51 +0100
  • 7cdbd694b3 fix(gaudi): refactor server and implement requested changes baptiste 2025-02-27 12:59:27 +0000
  • ca8763bc54 fix: Rust version for Neuron Hugo Larcher 2025-02-27 10:11:44 +0100
  • 380e73dba9 Added model name label to metrics and added an optional argument --served-model-name “yashaswipiplani” 2025-02-27 08:35:33 +0000
  • 1cb904e619 Update model.py Jim Burtoft 2025-02-26 18:47:33 -0500
  • 1ebee925ff feat: add support for HF_HUB_USER_AGENT_ORIGIN to add user-agent Origin field in Hub requests. Hugo Larcher 2025-02-26 22:11:08 +0100
  • 330f2e419f feat: improve partial parsing types and add test for balancing and partial parsing drbh 2025-02-26 18:53:12 +0000
  • a5ddc9db52 feat: refactor and simplify chat stream more, bump tests and support stream_options drbh 2025-02-25 20:55:56 +0000
  • c4cb54c23e fix: bump integrations requirements drbh 2025-02-24 21:56:26 +0000
  • 31a536d796 feat: refactor chat stream to remove state machine and simplfy logic drbh 2025-02-24 21:51:33 +0000
  • a416ddbdd9 fix: adjust integration tests for openai client dep drbh 2025-02-19 11:46:10 -0500
  • e1b6d5be4a fix: clippy cleanup drbh 2025-02-18 21:31:20 +0000
  • 538456ba68 fix: only send function name on first stream event drbh 2025-02-18 21:13:03 +0000
  • 68aa6b1af0 fix: bump requirements file too drbh 2025-02-17 14:54:49 +0000
  • fd611f30c9 fix: bump integration test deps for openai drbh 2025-02-17 14:40:31 +0000
  • 7d17d7cef7 fix: bump client tests for api changes drbh 2025-02-17 14:19:52 +0000
  • c215c0de88 fix: bump client test expected prefill drbh 2025-02-17 13:56:59 +0000
  • 1529a676d9 fix: remove snap with incorrect naming drbh 2025-02-17 13:41:06 +0000
  • 40f905d00b fix: adjust stream, improve tests and add openai client test drbh 2025-02-17 13:38:49 +0000
  • 07c20903e5 fix: ensure wrapping curly is not included drbh 2025-02-11 15:05:14 +0000
  • dbce04e4d3 fix: adjust streaming tool response drbh 2025-02-11 14:22:03 +0000
  • 5f030140be fix: bump openapi spec drbh 2025-02-10 15:14:00 +0000
  • 0ca7af8830 feat: serialize function definition with serialize_as_string drbh 2025-02-07 22:27:24 +0000
  • 983b9675d6 fix: Functioncall is actually a bit different than the deprecated function definition type Nicolas Casademont 2025-02-04 11:09:55 +0100
  • c1c4dfb521 fix: Allow back arguments in function definition and the corresponding test Nicolas Casademont 2025-02-04 11:07:42 +0100
  • 8542e2b746 feat: Make streaming for tool calling behave the same as the open ai api Nicolas Casademont 2025-01-24 14:42:25 +0100
  • 9a9a763eee fix: Adapt function call response to return a json string for arguments Nicolas Casademont 2025-01-24 11:47:01 +0100
  • d88ae2c5b2 Fix update doc along. Nicolas Patry 2025-02-26 14:45:58 +0100
  • 733eb5fa90 Fix docs auto-generated. Nicolas Patry 2025-02-26 14:43:48 +0100
  • bb0bbbd485 Upgrade doc. Nicolas Patry 2025-02-26 14:31:52 +0100
  • eb8e7f6ff8 Preparing for release. Nicolas Patry 2025-02-26 14:19:53 +0100
  • b7bdbbd8c0 revert unquantized changes jiqing-feng 2025-02-26 12:23:33 +0000
  • 5eec3a8bb6 Avoid running neuron integration tests twice (#3054) David Corvoysier 2025-02-26 12:15:01 +0100
  • b370902626 test(neuron): try to reduce download errors David Corvoysier 2025-02-25 12:54:22 +0000
  • be06297e62 feat(neuron): cleanup Dockerfile David Corvoysier 2025-02-25 09:10:25 +0000
  • d59b4fdce9 doc(neuron): update links to installation page David Corvoysier 2025-02-25 10:48:17 +0000
  • e783f88dc5 test(neuron): remove erroneous line David Corvoysier 2025-02-25 09:21:25 +0000
  • f6859c4179 test(neuron): remove redundant subdirectory David Corvoysier 2025-02-25 09:07:19 +0000
  • 0cff388a10 ci(neuron): rename precompilation job David Corvoysier 2025-02-25 10:40:42 +0000
  • 40e2f3f995 ci(neuron): do not run tests twice David Corvoysier 2025-02-25 10:39:36 +0000
  • 53c1226939 test(neuron): add helper to batch export models David Corvoysier 2025-02-24 17:37:37 +0000
  • 70e846d53b test(neuron): refactor to prepare batch export David Corvoysier 2025-02-24 17:36:26 +0000
  • b0069e0485 fix: run linters and fix formatting (#3057) drbh 2025-02-25 16:11:34 -0500
  • 245f716706 fix: run linters and fix formatting drbh 2025-02-25 12:08:05 -0500
  • 77dca4dfbe fix prehooks issues baptiste 2025-02-25 15:24:35 +0000
  • 4329deb283 Update monitoring.md Sadra Barikbin 2025-02-25 17:03:14 +0330
  • 31535bcde2 fix: fix style baptiste 2025-02-25 13:16:11 +0000
  • c08005a4cd feat(gaudi): new gaudi backend working baptiste 2025-02-24 09:48:44 +0000
  • cc754c43c0 wip(gaudi): import server and dockerfile from tgi-gaudi fork Ubuntu 2025-02-24 08:40:44 +0000
  • d7a24c03cf some minor fix (#3048) Wang, Yi 2025-02-25 19:07:55 +0800
  • bc4eb25d41 fix tp quant skip jiqing-feng 2025-02-24 17:27:14 +0000
  • a332862510 fix format jiqing-feng 2025-02-24 16:22:34 +0000
  • 0bad926fb8 fix modules_to_not_convert jiqing-feng 2025-02-24 16:11:48 +0000
  • cea9dbc971 You need to seek apparently. (#3049) Nicolas Patry 2025-02-24 14:58:23 +0100
  • c00add9c03 Add Neuron backend (#3033) David Corvoysier 2025-02-24 09:10:05 +0100
  • 3e7a879773 xpu 2.6 update Wang, Yi A 2025-02-24 02:56:02 +0000
  • 6706c9b4d9 test(neuron): added a small script to prune test models David Corvoysier 2025-02-21 15:02:52 +0000
  • cd477d800c test(neuron): avoid using image sha when exporting models David Corvoysier 2025-02-21 12:46:57 +0000
  • 05ca5e4c0f ci: doing a precompilation step (with a different token). Nicolas Patry 2025-02-20 15:57:33 +0100
  • 10b57727c2 test(neuron): no error anymore when requesting too many tokens David Corvoysier 2025-02-20 17:26:56 +0000
  • 4c0fa92cb4 feat(neuron): avoid installing CUDA in image David Corvoysier 2025-02-20 10:13:49 +0000
  • b5e98a6d5a test(neuron): use smaller llama model David Corvoysier 2025-02-19 14:51:43 +0000
  • 6f92198eb9 fix(neuron): avoid using Levenshtein David Corvoysier 2025-02-19 14:05:29 +0000
  • 88a0948692 refactor: remove sagemaker entry-point David Corvoysier 2025-02-19 09:00:54 +0000
  • ae37890eef fix(neuron): export models from container in test fixtures David Corvoysier 2025-02-18 17:47:54 +0000
  • bb51c5138c feat: add neuron case to build ci drbh 2025-02-17 15:22:02 +0000
  • 3bcc523e76 review: --privileged should be the exception David Corvoysier 2025-02-18 14:15:08 +0000
  • a053523e93 review: remove ureq pinned version David Corvoysier 2025-02-18 13:54:31 +0000
  • 00931438ea review: do not use latest tag David Corvoysier 2025-02-18 13:50:25 +0000
  • 9c998f9f7e test: add --neuron option David Corvoysier 2025-02-18 12:30:34 +0000
  • a3dcdab706 test(neuron): merge integration tests and fixtures David Corvoysier 2025-02-18 10:32:10 +0000
  • 68e1c608f6 fix(neuron): increase ulimit when building image David Corvoysier 2025-02-17 15:22:36 +0000
  • 90578bfc65 feat(neuron): add server and integration tests David Corvoysier 2025-02-12 09:10:47 +0000
  • 27526a55bc feat(neuron): add server standalone installation David Corvoysier 2025-02-11 15:51:09 +0000
  • d0ed1918d7 feat: add neuron backend David Corvoysier 2025-02-11 09:53:16 +0000
  • 10e174b94a You need to seek apparently. Nicolas Patry 2025-02-21 18:47:17 +0100
  • 16793c7f51 ci: add missing needs for integration tests [neuron_backend_ci_test] David Corvoysier 2025-02-21 13:15:07 +0000
  • e12a8ab1cd ci: fix typo David Corvoysier 2025-02-21 12:53:05 +0000
  • 7178ee718d ci: enclose if clause in brackets David Corvoysier 2025-02-21 10:06:39 +0000
  • e8d04ec683 ci: temporarily run only gpt2 tests David Corvoysier 2025-02-21 10:16:47 +0000
  • a381aed512 ci: temporarily remove documentation workflows David Corvoysier 2025-02-21 10:17:55 +0000
  • a2a351e6e0 ci: temporarily restrict build to intel-cpu and neuron David Corvoysier 2025-02-21 10:04:46 +0000
  • 0c0488a754 test(neuron): added a small script to prune test models David Corvoysier 2025-02-21 15:02:52 +0000
  • 1e4e406d77 test(neuron): avoid using image sha when exporting models David Corvoysier 2025-02-21 12:46:57 +0000
  • 97c5f7e685 Use rotary kernel from the Hub (#3041) Daniël de Kok 2025-02-21 13:55:31 +0100