Commit Graph

  • 91f29cabbe Delete .vscode/settings.json Nicolas Patry 2024-04-30 12:07:13 +0200
  • f6615080b9 feat: add how it works section (#1773) drbh 2024-04-30 05:45:49 -0400
  • 8332fc4908 fix: use get_speculate to the number of layers (#1737) OlivierDehaene 2024-04-30 11:45:26 +0200
  • 268e8d4935 fix: use get_speculate to the number of layers OlivierDehaene 2024-04-12 20:21:52 +0200
  • 743ecbca3a Add reference to TPU support (#1760) Brandon Royal 2024-04-30 05:39:52 -0400
  • 04d4765bad Small CI cleanup. (#1801) Nicolas Patry 2024-04-30 11:39:38 +0200
  • 63ff95f356 Update router/src/server.rs Nicolas Patry 2024-04-30 11:10:46 +0200
  • f54865e6da Small CI cleanup. Nicolas Patry 2024-04-23 21:15:04 +0000
  • 51ee60da74 Add the missing tool_prompt parameter to Python client (#1825) Maziyar Panahi 2024-04-30 11:07:17 +0200
  • f75c1a5b26 Prepare release. Nicolas Patry 2024-04-30 10:52:37 +0200
  • e07d0ebc06 fix: rename header drbh 2024-04-29 20:20:30 +0000
  • 0d03620500 fix: remove redundant bullet drbh 2024-04-29 20:05:48 +0000
  • 784df59928 fix: move into tutorial and address many comments drbh 2024-04-29 20:02:26 +0000
  • c62a5b9abc fix: adjust filename in toc drbh 2024-04-26 00:14:58 +0000
  • 10f259e85f fix: rename file drbh 2024-04-26 00:09:49 +0000
  • 9e4823ba30 fix: add vlm to toc drbh 2024-04-26 00:08:43 +0000
  • c5131eee5c feat: add vlm docs and simple examples drbh 2024-04-25 23:20:47 +0000
  • 1f30217aec Add the missing tool_prompt to Python client bugfix/add_tools_prompt Maziyar Panahi 2024-04-28 22:04:19 +0200
  • d3b62bf117 fix: remove debug files from different branch martinigoyanes-fix-frequency-penalty drbh 2024-04-29 15:14:22 -0400
  • ecd00d56d6 fix: rebuilt and reran doc update drbh 2024-04-29 16:53:01 +0000
  • 5d8ef9f913 fix: adjust docs after rebase drbh 2024-04-29 16:47:43 +0000
  • 4d040b3478 fix: remove redundant struct drbh 2024-04-29 16:15:15 +0000
  • 91a705e1a9 feat: prefer custom deserializer for complex message content drbh 2024-04-29 16:07:53 +0000
  • a480273047 feat: accept variable content in chat request api drbh 2024-04-27 03:19:32 +0000
  • f2d760ff1a fix: update formatting via pre-commit drbh 2024-04-29 16:40:28 +0000
  • 27773963cd Merge commit 'refs/pull/1811/head' of github.com:huggingface/text-generation-inference into martinigoyanes-fix-frequency-penalty drbh 2024-04-29 16:37:51 +0000
  • 767c8195ef hold: docker stuff drbh 2024-04-29 16:36:53 +0000
  • eade737714 Better graceful shutdown. (#1827) Nicolas Patry 2024-04-29 17:23:40 +0200
  • 17f5c3078b working & cached tunableop fxmarty 2024-04-29 14:55:59 +0000
  • 17a0ddd2f2 Reuse our wrapper. Nicolas Patry 2024-04-29 13:48:59 +0000
  • 7149f3602a Fixing Child.kill() to replace it with regular signal. Nicolas Patry 2024-04-29 13:41:09 +0000
  • e2ab122843 Better graceful shutdown. Nicolas Patry 2024-04-29 13:22:36 +0000
  • 600d033c04 Merge branch 'habana-main' into rebase_tgi_2.0 Karol Damaszke 2024-04-29 09:44:45 +0300
  • 9df5e8126c Add the missing tool_prompt to Python client Maziyar Panahi 2024-04-28 22:04:19 +0200
  • 007d5e54aa Changing the waiting_served_ratio default (stack more aggressively by default). (#1820) Nicolas Patry 2024-04-28 17:54:19 +0200
  • 5373fc4707 Fix tests. Nicolas Patry 2024-04-27 11:33:24 +0200
  • e9f03f822a Dummy CI run. (#1817) Nicolas Patry 2024-04-26 19:19:55 +0200
  • 8b8e8f6632 Fixing qwen2. (#1818) Nicolas Patry 2024-04-26 19:19:08 +0200
  • 80c23bdd38 Changing the waiting_served_ratio default (stack more aggressively by default). Nicolas Patry 2024-04-26 19:16:39 +0200
  • 9cec099aa4 Fixing qwen2. Nicolas Patry 2024-04-26 16:41:07 +0200
  • ab1ec3e27e Update style. Nicolas Patry 2024-04-26 14:11:33 +0000
  • 6fc959765b Dummy comment to trigger CI after Intel merge. Nicolas Patry 2024-04-26 13:49:41 +0000
  • a8fd4236eb Blunder (#1815) Nicolas Patry 2024-04-26 15:51:09 +0200
  • 45ecf9d040 add intel xpu support for TGI (#1475) Wang, Yi 2024-04-26 21:48:58 +0800
  • 1f79e8ce8c Fix use_v1 after rebase. Nicolas Patry 2024-04-26 13:47:36 +0000
  • f9cf345625 Adding new env variables for TPU backends. (#1755) Nicolas Patry 2024-04-26 15:44:44 +0200
  • b67ce71232 Add build-and-push-image for Intel GPUs Morgan Funtowicz 2024-04-26 10:48:25 +0200
  • 88153796e0 re-enable xpu Wang, Yi A 2024-04-15 22:38:22 -0700
  • 878101e696 no requirements_common.txt, update dockerfile Wang, Yi A 2024-04-01 00:45:16 -0700
  • 0343a4b71c update the API and dockerfile Wang, Yi A 2024-03-31 22:48:42 -0700
  • 02537ec663 update docker file Wang, Yi A 2024-03-06 04:46:56 -0800
  • 23a1cb0511 align to ipex llm ops Wang, Yi A 2024-03-05 18:29:31 -0800
  • 515a0edebe add xpu smi support in env runtime Wang, Yi A 2024-01-31 17:19:31 -0800
  • bc069db165 fix review comments Wang, Yi A 2024-01-30 00:32:28 -0800
  • 49cd0ce943 add intel xpu support for TGI Wang, Yi A 2024-03-05 17:52:53 -0800
  • bbc547ad8d 2nd round of benchmark modifications (tiny adjustements to avoid overloading the host). (#1816) Nicolas Patry 2024-04-26 15:39:00 +0200
  • 99ac8783d2 2nd round of benchmark modifications (tiny adjustements to avoid overloading the host). Nicolas Patry 2024-04-26 13:38:03 +0000
  • 193dbb683e use released torch 2.3 fxmarty 2024-04-26 09:50:50 +0000
  • b8da90241b run integration tests on rocm fxmarty 2024-04-26 09:46:03 +0000
  • 7502367043 Merge branch 'main' into mi300-compat fxmarty 2024-04-26 11:28:42 +0200
  • 66b2015586 WhaT? Nicolas Patry 2024-04-26 11:24:44 +0200
  • 91eb4e555f Hgraph dill patch (#131) Yaser Afshar 2024-04-26 02:08:15 -0700
  • 37aabf8571 Move call to adapt_transformers_to_gaudi earlier in the code (#133) regisss 2024-04-26 11:07:27 +0200
  • c6a31b9e2b v2.0.0 (#1736) OlivierDehaene 2024-04-12 18:38:34 +0200
  • 6ad5aa7180 Fix typo in guidance.md (#1735) Ikko Eltociear Ashimine 2024-04-12 23:51:07 +0900
  • f6d5c2edf2 feat: medusa v2 (#1734) OlivierDehaene 2024-04-12 16:24:45 +0200
  • 661081d2d2 Improve the defaults for the launcher (#1727) Nicolas Patry 2024-04-12 14:20:31 +0200
  • a6890cbea9 feat: update readme and container version add-quickstart-script drbh 2024-04-26 00:46:11 +0000
  • d4077f70db feat: add one line quickstart drbh 2024-04-26 00:29:43 +0000
  • d7983d93be fix: skip all mistral test to enable CI skip-mistral-test drbh 2024-04-25 18:04:07 -0400
  • 949b889c33 fix: take into account logits frequency so far in a generation stream when apply freq penalty martini 2024-04-25 23:47:20 +0200
  • ee47973a2f Use the generation config. (#1808) Nicolas Patry 2024-04-25 19:41:50 +0200
  • eb08b9faef Update guidance docs to reflect grammar support in API (#1775) dr3s 2024-04-25 13:11:26 -0400
  • ffea15d6b6 Ignore missing generation config. Nicolas Patry 2024-04-25 16:42:35 +0000
  • 1f1885d911 Fix the openai backend + evading the @property of tokenizer.eos_token_id. Nicolas Patry 2024-04-25 16:20:25 +0000
  • fd705ef292 Damned tensor equality. Nicolas Patry 2024-04-25 15:04:53 +0000
  • 7a62f74d7f chore(cargo-toml): apply lto fat and codegen-units of one (#1651) Christof Weickhardt 2024-04-12 12:34:13 +0200
  • e6421f6e53 fix(router): fix a possible deadlock in next_batch (#1731) OlivierDehaene 2024-04-12 10:59:04 +0200
  • 0707a094b1 Upgrade EETQ (Fixes the cuda graphs). (#1729) Nicolas Patry 2024-04-12 08:15:28 +0200
  • 935d56abfe Fp8 Support (#1726) Nicolas Patry 2024-04-12 08:13:30 +0200
  • 194fcb4a3d Dev/mask ldconfig output v2 (#1716) oOraph 2024-04-11 19:31:48 +0200
  • c4ee0a653d Revert "Easier defaults for models stemmed from configs." Nicolas Patry 2024-04-11 12:51:57 +0000
  • e428c7c246 Easier defaults for models stemmed from configs. Nicolas Patry 2024-04-11 12:48:39 +0000
  • d9dcfe4521 Update libraries (#1713) abhishek thakur 2024-04-11 10:37:35 +0200
  • d1d0b3cbd6 hotfix: mixtral OlivierDehaene 2024-04-10 18:38:08 +0200
  • a1b65e5919 fix: fix CohereForAI/c4ai-command-r-plus (#1707) OlivierDehaene 2024-04-10 17:20:25 +0200
  • 650f45ce77 Python. Nicolas Patry 2024-04-25 14:33:29 +0000
  • 2b2f4dee94 Adding Llava-Next (Llava 1.6) with full support. (#1709) Nicolas Patry 2024-04-09 21:32:00 +0200
  • f6243fc8ad Moving logic closer to use. Nicolas Patry 2024-04-25 16:02:11 +0200
  • fccf5edf45 Updating the benchmarks so everyone uses openai compat layer. (#1800) Nicolas Patry 2024-04-25 15:42:17 +0200
  • f33fccfb13 Handle None. Nicolas Patry 2024-04-25 15:41:18 +0200
  • 12b3765896 Revamp slightly. Nicolas Patry 2024-04-25 13:38:06 +0000
  • 0acac5cb7a feat: improve temperature logic in chat (#1749) drbh 2024-04-25 09:31:35 -0400
  • ec0d913434 Fmt + clippy. Nicolas Patry 2024-04-25 14:59:26 +0200
  • 80fda35249 Use the generation config. Nicolas Patry 2024-04-25 14:57:20 +0200
  • 7d07e92d94 Lint. Nicolas Patry 2024-04-25 14:35:58 +0200
  • 351bd5f173 Automatic quantization config. (#1719) Nicolas Patry 2024-04-09 10:27:57 +0200
  • fb998dac5c Revert license to Apache 2.0 (#1714) OlivierDehaene 2024-04-08 15:06:16 +0200
  • 8d4aec0546 Regenerate ld.so.cache (#1708) oOraph 2024-04-08 08:52:10 +0200
  • 3417398c9a Force weights_only (before fully breaking pickle files anyway). (#1710) Nicolas Patry 2024-04-05 19:23:57 +0200