Commit Graph

  • 3bc1cf873d credit fxmarty 2024-05-17 15:29:42 +0000
  • c4cf8b49d1 Add TGI monitoring guide through Grafana and Prometheus (#1908) fxmarty 2024-05-17 16:34:44 +0200
  • 92361f9c9b empty fix-version-install fxmarty 2024-05-17 13:36:05 +0000
  • 232e8d5227 MI300 compatibility (#1764) fxmarty 2024-05-17 15:30:47 +0200
  • 1e7d8cf834 better dashboard fxmarty 2024-05-17 10:05:59 +0000
  • a60fa8406a Removing some unused code. (#1915) Nicolas Patry 2024-05-17 11:35:49 +0200
  • f5007ebcc4 tentatively fix build workflow fxmarty 2024-05-17 09:27:22 +0000
  • eea3226780 cleanup dockerfile fxmarty 2024-05-17 09:15:11 +0000
  • 2a7ba6ee9c precise amd doc fxmarty 2024-05-17 09:07:15 +0000
  • 956ac30ab9 cleanup fastlinear fxmarty 2024-05-17 09:02:14 +0000
  • 3ded96fb4c nicer diff x2 fxmarty 2024-05-17 08:55:37 +0000
  • 8d7f18f41e diff nicer fxmarty 2024-05-17 08:53:08 +0000
  • 7c6b9a0963 remove unnecessary imports fxmarty 2024-05-17 08:50:20 +0000
  • c8475594bc reflect in doc that tunableop is default fxmarty 2024-05-17 08:47:02 +0000
  • a040a59068 refactor model_id, make tunableop default fxmarty 2024-05-17 08:46:14 +0000
  • df0a453693 fixes on review fxmarty 2024-05-17 07:53:27 +0000
  • c9455730d7 update version fxmarty 2024-05-17 07:39:59 +0000
  • fc127312df update tgi version fxmarty 2024-05-17 07:35:43 +0000
  • a013c3ae68 Removing some unused code. Nicolas Patry 2024-05-17 07:35:28 +0000
  • 3b5d93e68d Fixing signals. (#1910) Nicolas Patry 2024-05-16 21:40:10 +0200
  • 68ec6cfa37 Fixing signals. Nicolas Patry 2024-05-16 15:58:22 +0000
  • b3dd3902e7 Types. (#1909) Nicolas Patry 2024-05-16 17:21:00 +0200
  • eb7cc6b24c A few more tests + easier to read format. Nicolas Patry 2024-05-16 17:03:07 +0200
  • f5d43414c2 Fixing types. (#1906) Nicolas Patry 2024-05-16 16:59:05 +0200
  • 265c76d328 black fxmarty 2024-05-16 14:46:47 +0000
  • 56909c2843 fix typos fxmarty 2024-05-16 13:44:23 +0000
  • b25d5aa900 add guide fxmarty 2024-05-16 13:39:16 +0000
  • 74eb16cb00 Output is pure text. Nicolas Patry 2024-05-16 14:43:03 +0200
  • 0812e3bdc9 typo fxmarty 2024-05-16 11:50:31 +0000
  • afc747337a documentation fxmarty 2024-05-16 11:43:40 +0000
  • 31865be72f Error during the rebase. Nicolas Patry 2024-05-16 13:29:49 +0200
  • 961a873305 Change deltas too. Nicolas Patry 2024-05-16 13:23:24 +0200
  • 2a87dd7274 Fixing types. Nicolas Patry 2024-05-16 12:16:02 +0200
  • f219124711 Merge branch 'main' into mi300-compat fxmarty 2024-05-16 11:02:51 +0200
  • f8d37c14d9 apply suggestions fxmarty 2024-05-16 09:00:37 +0000
  • d8402eaf67 OpenAI function calling compatible support (#1888) phangiabao98 2024-05-16 15:17:00 +0700
  • 40213c957f Pali gemma modeling (#1895) v2.0.3 drbh 2024-05-16 00:58:47 -0400
  • 6c715f8183 [Bug Fix] Update torch import reference in bnb quantization (#1902) Dhruv Srikanth 2024-05-15 20:08:32 +0100
  • 887ffe7e45 fix: bump huggingface_hub version improve-dynamic-message-content drbh 2024-05-15 19:02:25 +0000
  • 7f97fdac84 Upgrade mamba. Nicolas Patry 2024-05-15 18:50:15 +0000
  • f8be8d5da7 feat: improve serde add tests and cleanup drbh 2024-05-15 17:26:50 +0000
  • 90059707a8 Sshing a cuda 12.4 Nicolas Patry 2024-05-15 17:17:32 +0000
  • fcb62c71e2 Another attempt. Nicolas Patry 2024-05-15 16:58:55 +0000
  • db7190d609 [Bug Fix] Update import in quantization layers from nn to torch.nn based on import statements in the file header Dhruv Srikanth 2024-05-15 17:03:00 +0100
  • f8337a9e58 Using updated runner. Nicolas Patry 2024-05-15 15:24:25 +0000
  • f3f7140110 DEbugging this nightmare. Nicolas Patry 2024-05-15 14:23:41 +0000
  • dc0b8d76b5 Change the dockerfile. It builds locally, something might be up in AWS env. Nicolas Patry 2024-05-15 13:58:59 +0000
  • a69ef52cf6 feat: add deprecation warning to clients (#1855) drbh 2024-05-15 09:40:07 -0400
  • 368c057cbf Trying to understand the weird failure. Nicolas Patry 2024-05-15 13:31:10 +0000
  • 81e7aacbe1 Revert "Revert "Installing git."" Nicolas Patry 2024-05-15 13:23:29 +0000
  • ec9260135a Revert "Installing git." Nicolas Patry 2024-05-15 13:15:10 +0000
  • b7e98ba635 fix various merge errors fxmarty 2024-05-15 12:20:48 +0000
  • c683597b42 fix merge issues fxmarty 2024-05-15 12:06:54 +0000
  • 79b15febdb Installing git. Nicolas Patry 2024-05-15 11:58:42 +0000
  • a70b087e71 Removing accepted ids in the regular info logs, downgrade to debug. (#1898) Nicolas Patry 2024-05-15 13:56:07 +0200
  • 4086d93453 Removing accepted ids in the regular info logs, downgrade to debug. Nicolas Patry 2024-05-15 11:37:15 +0000
  • e8d02188d0 Small updates. Nicolas Patry 2024-05-15 11:36:28 +0000
  • b5bc6e5c4e Add GPT-2 with flash attention (#1889) Daniël de Kok 2024-05-15 13:31:22 +0200
  • 3b011ed3ea update layers files fxmarty 2024-05-15 10:56:52 +0000
  • 1bcaf8f5ca Fixed. Nicolas Patry 2024-05-15 10:21:16 +0000
  • 65bc0aaa58 Working integration-tests. Nicolas Patry 2024-05-15 10:17:07 +0000
  • f32fdd0fa1 Merge branch 'main' into mi300-compat (WIP) fxmarty 2024-05-15 11:31:51 +0200
  • c5015ad636 clean dockerfils fxmarty 2024-05-15 09:08:38 +0000
  • 8acd126710 Add GPT-2 with flash attention Daniël de Kok 2024-05-10 15:54:18 +0000
  • 2295f32689 rust format with cargo fmt Bao Phan 2024-05-15 13:14:25 +0700
  • 17ac93efd3 fix: default add special tokens to avoid vlm regressions drbh 2024-05-15 04:42:55 +0000
  • 70713fc292 fix: improve pali test and add snapshot drbh 2024-05-14 23:09:28 +0000
  • 61b49859da fix fxmarty 2024-05-14 19:58:19 +0000
  • d6e306c2b3 fix: apply paligemma template conditionally drbh 2024-05-14 15:56:04 -0400
  • 1daf984b57 fix: cargo fmt tweak ci-run-openai-function-calling-compatible-support drbh 2024-05-14 15:51:46 -0400
  • 7fe123ff36 Merge commit 'refs/pull/1888/head' of github.com:huggingface/text-generation-inference into main drbh 2024-05-14 18:35:00 +0000
  • 92f1338b84 Correct 'using guidance' link (#1892) Brandon Lockaby 2024-05-14 14:23:39 -0400
  • 33bc7212af Fixing truncation. (#1890) Nicolas Patry 2024-05-14 18:15:56 +0200
  • c119ac4d1d Fixed PaliGemma. Nicolas Patry 2024-05-14 15:58:19 +0000
  • f2fecdceca patch again amd_hip_bf16 since we downgraded to rocm6.0 fxmarty 2024-05-14 14:18:10 +0000
  • 16f9ff8965 Update README with new Docker image (#143) regisss 2024-05-14 15:31:52 +0200
  • 517878eede Correct 'using guidance' link Brandon Lockaby 2024-05-14 09:21:33 -0400
  • b0c1fa6557 add rocm 6.0.2 dockerfile fxmarty 2024-05-14 12:54:07 +0000
  • 67e833cedb Back functional gemma. Nicolas Patry 2024-05-14 09:21:13 +0000
  • ebbe7edca4 Don't break what's not broken. Nicolas Patry 2024-05-14 09:20:45 +0000
  • 9b9614cea3 fix: small test tweak drbh 2024-05-14 03:16:01 +0000
  • 5b3b8fd7b6 fix: prefer gemma rotary embed and split attention weight drbh 2024-05-14 03:15:32 +0000
  • 6e8a2110f8 fix: adjust inputs_embeds passed to language model and debug drbh 2024-05-10 17:32:14 +0000
  • 4df1b25ddb fix: typo and lint drbh 2024-05-10 16:17:59 +0000
  • 36fb4b5a7a fix: adjust image and text merge logic drbh 2024-05-10 16:13:11 +0000
  • d503007fcf fix: debug avoid scaling embed drbh 2024-05-10 03:44:51 +0000
  • e13c08f57f fix: adjust siglip attention drbh 2024-05-09 19:13:56 +0000
  • 23294344c6 fix: debugging drbh 2024-05-09 14:23:56 +0000
  • b07b53efba feat: improve config and refactor drbh 2024-05-09 01:25:59 +0000
  • 5fd72ed06c feat: load and query model drbh 2024-05-08 16:21:44 -0400
  • 07a50523b3 Fixing truncation. Nicolas Patry 2024-05-14 12:20:35 +0000
  • 6afd82a1e7 Update optimum-habana version to 1.11.1 (#138) (#142) Karol Damaszke 2024-05-14 14:13:31 +0200
  • 91b0aa69d1 Update peft version to 0.10.0 (#141) Karol Damaszke 2024-05-14 13:49:55 +0200
  • e3d765645a MLPSpeculator. (#1865) Nicolas Patry 2024-05-14 12:33:18 +0200
  • dcb727c232 Reload model_type when speculator is found. Nicolas Patry 2024-05-14 08:13:55 +0000
  • 3136f27f36 Add: Support for the Falcon2 11B architecture (#1886) Nilabhra Roy Chowdhury 2024-05-14 10:06:02 +0200
  • 8ce8265966 chore: removed repeating code. Nilabhra 2024-05-14 11:58:12 +0400
  • e45ede2cde Merge branch 'main' into feat/falcon-11b Nilabhra 2024-05-14 11:40:31 +0400
  • d619666d23 fix: setting the rotary base from the config for the grouped query models. Nilabhra 2024-05-14 10:14:18 +0400
  • 2fcbf5f3b9 add: support for falcon-10B architecture. Nilabhra 2024-04-15 13:52:20 +0400