Commit Graph

  • 9b6db5f793 Support tools (#1587) drbh 2024-02-28 05:10:27 -0500
  • c84223590b add medusa OlivierDehaene 2024-02-28 11:02:39 +0100
  • a56bd736e6 feat: add starcoder2 OlivierDehaene 2024-02-26 17:47:12 +0100
  • 2122acc60f Add warmup for all possible shapes for prefill #49 (#81) Karol Damaszke 2024-02-28 10:40:13 +0100
  • 31bed905d4 Update habana profiler (#50) (#80) Karol Damaszke 2024-02-28 09:57:40 +0100
  • d31fb62576 Add more info to high-level profiler events (#46) (#79) Karol Damaszke 2024-02-28 09:55:50 +0100
  • 4bf58907d0 fix: adjust typos in docs drbh 2024-02-28 04:03:52 +0000
  • b5cacca1dc fix: update tests for streaming tools drbh 2024-02-28 03:56:37 +0000
  • 0fc7237380 feat: support streaming and improve docs drbh 2024-02-28 02:32:02 +0000
  • 7c04b6d664 fix: add guidance to toc drbh 2024-02-27 17:56:54 +0000
  • 4a81dd042f feat: improve tool serialization drbh 2024-02-27 17:52:46 +0000
  • f72155ae46 feat: add docs and address syntax tweaks drbh 2024-02-27 16:54:29 +0000
  • 960cc95a0e Update speculation.md [adding_docs] Nicolas Patry 2024-02-27 15:55:37 +0100
  • b6922d48de Add the speculation docs. Nicolas Patry 2024-02-27 15:49:58 +0100
  • 941d36f3fd Enable deferred token generation (#44) (#75) Karol Damaszke 2024-02-27 15:46:40 +0100
  • cea291718e Adding some docs. Nicolas Patry 2024-02-27 15:38:02 +0100
  • 6248c5610e Revert "Prefer prefill instead of decode when max_waiting_tokens==0 (#18)" (#45) (#76) Karol Damaszke 2024-02-27 11:56:45 +0100
  • a42dc2027b update commit [feat/flash_decoding] OlivierDehaene 2024-02-27 11:24:07 +0100
  • ef99678798 wip not faster OlivierDehaene 2024-01-25 15:26:51 +0100
  • bf700e7eef Revamp medusa implementation so that every model can benefit. (#1588) Nicolas Patry 2024-02-26 19:49:28 +0100
  • 5f8526235a feat: deprecate suffix and completion template drbh 2024-02-26 18:22:28 +0000
  • e69e68c8ea Small fixes in the weights loading logic. Nicolas Patry 2024-02-26 17:32:42 +0000
  • 915e5f088c Forgot docker launcher. Nicolas Patry 2024-02-26 17:07:54 +0000
  • bfec09ecc2 Fixing revision for the medusa test. Nicolas Patry 2024-02-26 16:31:40 +0000
  • 83b059bd27 Bulk shifting (#40) (#70) jkaniecki 2024-02-26 17:29:56 +0100
  • e672f976fb Fix . Nicolas Patry 2024-02-26 16:31:01 +0100
  • de421dc53e feat: remove debug cuda avoid drbh 2024-02-26 15:19:07 +0000
  • fa40801fb6 Specify revision to force use safetensors files. Nicolas Patry 2024-02-26 15:24:48 +0100
  • 7a37655d8e feat: improve client for tools and fix default choice drbh 2024-02-26 14:18:09 +0000
  • 1445b9517d Remove dead file. Nicolas Patry 2024-02-26 15:15:02 +0100
  • c7793235d0 Download safetensors directly. Nicolas Patry 2024-02-26 11:25:12 +0000
  • 680a52f2f2 Fix GPT2 detection. Nicolas Patry 2024-02-26 11:20:39 +0000
  • af7ebc2639 fix: avoid long runnning test drbh 2024-02-26 04:29:35 +0000
  • eb762a9087 fix: avoid seed change drbh 2024-02-25 14:00:51 +0000
  • 7ec33206e6 fix: update grammar tests drbh 2024-02-25 13:59:53 +0000
  • 8f4aba6ad3 Update dependencies (#69) regisss 2024-02-25 13:07:47 +0100
  • ba39951df2 Merge branch 'main' into qwen2 Cheng Kuan Yong Jason 2024-02-24 15:48:57 +0800
  • a29893486e Added test cases Jason Cheng 2024-02-24 15:42:56 +0800
  • a32d3dd6cb feat: improve tools api and add tool prompt drbh 2024-02-24 01:58:54 +0000
  • ed95f1982d Fix gemma + medusa. Nicolas Patry 2024-02-23 21:13:34 +0000
  • a0095b5b8d Fixing. Nicolas Patry 2024-02-23 15:10:08 +0000
  • cd57f9c632 fix: avoid duplicate bos token [fix-gemma-tokenization] drbh 2024-02-23 14:53:18 +0000
  • bcd5c8e599 fix: update names and snaps drbh 2024-02-23 14:31:33 +0000
  • c3bd8ef445 Add Fp8 support (#42) (#71) jkaniecki 2024-02-23 11:52:28 +0100
  • a490847702 Sequence bucketing for prefill (#39) (#67) jkaniecki 2024-02-23 01:52:14 +0100
  • b40725e698 Merge branch 'huggingface:main' into main dstnluong-google 2024-02-22 14:45:06 -0800
  • 0863dee463 import logger dstnluong-google 2024-02-22 22:11:42 +0000
  • c02a42db93 import os dstnluong-google 2024-02-22 22:07:10 +0000
  • 39fae920d8 typo dstnluong-google 2024-02-22 22:00:15 +0000
  • 6690daec09 feat: update tests drbh 2024-02-22 20:05:58 +0000
  • d2635dd01b fix: prefer seed 1 in all cases drbh 2024-02-22 18:51:02 +0000
  • 0e30e65822 feat: respect tool choice drbh 2024-02-22 18:26:49 +0000
  • 3ec57acac1 fix: update tests and snaps drbh 2024-02-22 17:34:02 +0000
  • f592df5234 Fix MPT, not sure about idefics. Nicolas Patry 2024-02-22 16:08:15 +0000
  • c7caac47f8 fix: update snapshot drbh 2024-02-22 14:11:14 +0000
  • e04c8981d1 fix: trim trailing spaces drbh 2024-02-22 13:32:45 +0000
  • 014d3fd4ef feat: add concrete tool types drbh 2024-02-22 04:19:47 +0000
  • 1aa2126206 fix: add chat docs to client drbh 2024-02-21 18:25:01 -0500
  • c8f2081171 feat: minimal tool support and chat client drbh 2024-02-16 17:18:21 +0000
  • 0f500f6d14 feat: basic tool support via grammar composition drbh 2024-02-16 16:00:59 +0000
  • 8eb88a7d75 Bump rust version (#41) (#68) jkaniecki 2024-02-22 16:08:34 +0100
  • ac5a1c6f51 fix: avoid default message (#1579) drbh 2024-02-22 08:56:42 -0500
  • 64d38afa9f Black. Nicolas Patry 2024-02-22 13:01:43 +0000
  • 9ad6086250 Improve habana profile dev experience (#36) (#65) jkaniecki 2024-02-22 13:57:45 +0100
  • 7a9998d47c Remove the old logic. Nicolas Patry 2024-02-22 12:32:46 +0000
  • 21b3072288 Small updates. Nicolas Patry 2024-02-22 12:06:36 +0000
  • ac419f5e46 Upgrade ALL the code. Nicolas Patry 2024-02-22 11:37:05 +0000
  • f7ef414e38 Remove unused pad_token_id for filter (#35) (#64) jkaniecki 2024-02-22 11:24:09 +0100
  • 8f590759e3 Prefill optimization by allocating space only for the first output token (#34) (#62) jkaniecki 2024-02-22 04:55:43 +0100
  • 03fb94b853 gs:// model_id is already set to /tmp/gcs_model/ dstnluong-google 2024-02-21 22:03:55 +0000
  • 666b75ea87 Move GCS install to requirements files. dstnluong-google 2024-02-21 21:58:46 +0000
  • ee9b5a2be6 nit: Rename to Gemma dstnluong-google 2024-02-21 21:54:31 +0000
  • 74e09e6594 Merge branch 'huggingface:main' into main dstnluong-google 2024-02-21 13:53:21 -0800
  • 2446f3ec32 [Tmp] Revamping medusa to make it orthogonal. Nicolas Patry 2024-02-21 21:37:27 +0000
  • c64866e05a exclude ubuntu.com domain Guillaume LEGENDRE 2024-02-21 19:45:45 +0100
  • e61f124f63 fix Guillaume LEGENDRE 2024-02-21 19:33:37 +0100
  • 710b760602 fix typo Guillaume LEGENDRE 2024-02-21 19:28:48 +0100
  • 3a85f1bd54 try fixing buildx proxy Guillaume LEGENDRE 2024-02-21 19:27:28 +0100
  • 66f89120b5 fix: add back typo removed variable drbh 2024-02-21 11:55:06 -0500
  • 3e22cdd14c fix: pre commit trailing whitespace typo drbh 2024-02-21 11:41:06 -0500
  • 1724d06f9d fix: adjust typo drbh 2024-02-21 11:32:14 -0500
  • 544f848bde fix: improve completion request params and comments drbh 2024-02-21 11:31:11 -0500
  • 19c0248985 feat: update docs refactor and avoid cuda graphs when unavailable drbh 2024-02-21 11:17:41 -0500
  • 90d6330819 fix: add missing imports drbh 2024-02-21 10:52:46 -0500
  • 07ac99a93f Merge branch 'main' into support-legacy-completions-api drbh 2024-02-21 10:48:28 -0500
  • d0d0fd24a8 update tailscale action version Guillaume LEGENDRE 2024-02-21 15:43:58 +0100
  • 92ab9d2ee6 change runner and remove tailscale userspace for amd Guillaume LEGENDRE 2024-02-21 15:41:05 +0100
  • 80303b469c Do not limit hpu graphs by default (#32) (#61) jkaniecki 2024-02-21 15:38:00 +0100
  • 383478758b fix tailscale Guillaume LEGENDRE 2024-02-21 15:36:48 +0100
  • 010508cec8 fix: fix openapi schema (#1586) OlivierDehaene 2024-02-21 15:30:45 +0100
  • 85b224b108 update OlivierDehaene 2024-02-21 15:30:28 +0100
  • 4fe79da8f3 fix: fix openapi schema OlivierDehaene 2024-02-21 15:28:46 +0100
  • 9c1cb81cd8 v1.4.2 (#1585) [v1.4.2] OlivierDehaene 2024-02-21 14:50:57 +0100
  • c1bdca91c2 default compat_return_full_text to true OlivierDehaene 2024-02-21 14:49:25 +0100
  • 605e0369c4 Set qkv attention layer bias to True Jason Cheng 2024-02-21 21:38:43 +0800
  • 6b6dec9ea1 Transparent tokenizer uses explicit int32 (#31) (#60) jkaniecki 2024-02-21 14:24:41 +0100
  • 08827bef2e v1.4.2 OlivierDehaene 2024-02-21 14:17:03 +0100
  • c86f58d37c feat: add support for Gemma (#1583) OlivierDehaene 2024-02-21 14:15:22 +0100
  • bb57cb34e0 Added Qwen2 but generation is wrong Jason Cheng 2024-02-21 18:30:57 +0800
  • ff3e82c880 skip gemma integration tests for now OlivierDehaene 2024-02-21 14:14:00 +0100