Commit Graph

  • 472bf098e4 Fix URL. Nicolas Patry 2024-03-21 19:40:52 +0000
  • ed29d6eeab Remove unecessary cuda graph. (#1664) Nicolas Patry 2024-03-21 20:11:05 +0100
  • 2e754ffd2e Remove unecessary cuda graph. Nicolas Patry 2024-03-21 20:09:28 +0100
  • 78f87d5a0c Temporary implem of torch.compile on our stuff. tmp_torch_compile Nicolas Patry 2024-03-21 18:56:40 +0000
  • 6f18cf7fb4 Add the image. Nicolas Patry 2024-03-21 19:18:39 +0100
  • 12fac2e03f Repair idefics integration tests. Nicolas Patry 2024-03-21 18:06:06 +0100
  • de6cb15fa5 fix: improve tool type, bump pydantic and outlines (#1650) drbh 2024-03-21 12:45:56 -0400
  • 4f09c80cd8 fix: prefer spaces url over temp url (#1662) drbh 2024-03-21 12:34:25 -0400
  • 887db5f641 fix: prefer spaces url over temp url drbh 2024-03-21 16:10:44 +0000
  • 6f15ac60b2 feat: support force downcast after FastRMSNorm multiply for Gemma (#1658) drbh 2024-03-21 05:25:11 -0400
  • 704d4ddfaa fix: simplify syntax drbh 2024-03-21 03:33:56 +0000
  • 5b076dfcf2 feat: prefer gemma specific rms drbh 2024-03-21 03:28:03 +0000
  • b307fce653 feat: support force downcast after FastRMSNorm multiply drbh 2024-03-20 17:47:20 +0000
  • cd3bd4d9e1 fix: struct naming and min versions drbh 2024-03-20 16:30:52 +0000
  • dfbd9a39a2 feat: bump minijina and add test for core templates (#1626) drbh 2024-03-20 09:13:46 -0400
  • a122582bc7 chore: update outlines function call to new name Jannis Schönleber 2024-03-19 18:19:57 +0100
  • 7eb3d75df1 fix: continue skipping long running tests drbh 2024-03-19 15:30:17 +0000
  • 6fe5681606 fix: bump integration test deps drbh 2024-03-19 03:08:56 +0000
  • 8af58129b6 fix: refactor to use .data from pydantic model drbh 2024-03-18 22:32:04 +0000
  • b087cbfaa6 fix: bump pydantic version drbh 2024-03-18 21:55:48 +0000
  • 09f2e8ed13 feat: bump outlines and pydantic logic drbh 2024-03-18 21:00:23 +0000
  • 03003d1eaf fix: adjust formatting drbh 2024-03-18 12:37:10 -0400
  • 7236ceb941 feat: test improve struct drbh 2024-03-12 11:31:30 +0000
  • d523e1b320 feat: add custom templates and bump minijinja for namespace support drbh 2024-03-08 17:34:15 +0000
  • a50447dc72 feat: bump minijina and add test for core templates drbh 2024-03-06 15:21:41 +0000
  • 7ad4a62458 fix: prefer tool call vector over object drbh 2024-03-11 13:26:06 +0100
  • fc013cdb48 chore(cargo-toml): apply lto fat and codegen-units of one somehowchris 2024-03-18 16:00:52 +0100
  • d752317b5f Correct input_length since habana extend input_length to max_input_length (#103) Wang, Yi 2024-03-18 22:23:13 +0800
  • b45f648483 Add warmup for logits processors (#107) Karol Damaszke 2024-03-18 15:17:47 +0100
  • 8504f9c41c Improve README clarity (#106) jkaniecki 2024-03-18 15:15:07 +0100
  • c1095bb61a add debug test_rocm Guillaume LEGENDRE 2024-03-18 11:54:31 +0100
  • ece6b94118 remove proxy Guillaume LEGENDRE 2024-03-18 11:01:09 +0100
  • 6de10b659d new tailscale action Guillaume LEGENDRE 2024-03-18 10:42:14 +0100
  • 0d72af5ab0 Fixing minor typo in documentation: supported hardware section (#1632) Sachin Varghese 2024-03-18 02:33:58 -0400
  • 23fba672e8 Fix index in ChatCompletionChunk (#1648) Lucain 2024-03-16 17:14:29 +0100
  • 3686f9c10d fix tool call as well Wauplin 2024-03-15 14:30:06 +0100
  • 9a12e141d7 Fix index in chat completion chunk Wauplin 2024-03-15 14:16:25 +0100
  • 0d9917f744 Update peft + transformers + accelerate + bnb + safetensors (#1646) abhishek thakur 2024-03-15 13:23:26 +0100
  • 61d86df160 update other libs Abhishek Thakur 2024-03-15 11:51:10 +0100
  • 6afcc06bb5 Update peft to 0.9.0 abhishek thakur 2024-03-15 11:17:36 +0100
  • a4d5c3f40f Fix the generate_stream crash in concurrent query (#105) yuanwu2017 2024-03-15 17:54:56 +0800
  • 3d81a80577 Fix incorrect setting of max_new_tokens in warmup (#104) Wang, Yi 2024-03-13 23:19:40 +0800
  • 7149ac30e6 Fix the issue of out of range (#98) Yao Matrix 2024-03-13 17:09:53 +0800
  • 8a5bcba227 Upgrade nix version from 0.27.1 to 0.28.0 (#1638) yuanwu2017 2024-03-12 20:37:33 +0800
  • f3aa8a38f7 fix: add signal feature drbh 2024-03-12 08:36:38 -0400
  • d3711a6646 Use a better model for the quick tour (#1639) lewtun 2024-03-12 11:25:32 +0100
  • 602a920ec5 Update nix version (#102) jkaniecki 2024-03-11 16:21:04 +0100
  • f3c0cf1b39 Upgrade nix version from 0.27.1 to 0.28.0 yuanwu 2024-03-11 10:40:19 -0400
  • 8bfd857f03 Use a better model for the quick tour lewtun 2024-03-11 10:29:40 +0100
  • 2111ae1bd2 fix: LlamaTokenizerFast to AutoTokenizer at flash_mistral.py SeongBeomLEE 2024-03-11 13:27:09 +0900
  • 365f277900 Clean-up README (#96) Karol Damaszke 2024-03-10 22:02:15 +0100
  • b5dcc87459 fix: include shared python library during rust build step router-grammar-compile drbh 2024-03-08 23:13:07 +0000
  • 65b5b4c36e fix: check start_states len and add states_to_token_maps to filter drbh 2024-03-08 22:58:30 +0000
  • 5b5cbb14d6 fix: remove GrammarType import typo drbh 2024-03-08 04:18:31 +0000
  • d031919c8a fix: remove compilation artifacts from logit processor drbh 2024-03-08 03:44:38 +0000
  • 1f7be736d2 feat: remove uncompile grammar and improve logit processor logic drbh 2024-03-08 03:34:19 +0000
  • 372865500d Fixing minor typo in documentation: supported hardware section Sachin Varghese 2024-03-07 15:43:02 -0500
  • c52a0f679e feat: prefer precompiled grammar drbh 2024-03-07 17:12:46 +0000
  • 4f7074ca71 feat: compile grammar and send over grpc drbh 2024-03-06 02:40:56 +0000
  • ad5f562aa5 feat: support grammar compilation worker via py03 drbh 2024-03-05 17:44:16 +0000
  • a7cc4dc9da fix: bump client version bump-client-0.6.2 drbh 2024-03-04 14:29:27 +0000
  • 8e14780bf4 Wait 2sec once shard is ready to improve stability (#92) (#94) Karol Damaszke 2024-03-04 12:17:24 +0100
  • 7dbaf9e901 fix: correctly index into mask when applying grammar (#1618) drbh 2024-03-01 12:22:01 -0500
  • 7e08751378 fix: add missing stop parameter for chat request (#1619) drbh 2024-03-01 12:08:11 -0500
  • 02d8b18153 fix: add missing stop parameter for chat request drbh 2024-03-01 16:40:04 +0000
  • 80ae9ead28 Set MAX_TOTAL_TOKENS automatically (#91) Karol Damaszke 2024-03-01 11:25:15 +0100
  • 57a5766848 feat: skip long runnning tests drbh 2024-03-01 04:38:34 +0000
  • a5c788cfe4 Remove redundant fill op (#83) (#90) Karol Damaszke 2024-03-01 01:32:02 +0100
  • 1eff07c0e4 fix: remove duplicate test drbh 2024-02-29 23:18:22 +0000
  • 06fd5affa0 feat: split flash and non flash grammar tests drbh 2024-02-29 22:17:26 +0000
  • b47b161cab feat: update more snapshots avoid-zero-seed drbh 2024-02-29 22:06:13 +0000
  • 141e67a1bf fix: correctly index into mask when applying grammar drbh 2024-02-29 18:32:23 +0000
  • 3dd7da2198 feat: accept legacy request format and response (#1527) drbh 2024-02-29 10:44:20 -0500
  • 03c2123244 Use batched index_copy (#73) (#89) Karol Damaszke 2024-02-29 15:45:16 +0100
  • 9ed4d2c780 Fix async client timeout (#1617) Hugo Abonizio 2024-02-29 11:41:49 -0300
  • 5eb6a7b32d Fix async client timeout Hugo Abonizio 2024-02-29 11:35:15 -0300
  • 5a3903ba99 Fix idefics default. (#1614) Nicolas Patry 2024-02-29 13:16:34 +0100
  • 33bfb417b4 Fix idefics default. Nicolas Patry 2024-02-29 11:33:38 +0100
  • 343aa7a197 fix: Handle concurrent grammar requests (#1610) drbh 2024-02-29 05:17:42 -0500
  • 8f6564ce0e Heap based router queue (#63) (#88) Karol Damaszke 2024-02-29 10:56:26 +0100
  • 7dbf4bf7a4 Improve tensor slicing performance (#66) (#87) Karol Damaszke 2024-02-29 10:48:54 +0100
  • 3831f1bed5 Add warmup for shift operation (#59) (#86) Karol Damaszke 2024-02-29 09:19:28 +0100
  • 022ce1eaaf Overhead reduction (#58) (#85) Karol Damaszke 2024-02-29 09:17:45 +0100
  • 212136dff8 Log exceptions to debug.log (#52) (#84) Karol Damaszke 2024-02-29 09:14:42 +0100
  • c7ccfb87ff Grouped pad/shift/move operations (#57) (#82) Karol Damaszke 2024-02-29 04:16:44 +0100
  • 0370b0feda fix: simplify changes drbh 2024-02-28 19:54:46 +0000
  • 4ff9cb806b fix: persist grammar state after batch concat drbh 2024-02-28 19:44:51 +0000
  • e6bb3ff81f v1.4.3 (#1609) v1.4.3 OlivierDehaene 2024-02-28 16:12:14 +0100
  • bc46fa9cf6 fix toctree OlivierDehaene 2024-02-28 16:12:02 +0100
  • 9c3c3c10f1 v1.4.3 OlivierDehaene 2024-02-28 15:53:36 +0100
  • 26cdea5c0c feat: Qwen2 (#1608) OlivierDehaene 2024-02-28 15:50:31 +0100
  • 725f0e350d add speculative head OlivierDehaene 2024-02-28 14:58:43 +0100
  • d1d757e676 feat: Qwen2 Model (#1584) Cheng Kuan Yong Jason 2024-02-28 21:34:17 +0800
  • 73403aa4db Merge branch 'feat/qwen' into qwen2 OlivierDehaene 2024-02-28 14:34:08 +0100
  • b40e833493 feat: starcoder2 (#1605) OlivierDehaene 2024-02-28 12:07:08 +0100
  • 97e22369f4 Fixing guidance docs. (#1607) Nicolas Patry 2024-02-28 12:05:15 +0100
  • b07cc6a824 Fixing guidance docs. Nicolas Patry 2024-02-28 12:03:09 +0100
  • 910d0a9062 Fixing x-compute-time. (#1606) Nicolas Patry 2024-02-28 11:30:37 +0100
  • 1e284eb178 Fix x-compute-time header. Nicolas Patry 2024-02-28 11:21:41 +0100
  • e5564b7dcd add integration tests OlivierDehaene 2024-02-28 11:18:19 +0100