Commit Graph

  • fec3f8f21c Fixing cohere tokenizer. (#1697) Nicolas Patry 2024-04-05 16:44:19 +0200
  • 62672c6934 Push users to streaming in the readme. (#1698) Nicolas Patry 2024-04-05 16:44:10 +0200
  • fe063b8118 Pickle conversion now requires --trust-remote-code. (#1704) Nicolas Patry 2024-04-05 13:32:53 +0200
  • 29c316e5bb Add cuda graphs sizes and make it default. (#1703) Nicolas Patry 2024-04-04 23:01:56 +0200
  • 0bf856dc60 v1.4.5 (#1686) OlivierDehaene 2024-03-29 19:17:24 +0100
  • dc1ab2001d feat: Add dbrx support (#1685) OlivierDehaene 2024-03-29 18:49:36 +0100
  • 56670398f3 fix: handle batches with and without grammars (#1676) drbh 2024-03-28 12:02:01 -0400
  • d5ed4c110b fix: adjust logprob response logic (#1682) drbh 2024-03-28 12:01:46 -0400
  • 6ac93d8f28 v1.4.4 (#1668) OlivierDehaene 2024-03-22 18:44:05 +0100
  • da4199ed97 feat: cohere (#1660) OlivierDehaene 2024-03-22 17:59:25 +0100
  • ecdacbb73c Inline images for multimodal models. (#1666) Nicolas Patry 2024-03-22 17:14:54 +0100
  • 097e72a672 fix: LlamaTokenizerFast to AutoTokenizer at flash_mistral.py (#1637) SeongBeomLEE 2024-03-23 01:13:13 +0900
  • c4f92ec4b5 feat: update client to 0.7 (#1667) OlivierDehaene 2024-03-22 17:10:56 +0100
  • 07f05a86a4 Repair idefics integration tests. (#1663) Nicolas Patry 2024-03-21 22:21:03 +0100
  • 6729783a19 Remove unecessary cuda graph. (#1664) Nicolas Patry 2024-03-21 20:11:05 +0100
  • ab074c81b7 fix: improve tool type, bump pydantic and outlines (#1650) drbh 2024-03-21 12:45:56 -0400
  • b36c0f8436 fix: prefer spaces url over temp url (#1662) drbh 2024-03-21 12:34:25 -0400
  • d888bc2828 feat: support force downcast after FastRMSNorm multiply for Gemma (#1658) drbh 2024-03-21 05:25:11 -0400
  • 08525d9459 feat: bump minijina and add test for core templates (#1626) drbh 2024-03-20 09:13:46 -0400
  • 925f9c496b Fixing minor typo in documentation: supported hardware section (#1632) Sachin Varghese 2024-03-18 02:33:58 -0400
  • 50d8f99b8e Fix index in ChatCompletionChunk (#1648) Lucain 2024-03-16 17:14:29 +0100
  • 86c5ce5aa5 Update peft + transformers + accelerate + bnb + safetensors (#1646) abhishek thakur 2024-03-15 13:23:26 +0100
  • 7809825f0c Upgrade nix version from 0.27.1 to 0.28.0 (#1638) Karol Damaszke 2024-04-25 10:25:40 +0300
  • cd8163dce2 Use a better model for the quick tour (#1639) lewtun 2024-03-12 11:25:32 +0100
  • d4aebbd10a fix: correctly index into mask when applying grammar (#1618) drbh 2024-03-01 12:22:01 -0500
  • dc7c69e887 fix: add missing stop parameter for chat request (#1619) drbh 2024-03-01 12:08:11 -0500
  • 5a2a0ca0c0 feat: accept legacy request format and response (#1527) drbh 2024-02-29 10:44:20 -0500
  • 0a5755eb7a Fix async client timeout (#1617) Hugo Abonizio 2024-02-29 11:41:49 -0300
  • 0390b28b85 Fix idefics default. (#1614) Nicolas Patry 2024-02-29 13:16:34 +0100
  • e259625b8b fix: Handle concurrent grammar requests (#1610) drbh 2024-02-29 05:17:42 -0500
  • e9b200369c v1.4.3 (#1609) OlivierDehaene 2024-02-28 16:12:14 +0100
  • 666cdaaf16 feat: Qwen2 (#1608) OlivierDehaene 2024-02-28 15:50:31 +0100
  • 7c6a47bb7a feat: starcoder2 (#1605) OlivierDehaene 2024-02-28 12:07:08 +0100
  • bc6ab91a4e Fixing guidance docs. (#1607) Nicolas Patry 2024-02-28 12:05:15 +0100
  • 35f7c3f813 Fixing x-compute-time. (#1606) Nicolas Patry 2024-02-28 11:30:37 +0100
  • f215cc15ee Support tools (#1587) drbh 2024-02-28 05:10:27 -0500
  • 21d52c9ca1 Revamp medusa implementation so that every model can benefit. (#1588) Nicolas Patry 2024-02-26 19:49:28 +0100
  • 70ac5c3e10 fix: avoid default message (#1579) drbh 2024-02-22 08:56:42 -0500
  • d94343d695 fix: fix openapi schema (#1586) OlivierDehaene 2024-02-21 15:30:45 +0100
  • e7183c2c03 v1.4.2 (#1585) OlivierDehaene 2024-02-21 14:50:57 +0100
  • a461257066 feat: add support for Gemma (#1583) OlivierDehaene 2024-02-21 14:15:22 +0100
  • 3c6e6d8c3f fix(router): fix openapi and add jsonschema validation (#1578) OlivierDehaene 2024-02-21 11:05:32 +0100
  • 5addb84bfb fix: refactor syntax to correctly include structs (#1580) drbh 2024-02-20 10:38:35 -0500
  • c3053e872a improve endpoint support (#1577) drbh 2024-02-20 08:04:51 -0500
  • 5a54d915ae Fix mistral with length > window_size for long prefills (rotary doesn't create long enough cos, sin). (#1571) Nicolas Patry 2024-02-19 15:23:12 +0100
  • 2ac1b55c95 v1.4.1 (#1568) OlivierDehaene 2024-02-16 17:50:57 +0100
  • cf946b3984 feat: add chat template struct to avoid tuple ordering errors (#1570) OlivierDehaene 2024-02-16 16:37:32 +0100
  • 31b5e37f49 chore: add pre-commit (#1569) OlivierDehaene 2024-02-16 11:58:58 +0100
  • 69a2eadc52 Bugfix: eos and bos tokens positions are inconsistent (#1567) Aaron Mihalik 2024-02-16 05:44:04 -0500
  • cfccdf3d43 Added name field to OpenAI compatible API Messages (#1563) Aaron Mihalik 2024-02-15 13:30:31 -0500
  • 55acb86f42 Outlines guided generation (#1539) drbh 2024-02-15 04:28:10 -0500
  • 686b56a0c0 Small cleanup. (#1560) Nicolas Patry 2024-02-14 15:30:07 +0100
  • e93cc34a22 Improving mamba runtime by using updates (#1552) Nicolas Patry 2024-02-14 09:54:10 +0100
  • f6500bfaa3 Upgrade intermediary layer for nvidia too. (#1557) Nicolas Patry 2024-02-13 22:46:16 +0100
  • d05d930545 Fixing glibc version in the runtime. (#1556) Nicolas Patry 2024-02-13 17:43:47 +0100
  • 91b56a71dc feat: add deserialize_with that handles strings or objects with content (#1550) drbh 2024-02-13 10:01:02 -0500
  • 0c207f71ed feat: experimental support for cuda graphs (#1428) OlivierDehaene 2024-02-12 10:09:29 +0100
  • 518d30dec4 feat(router): add max_batch_size (#1542) OlivierDehaene 2024-02-09 12:38:41 +0100
  • 777e519277 ROCm AWQ support (#1514) Ilyas Moutawwakil 2024-02-09 10:45:16 +0100
  • 8415d4605d chore: bump ci rust version (#1543) drbh 2024-02-09 04:32:04 -0500
  • f1d8da3ba6 feat(server): add frequency penalty (#1541) OlivierDehaene 2024-02-08 18:41:25 +0100
  • 4c698fa6c2 Adding support for HF_HUB_OFFLINE support in the router. (#1789) Nicolas Patry 2024-04-23 23:38:30 +0200
  • 23d82b8fb6 fix: avoid frequency and repetition penalty on padding tokens (#1765) drbh 2024-04-23 17:19:16 -0400
  • 429092683b Using @drbh patch. Nicolas Patry 2024-04-23 14:27:43 +0000
  • af24703708 Adding support for HF_HUB_OFFLINE support in the router. Nicolas Patry 2024-04-22 13:53:08 +0000
  • ba33c66b5b Updating the benchmarks so everyone uses openai compat layer. Nicolas Patry 2024-04-23 21:07:36 +0000
  • bfddfa5955 Idefics2. (#1756) Nicolas Patry 2024-04-23 23:04:44 +0200
  • 986b4044d1 Phi3 support (#1797) Nicolas Patry 2024-04-23 18:40:05 +0200
  • e72897004a black Mohit Sharma 2024-04-23 15:05:39 +0000
  • fbc5a6a120 add LLMM_Silu Mohit Sharma 2024-04-23 15:02:53 +0000
  • 9be1db3101 feat: allow null eos and bos tokens in config (#1791) drbh 2024-04-23 10:26:54 -0400
  • 455cada527 Add attribute descriptions for GenerateParameters (#1798) Lucain 2024-04-23 16:22:12 +0200
  • 12e310f2a9 Proper TP Nicolas Patry 2024-04-23 13:01:26 +0000
  • 5034f9ea84 Add attribute descriptions for GenerateParameters Wauplin 2024-04-23 14:59:04 +0200
  • 9e278d1aec Updating llava net integration tests. Nicolas Patry 2024-04-23 09:51:10 +0000
  • 43796ce4f9 Revert "Removing params test (seems flaky in CI ?)" Nicolas Patry 2024-04-23 09:35:56 +0000
  • cec954e391 Update to peft 0.8.2 (#1537) Jason Stillerman 2024-02-08 06:44:04 -0500
  • 7d31cb6e75 Phi3 support. Nicolas Patry 2024-04-23 08:50:18 +0000
  • 51a4e62ed4 Impl simple mamba model (#1480) drbh 2024-02-08 04:19:45 -0500
  • b93d4ec604 Removing params test (seems flaky in CI ?) Nicolas Patry 2024-04-23 07:55:04 +0000
  • d05a9814dc fix: remove trailing space from docs drbh 2024-04-23 02:39:02 -0400
  • 99cb270f91 feat: use existing add_generation_prompt variable from config in temp… (#1533) drbh 2024-02-07 03:35:53 -0500
  • 62a40b8aa6 feat: add ie update to message docs (#1523) drbh 2024-02-02 10:31:11 -0500
  • e39ba494b8 [docs] Fix link to Install CLI (#1526) Pedro Cuenca 2024-02-02 14:05:30 +0100
  • 369ae2dcc1 Updating tokenizers. (#1517) Nicolas Patry 2024-02-01 16:26:48 +0100
  • 14b40bffba fix: tokenizer config should use local model path when possible (#1518) drbh 2024-02-01 09:39:32 -0500
  • 6c0b21bd14 Revert "Modify default for max_new_tokens in python client (#1336)" Nicolas Patry 2024-02-01 14:36:10 +0000
  • 2bf39314ba Hotfix the / health - route. (#1515) Nicolas Patry 2024-02-01 13:29:04 +0100
  • 1a0bfe3f7f Freshen up the README. Nicolas Patry 2024-02-01 10:23:37 +0100
  • 27daa511ec GPTNeoX: Use static rotary embedding (#1498) Dean Wyatte 2024-02-01 01:34:11 -0700
  • 11d8e7132f fix: improve messages api docs content and formatting (#1506) drbh 2024-01-31 11:26:22 -0500
  • bf72c03d0e feat: eetq gemv optimization when batch_size <= 4 (#1502) dtlzhuangz 2024-01-31 19:05:49 +0800
  • 86796bc78c Modify default for max_new_tokens in python client (#1336) freitng 2024-01-29 17:02:57 +0100
  • 89fa4fddb0 Create the compute type at launch time (if not provided in the env). (#1505) Nicolas Patry 2024-01-29 12:30:50 +0100
  • 050c5840b5 Sending compute type from the environment instead of hardcoded string (#1504) Nicolas Patry 2024-01-29 11:20:08 +0100
  • 433934519c Fixing top_n_tokens. (#1497) Nicolas Patry 2024-01-26 20:13:47 +0100
  • 0b20661cb7 Odd. Nicolas Patry 2024-04-22 21:28:08 +0000
  • f65d06cfb6 Fixed idefics2 integration tests. (Less VRAM is crucial). Nicolas Patry 2024-04-22 20:49:14 +0000
  • 1aa812da43 Making image passes image per image to save VRAM. Nicolas Patry 2024-04-22 20:40:21 +0000
  • 60d2757c36 Update router/src/config.rs Nicolas Patry 2024-04-22 18:40:28 +0200