Commit Graph

  • 17a2c87f5c Refactor layers. (#1866) Nicolas Patry 2024-05-13 12:44:30 +0200
  • 9e4b25c66b fix: setting the rotary base from the config for the grouped query models. Nilabhra 2024-05-14 10:14:18 +0400
  • 5e8db7c14f add: support for falcon-10B architecture. Nilabhra 2024-04-15 13:52:20 +0400
  • 011887f15c chore: removed unused import. Nilabhra 2024-05-14 11:00:45 +0400
  • 56ed686942 Refactor layers. (#1866) Nicolas Patry 2024-05-13 12:44:30 +0200
  • c41573c67c fix: setting the rotary base from the config for the grouped query models. Nilabhra 2024-05-14 10:14:18 +0400
  • 46ada47963 add: support for falcon-10B architecture. Nilabhra 2024-04-15 13:52:20 +0400
  • d3d83e7d04 Refactor layers. (#1866) Nicolas Patry 2024-05-13 12:44:30 +0200
  • dcd2b4425c fix: setting the rotary base from the config for the grouped query models. Nilabhra 2024-05-14 10:14:18 +0400
  • 22c005fac3 add: support for falcon-10B architecture. Nilabhra 2024-04-15 13:52:20 +0400
  • 80ba799c88 Granite support? (#1882) Nicolas Patry 2024-05-13 13:46:29 +0200
  • fffd569fa6 Refactor layers. (#1866) Nicolas Patry 2024-05-13 12:44:30 +0200
  • 63fe93cfe9 update xpu docker image and use public ipex whel (#1860) Wang, Yi 2024-05-06 22:05:43 +0800
  • 0c03bd8181 Upgrading to rust 1.78. (#1851) Nicolas Patry 2024-05-06 13:48:11 +0200
  • 8fa2a8699b Add router name to /info endpoint (#1854) Lucain 2024-05-03 16:39:04 +0200
  • ee7c660412 Updating Phi3 (long context). (#1849) Nicolas Patry 2024-05-02 19:07:10 +0200
  • 731303af53 feat: prefer huggingface_hub in docs and show image api (#1844) drbh 2024-05-02 10:56:24 -0400
  • dfafda53e7 Remove misleading warning (not that important nowadays anyway). (#1848) Nicolas Patry 2024-05-02 15:09:46 +0200
  • ea17ce798f Adding scripts to prepare load data. (#1841) Nicolas Patry 2024-05-01 21:48:06 +0200
  • 33c6bb480d Fix: "Fixing" double BOS for mistral too. (#1843) Nicolas Patry 2024-05-01 18:21:17 +0200
  • 0c3f5de379 fix: split docs and start conceptual page (#1836) drbh 2024-05-01 03:03:25 -0400
  • 7ba395ab39 (chore): torch 2.3.0 (#1833) Nicolas Patry 2024-04-30 18:15:35 +0200
  • a9043412cd chore: update torch (#1730) OlivierDehaene 2024-04-30 14:04:28 +0200
  • 10828cb8ba Handle images in chat api (#1828) drbh 2024-04-30 06:18:32 -0400
  • 66ed33bce5 feat: add vlm docs and simple examples (#1812) drbh 2024-04-30 06:14:39 -0400
  • bce2c31f67 Fixing frequency penalty (#1811) Martin Iglesias Goyanes 2024-04-30 12:13:23 +0200
  • 7f29f1c97a feat: add how it works section (#1773) drbh 2024-04-30 05:45:49 -0400
  • a37e0ad19e fix: use get_speculate to the number of layers (#1737) OlivierDehaene 2024-04-30 11:45:26 +0200
  • 622aeda868 Add reference to TPU support (#1760) Brandon Royal 2024-04-30 05:39:52 -0400
  • 38b1753f6c Small CI cleanup. (#1801) Nicolas Patry 2024-04-30 11:39:38 +0200
  • f1c704d2f2 Add the missing tool_prompt parameter to Python client (#1825) Maziyar Panahi 2024-04-30 11:07:17 +0200
  • 3c126d2888 Prepare release. Nicolas Patry 2024-04-30 10:52:37 +0200
  • d4519fc413 Better graceful shutdown. (#1827) Nicolas Patry 2024-04-29 17:23:40 +0200
  • 0c926eaf5e Changing the waiting_served_ratio default (stack more aggressively by default). (#1820) Nicolas Patry 2024-04-28 17:54:19 +0200
  • 9848eb48b2 Dummy CI run. (#1817) Nicolas Patry 2024-04-26 19:19:55 +0200
  • 2bb20a081f Fixing qwen2. (#1818) Nicolas Patry 2024-04-26 19:19:08 +0200
  • 391658e546 Blunder (#1815) Nicolas Patry 2024-04-26 15:51:09 +0200
  • 83cda096ed add intel xpu support for TGI (#1475) Wang, Yi 2024-04-26 21:48:58 +0800
  • 959b026f45 Adding new env variables for TPU backends. (#1755) Nicolas Patry 2024-04-26 15:44:44 +0200
  • e38f89491f 2nd round of benchmark modifications (tiny adjustements to avoid overloading the host). (#1816) Nicolas Patry 2024-04-26 15:39:00 +0200
  • 812e64b763 Use the generation config. (#1808) Nicolas Patry 2024-04-25 19:41:50 +0200
  • 07b6014cd1 Update guidance docs to reflect grammar support in API (#1775) dr3s 2024-04-25 13:11:26 -0400
  • b76ba7979f Updating the benchmarks so everyone uses openai compat layer. (#1800) Nicolas Patry 2024-04-25 15:42:17 +0200
  • 82aa5ebf0f feat: improve temperature logic in chat (#1749) drbh 2024-04-25 09:31:35 -0400
  • e6e5a0ae94 Adding support for HF_HUB_OFFLINE support in the router. (#1789) Nicolas Patry 2024-04-23 23:38:30 +0200
  • c4f1f2ba19 fix: avoid frequency and repetition penalty on padding tokens (#1765) drbh 2024-04-23 17:19:16 -0400
  • 42769d97ca Idefics2. (#1756) Nicolas Patry 2024-04-23 23:04:44 +0200
  • d852d776d2 Phi3 support (#1797) Nicolas Patry 2024-04-23 18:40:05 +0200
  • 19820b7ac2 feat: allow null eos and bos tokens in config (#1791) drbh 2024-04-23 10:26:54 -0400
  • 9277208a5f Add attribute descriptions for GenerateParameters (#1798) Lucain 2024-04-23 16:22:12 +0200
  • ea62840147 fix typos in docs and add small clarifications (#1790) Moritz Laurer 2024-04-22 18:15:48 +0200
  • 8a92aeb322 Make --cuda-graphs work as expected (bis) (#1768) fxmarty 2024-04-22 16:09:19 +0200
  • 49e9537abe v2.0.1 OlivierDehaene 2024-04-18 17:20:36 +0200
  • fd13263e03 Upgrading all versions. (#1759) Nicolas Patry 2024-04-18 17:17:40 +0200
  • b53199fc23 feat: accept list as prompt and use first string (#1702) drbh 2024-04-17 04:41:12 -0400
  • 280b758eca fix: bump clients test base url to llama (#1751) drbh 2024-04-16 16:56:47 -0400
  • b153bba455 Update response type for /v1/chat/completions and /v1/completions (#1747) Lucain 2024-04-16 19:26:32 +0200
  • a0a8f30a22 feat: improve tools to include name and add tests (#1693) drbh 2024-04-16 09:02:46 -0400
  • dadaed33f0 Fixing CI. (#1748) Nicolas Patry 2024-04-15 18:47:36 +0200
  • 90885de12f v2.0.0 (#1736) OlivierDehaene 2024-04-12 18:38:34 +0200
  • 0096deee7a Fix typo in guidance.md (#1735) Ikko Eltociear Ashimine 2024-04-12 23:51:07 +0900
  • 4f28b4036e feat: medusa v2 (#1734) OlivierDehaene 2024-04-12 16:24:45 +0200
  • 99ad49aef3 Improve the defaults for the launcher (#1727) Nicolas Patry 2024-04-12 14:20:31 +0200
  • c27c838f57 chore(cargo-toml): apply lto fat and codegen-units of one (#1651) Christof Weickhardt 2024-04-12 12:34:13 +0200
  • 404f334600 fix(router): fix a possible deadlock in next_batch (#1731) OlivierDehaene 2024-04-12 10:59:04 +0200
  • 155372a39e Upgrade EETQ (Fixes the cuda graphs). (#1729) Nicolas Patry 2024-04-12 08:15:28 +0200
  • 72c52421d5 Fp8 Support (#1726) Nicolas Patry 2024-04-12 08:13:30 +0200
  • c1c81155e7 Dev/mask ldconfig output v2 (#1716) oOraph 2024-04-11 19:31:48 +0200
  • 00dc371f47 Revert "Easier defaults for models stemmed from configs." Nicolas Patry 2024-04-11 12:51:57 +0000
  • bc7a9d609a Easier defaults for models stemmed from configs. Nicolas Patry 2024-04-11 12:48:39 +0000
  • 4e855d84a3 Update libraries (#1713) abhishek thakur 2024-04-11 10:37:35 +0200
  • 7706d4c0e8 hotfix: mixtral OlivierDehaene 2024-04-10 18:38:08 +0200
  • dbde165b16 fix: fix CohereForAI/c4ai-command-r-plus (#1707) OlivierDehaene 2024-04-10 17:20:25 +0200
  • fe3586a902 Adding Llava-Next (Llava 1.6) with full support. (#1709) Nicolas Patry 2024-04-09 21:32:00 +0200
  • acc995c6fa Automatic quantization config. (#1719) Nicolas Patry 2024-04-09 10:27:57 +0200
  • 5e243eb222 Revert license to Apache 2.0 (#1714) OlivierDehaene 2024-04-08 15:06:16 +0200
  • 435a662ed4 Regenerate ld.so.cache (#1708) oOraph 2024-04-08 08:52:10 +0200
  • 8b817ca009 Force weights_only (before fully breaking pickle files anyway). (#1710) Nicolas Patry 2024-04-05 19:23:57 +0200
  • ba9cf1e51a Fixing cohere tokenizer. (#1697) Nicolas Patry 2024-04-05 16:44:19 +0200
  • 577e0707f7 Push users to streaming in the readme. (#1698) Nicolas Patry 2024-04-05 16:44:10 +0200
  • 66c9ab8373 Pickle conversion now requires --trust-remote-code. (#1704) Nicolas Patry 2024-04-05 13:32:53 +0200
  • 321359dc61 Add cuda graphs sizes and make it default. (#1703) Nicolas Patry 2024-04-04 23:01:56 +0200
  • a2029a57ab v1.4.5 (#1686) OlivierDehaene 2024-03-29 19:17:24 +0100
  • 14ed5a78d0 feat: Add dbrx support (#1685) OlivierDehaene 2024-03-29 18:49:36 +0100
  • 6436f83a95 change ToolCall id to string Bao Phan 2024-05-14 13:18:28 +0700
  • a24bf62368 fix: setting the rotary base from the config for the grouped query models. Nilabhra 2024-05-14 10:14:18 +0400
  • b789b0b67c Support openai tool_call_id request Bao Phan 2024-05-14 09:45:56 +0700
  • 6009dadee3 Model_type location. Nicolas Patry 2024-05-13 14:13:07 +0000
  • aceb87cc15 Remove old code again. Nicolas Patry 2024-05-13 13:26:52 +0000
  • 027e1dabcd Backport changes in medusa. Nicolas Patry 2024-05-13 13:18:29 +0000
  • de11fc064a Remove traces of use_medusa. Nicolas Patry 2024-05-13 13:12:44 +0000
  • 3397b26341 Missing update after rebase Nicolas Patry 2024-05-13 13:09:22 +0000
  • 71a535e401 Rebase after refactor. Nicolas Patry 2024-05-13 12:44:06 +0000
  • b884899086 Removed a bunch of hardcodes. Nicolas Patry 2024-05-08 12:20:00 +0000
  • 1a8a18d541 Cleanup. Nicolas Patry 2024-05-08 06:33:13 +0000
  • 1fde6850bb Fixed speculator. Nicolas Patry 2024-05-08 06:31:40 +0000
  • 9291d42865 [REWRITTEN] added a bunch of cleanup based on comments in PR; removed conditionals from LayerNormParameterized and renamed to MLPSpeculatorLayerNorm; now using modules for tensor-parallel (this is work in progress - looking into if this is right approach); fixed issue with getting medusa model; fixed for more efficient loading Joshua Rosenkranz 2024-05-03 10:02:11 -0400
  • 38d6045443 Hardcode a few stuff to make it work. Nicolas Patry 2024-05-06 14:03:05 +0000
  • 453e91f755 added a bunch of cleanup based on comments in PR; removed conditionals from LayerNormParameterized and renamed to MLPSpeculatorLayerNorm; now using modules for tensor-parallel (this is work in progress - looking into if this is right approach); fixed issue with getting medusa model; fixed for more efficient loading Joshua Rosenkranz 2024-05-03 10:02:11 -0400
  • 6e5c19ec44 initial commit of mlp_speculator support (draft) Joshua Rosenkranz 2024-05-02 10:18:42 -0400