Commit Graph

  • ab34c16610 Fix AMD documentation (#1307) fxmarty 2023-12-04 14:09:51 +0100
  • 91c653bac2 fix: default use_qk_norm false in cohere drbh 2024-04-17 20:59:16 +0000
  • 7ecda467c2 fix: make logic more readable drbh 2024-04-17 20:20:35 +0000
  • 9906b03b60 fix: update conditional to be more specific drbh 2024-04-17 14:59:34 +0000
  • a2c935d5fb fix: simplify changes drbh 2024-04-17 14:11:16 +0000
  • 4ea2a98d34 fix: adjust req typo drbh 2024-04-16 17:21:45 +0000
  • 6b9a25782d fix: add missing comma typo drbh 2024-04-16 12:42:47 -0400
  • 9387b3b793 fix: adjust conditional after rebase drbh 2024-04-16 12:36:59 -0400
  • 7879365f96 fix: revise temp scaling logic drbh 2024-04-15 20:13:40 +0000
  • 24a5588735 fix: reduce and refactor changes drbh 2024-04-15 19:39:48 +0000
  • 27cd254b89 fix: update temperature and sampling logic in chat drbh 2024-04-15 19:13:18 +0000
  • 0520bde039 feat: support do_sample param in ChatRequest drbh 2024-04-15 17:06:53 +0000
  • 5bc3d65dd3 Adding new env variables for TPU backends. Nicolas Patry 2024-04-17 10:00:58 +0000
  • 06c3d4b1ec feat: accept list as prompt and use first string (#1702) drbh 2024-04-17 04:41:12 -0400
  • 6655717e19 add ascend npu support for TGI statelesshz 2024-04-14 16:11:10 +0800
  • a7bf3196d4 feat: improve logs by passing span to internal functions drbh 2024-04-17 02:44:47 +0000
  • bd28c36815 feat: emit params in logs for each request drbh 2024-04-17 01:18:09 +0000
  • 593c443b45 fix: adjust rebase removals drbh 2024-04-16 22:42:58 +0000
  • 4fec982325 fix: adjust naming and tests and rebase typo drbh 2024-04-16 16:23:39 +0000
  • ef2363cd3a fix: refactor tests to support completions snapshot drbh 2024-04-16 15:53:27 +0000
  • 0b82080849 fix: adjust assert typo drbh 2024-04-16 14:52:39 +0000
  • f2080c4114 fix: graceful stream close and fix tests drbh 2024-04-16 14:49:19 +0000
  • c7b4cd318f fix: update tests for new behavior drbh 2024-04-11 22:46:39 +0000
  • a62e30462b fix: improve header init and error handling drbh 2024-04-11 21:18:14 +0000
  • 25f5e788ae fix: doc tweak drbh 2024-04-11 20:44:17 +0000
  • 908acc55b8 fix: decrease default batch, refactors and include index in batch drbh 2024-04-11 20:37:35 +0000
  • 16be5a14b3 feat: interleave streams and improve tests drbh 2024-04-11 19:26:20 +0000
  • 942e002674 fix: improve headers and add streaming test drbh 2024-04-10 15:22:23 +0000
  • 57606c447b feat: handle batch completions requests drbh 2024-04-10 04:07:03 +0000
  • c1afdcc1cb feat: better error if array len >=1 drbh 2024-04-09 00:21:31 +0000
  • 424b24f5fa feat: accept list as prompt and use first string drbh 2024-04-03 20:07:02 +0000
  • 9eeda34427 feat: vendor precompiled llama mlp kernel llama-fused-compiled-mlp drbh 2024-04-16 22:07:00 +0000
  • e4d31a40db fix: bump clients test base url to llama (#1751) drbh 2024-04-16 16:56:47 -0400
  • 00f365353e Update response type for /v1/chat/completions and /v1/completions (#1747) Lucain 2024-04-16 19:26:32 +0200
  • 2daca36d75 fix: bump clients test base url to llama drbh 2024-04-16 16:58:15 +0000
  • 7276d43495 feat: improve tools to include name and add tests (#1693) drbh 2024-04-16 09:02:46 -0400
  • 88702d8763 Fixing CI. (#1748) Nicolas Patry 2024-04-15 18:47:36 +0200
  • 50bc920463 Fixing CI. Nicolas Patry 2024-04-15 18:45:45 +0200
  • 7d6216d63b Update response type for /v1/chat/completions and /v1/completions Lucain 2024-04-15 16:57:16 +0200
  • f2b3d8d7ed add: support for falcon-10B architecture. Nilabhra 2024-04-15 13:52:20 +0400
  • c07f54aac2 fix: fp8 dimensions size Dong Shin 2024-04-13 17:27:13 +0900
  • 238d2fefab fix: skip grammar tests since they timeout drbh 2024-04-12 21:50:38 +0000
  • 8ebb560f2f feat: integrate triton compilations demo explore-static-triton-kernels drbh 2024-04-12 21:47:15 +0000
  • eba4dd85aa fix: avoid long running tool test drbh 2024-04-12 16:39:23 +0000
  • 39a8261132 fix: split chat and tool tests drbh 2024-04-11 22:35:44 +0000
  • 5e888c4faa fix: readd infer changes and update tests drbh 2024-04-11 16:34:18 +0000
  • cc67f47d6e feat: improve chat_template to include tools drbh 2024-04-10 22:49:10 +0000
  • 9874b15fa8 fix: adjust tool grammar ownership drbh 2024-04-09 00:37:05 +0000
  • bb73acc1a9 feat: update default prompt and other small refactors drbh 2024-04-04 01:10:06 +0000
  • 106c9ce8e5 fix: avoid old change drbh 2024-04-03 20:18:06 +0000
  • 4930de857d feat: improve grammar to include name and add tests drbh 2024-04-02 01:28:21 +0000
  • c38a7d7ddd v2.0.0 (#1736) v2.0.0 OlivierDehaene 2024-04-12 18:38:34 +0200
  • ff2dfdc23f v2.0.0 OlivierDehaene 2024-04-12 17:14:48 +0200
  • 275caa04b1 Fix typo in guidance.md (#1735) Ikko Eltociear Ashimine 2024-04-12 23:51:07 +0900
  • 621d92dbfe Fix typo in guidance.md Ikko Eltociear Ashimine 2024-04-12 23:46:43 +0900
  • eefea5ee31 feat: medusa v2 (#1734) OlivierDehaene 2024-04-12 16:24:45 +0200
  • 0dd617b822 remove movedim OlivierDehaene 2024-04-12 16:23:54 +0200
  • 68717f8716 swap load OlivierDehaene 2024-04-12 16:11:47 +0200
  • 308d7bcb3d feat: medusa v2 OlivierDehaene 2024-04-12 15:41:50 +0200
  • 1b2670c823 Improve the defaults for the launcher (#1727) Nicolas Patry 2024-04-12 14:20:31 +0200
  • f66c9f340b Update the doc. improve_defaults Nicolas Patry 2024-04-12 12:09:23 +0000
  • b75bd5b720 Update launcher/src/main.rs Nicolas Patry 2024-04-12 14:08:38 +0200
  • 9d8f21cace chore(cargo-toml): apply lto fat and codegen-units of one (#1651) Christof Weickhardt 2024-04-12 12:34:13 +0200
  • 16386b83e1 Forgot the doc again. Nicolas Patry 2024-04-12 10:28:49 +0000
  • e5955851b9 Smaller default for max_input_length. Nicolas Patry 2024-04-12 10:22:02 +0000
  • 10dd0150c0 Dummy fix for medusa. tmp_medusa Nicolas Patry 2024-04-12 10:12:09 +0000
  • c2c98725f8 fix(router): fix a possible deadlock in next_batch (#1731) OlivierDehaene 2024-04-12 10:59:04 +0200
  • 1e5150f475 "Fixing t5" just use more RAM for this test. Nicolas Patry 2024-04-12 08:46:33 +0000
  • cd07211411 Max_seq_len (old mpt config.) Nicolas Patry 2024-04-12 08:39:21 +0000
  • df54d2427a fix(router): fix a possible deadlock in next_batch OlivierDehaene 2024-04-12 10:30:27 +0200
  • c4ebcea79c Fixing default for BNB + cuda graphs (they don't work together). Nicolas Patry 2024-04-12 08:24:08 +0000
  • 289b0721c4 Adding some wiggle room. Nicolas Patry 2024-04-12 07:22:26 +0000
  • 9176ecbcea Remove the override ? Nicolas Patry 2024-04-12 06:52:45 +0000
  • 179ee4e2c2 Change things around when we don't have a tokenizer. Nicolas Patry 2024-04-11 18:44:23 +0000
  • d43e10e097 Making things work most of the time. Nicolas Patry 2024-04-11 18:30:38 +0000
  • 9ce9f39dea No unwrap. Nicolas Patry 2024-04-11 17:23:08 +0000
  • a4c86e8678 Update default doc. Nicolas Patry 2024-04-11 15:45:45 +0000
  • bd01d448d7 Better defaults (and LOG_COLORIZE). Nicolas Patry 2024-04-11 15:35:48 +0000
  • 3c71d2f1c4 Easier defaults for models stemmed from configs. Nicolas Patry 2024-04-11 12:48:39 +0000
  • 6c2c44b84c Upgrade EETQ (Fixes the cuda graphs). (#1729) Nicolas Patry 2024-04-12 08:15:28 +0200
  • 408dbc485c Fp8 Support (#1726) Nicolas Patry 2024-04-12 08:13:30 +0200
  • 666cde0e12 Update server/text_generation_server/utils/layers.py Nicolas Patry 2024-04-12 08:12:32 +0200
  • 5ef2a48fec Update server/text_generation_server/utils/layers.py Nicolas Patry 2024-04-12 08:11:18 +0200
  • 9215b8b60b Upgrade EETQ (Fixes the cuda graphs). Nicolas Patry 2024-04-11 18:45:36 +0000
  • c2fd35d875 Dev/mask ldconfig output v2 (#1716) oOraph 2024-04-11 19:31:48 +0200
  • 842f6658e2 Revert "Easier defaults for models stemmed from configs." Nicolas Patry 2024-04-11 12:51:57 +0000
  • b83aab9bb3 Easier defaults for models stemmed from configs. improve_launcher_defaults Nicolas Patry 2024-04-11 12:48:39 +0000
  • a352563ee0 Style. Nicolas Patry 2024-04-11 11:34:25 +0000
  • 66195d832c Update docs/source/basic_tutorials/launcher.md Nicolas Patry 2024-04-11 13:15:15 +0200
  • b24bdb9f8c Fp8 support. Nicolas Patry 2024-04-11 11:09:13 +0000
  • c31cb32dd6 Update docs2. Nicolas Patry 2024-02-01 15:51:15 +0000
  • 8cd198a6a1 Forgot to update docs. Nicolas Patry 2024-02-01 15:36:21 +0000
  • eb40f8ccda Marking the flag as really not the fastest and BETA. Nicolas Patry 2024-02-01 15:30:02 +0000
  • 6568e4812f typo removal. Nicolas Patry 2024-01-25 15:31:39 +0100
  • be59a6bc01 Updating docs. Nicolas Patry 2024-01-25 14:48:54 +0100
  • e1e9a18433 Dummy but working version. Nicolas Patry 2024-01-25 13:03:37 +0100
  • 50d5a3c11e Initial fp8. Nicolas Patry 2024-01-25 11:10:22 +0000
  • 10d9083b2d Update libraries (#1713) abhishek thakur 2024-04-11 10:37:35 +0200
  • 30620a9a44 hotfix: mixtral OlivierDehaene 2024-04-10 18:38:08 +0200
  • ad9d6288c8 fix: fix CohereForAI/c4ai-command-r-plus (#1707) OlivierDehaene 2024-04-10 17:20:25 +0200