Commit Graph

  • f428f5fc8a updated release version to 2.0.6 Thanaji 2024-10-31 23:54:34 +0200
  • 3bafa0eb7b Fix prefix caching + speculative decoding Travis Addair 2024-10-31 13:43:49 -0700
  • 8d84ffabf2 Upgrade to SynapseAI 1.18 (#227) yuanwu2017 2024-11-01 03:14:44 +0800
  • bfa16a5857 fix: adjust sharding and lm head logic David Holtz 2024-10-31 15:33:36 +0000
  • f1942b47f5 fix: add chat_tokenize endpoint to api docs drbh 2024-10-31 10:51:29 -0400
  • 3836c3fe72 fix qwen2 failure in intel cpu Wang, Yi A 2024-10-30 22:29:05 -0700
  • 7d97ee82a1 fix: only check model type if config exists David Holtz 2024-10-30 22:23:08 +0000
  • 8648212c76 feat: support multidimensional position ids on batch to enable cuda graphs on qwen2-vl David Holtz 2024-10-30 18:46:05 +0000
  • befd9f6735 Support qwen2 vl (#2689) drbh 2024-10-30 12:40:51 -0400
  • 46aeb0860d add xpu triton in dockerfile, or will show "Could not import Flash At… (#2702) Wang, Yi 2024-10-30 21:18:50 +0800
  • d9a8bbc183 add ipex moe implementation to support Mixtral and PhiMoe Wang, Yi A 2024-10-29 19:18:53 -0700
  • 7fb4af9a87 updated supported models list table in readme (#241) Thanaji Rao Thakkalapelli 2024-10-29 23:28:45 -0700
  • e137d4a9be updated supported models list table in readme Thanaji 2024-10-30 00:38:39 +0200
  • 620769e380 fix: avoid qwen2 vl specific paths with qwen2 David Holtz 2024-10-29 17:49:50 +0000
  • f9f34a5e20 fix: remove mostly mocked unit test drbh 2024-10-29 13:10:22 -0400
  • daf7d979d0 fix: remove trailing space lint after rebase drbh 2024-10-29 11:49:42 -0400
  • 77eb07f73b fix: adjust resize case for qwen2_vl warmup David Holtz 2024-10-29 15:47:32 +0000
  • 4f90db47be fix: adjust get_position_ids if not available and add required args to signatures David Holtz 2024-10-29 15:26:41 +0000
  • c1eab6cbb3 fix: adjust default when json tool choice is David Holtz 2024-10-20 21:57:47 +0000
  • 905d503971 fix: adjust tool choice type in test David Holtz 2024-10-18 16:11:58 +0000
  • 1ce1cf2862 fix: add missing snapshot file David Holtz 2024-10-18 15:39:05 +0000
  • 407531708e fix: adjust tool choice none logic, add test and small refactors David Holtz 2024-10-18 15:36:40 +0000
  • b5bf5b32ad fix: simplify naming, tool choice default and improve test David Holtz 2024-10-16 13:49:46 +0000
  • dd759e7914 feat: update docs and add tool choice configuration section David Holtz 2024-10-15 17:13:16 +0000
  • daa1c6280a fix: refactor away prepare_chat_input and improve tool grammar apply control flow David Holtz 2024-10-15 15:00:24 +0000
  • b2db1075e4 fix: simplify tool choice logic, improve tests, openapi and rust docs David Holtz 2024-10-15 14:01:02 +0000
  • f53c8059e9 feat: improve, simplify and rename tool choice struct add required support and refactor David Holtz 2024-10-14 17:40:45 +0000
  • 209f841767 fix: consolidate changes and remove old tool type David Holtz 2024-10-14 16:44:54 +0000
  • 2c172a2da7 fix: run linter and bump api docs David Holtz 2024-10-14 14:26:45 +0000
  • 151f950eea add tests Linus Bierhoff 2024-10-10 19:41:51 +0200
  • f979ff1965 add OpenAI like tool_choice for named choice Linus Bierhoff 2024-10-10 18:50:32 +0200
  • 77c81a29cb fix: prefer position_ids passed from vlm causal lm and reset ids on batch David Holtz 2024-10-29 01:13:17 +0000
  • fb1ae6d24c feat: refactors and calc num features David Holtz 2024-10-28 16:57:35 +0000
  • 831a07f990 fix: remove unused rotate_half David Holtz 2024-10-28 16:33:07 +0000
  • f2a1b1b3fc fix: adjust for ruff lints drbh 2024-10-28 12:30:03 -0400
  • 6208d10c53 fix: format model file David Holtz 2024-10-28 15:24:32 +0000
  • 65558b32f4 fix: add norm after text output David Holtz 2024-10-28 15:14:02 +0000
  • aa2aa9f915 fix: include linted file David Holtz 2024-10-28 14:37:59 +0000
  • 670d75b872 fix: update docs and lint unused vars David Holtz 2024-10-28 14:36:54 +0000
  • 279b114ab3 fix: adjust positional embeddings for multi dimensional position ids David Holtz 2024-10-28 14:06:18 +0000
  • e1114c2726 fix: lint test David Holtz 2024-10-28 03:07:15 +0000
  • 80ea4f0610 feat: add simple test chat with meesage and text David Holtz 2024-10-28 03:06:35 +0000
  • ec933282b2 fix: remove get_cos_sin_hack dev function David Holtz 2024-10-28 02:20:00 +0000
  • 22fdf9344f fix: improve get_position_ids, add lift embed_tokens David Holtz 2024-10-28 02:15:48 +0000
  • 09ac4fb6eb feat: fix token padding, enable warmup and process basic request David Holtz 2024-10-24 19:57:47 +0000
  • d96eef2a02 feat: add support for qwen2 vl model David Holtz 2024-10-24 15:36:53 +0000
  • 3bb78a8266 misc(deps): update ompi from 4.1.6 to 4.1.7rc1 to avoid strange deadlock trtllm-stop-words Morgan Funtowicz 2024-10-28 17:24:08 +0100
  • 512225474a misc(deps): add pyo3 to dependencies Morgan Funtowicz 2024-10-28 17:23:32 +0100
  • 98330df65e Monkey patching as a desperate measure. (#2704) Nicolas Patry 2024-10-28 11:25:13 +0100
  • 79bfdc93f7 New snapshot ? Nicolas Patry 2024-10-28 11:24:29 +0100
  • 7daf27ca58 Monkey patching as a desperate measure. Nicolas Patry 2024-10-28 11:10:00 +0100
  • 489e5b0fbe add xpu triton in dockerfile, or will show "Could not import Flash Attention enabled models: No module named 'triton'" Wang, Yi A 2024-10-28 01:38:06 -0700
  • 513d19b955 More timeout on docker start ? (#2701) Nicolas Patry 2024-10-28 08:57:22 +0100
  • 21f4126075 Latest upgrade. Nicolas Patry 2024-10-28 08:35:41 +0100
  • 5bab8c9368 More timeout on docker start ? Nicolas Patry 2024-10-28 08:30:45 +0100
  • 4c9856f9e5 Add missing package yuanwu 2024-10-28 07:04:56 +0000
  • 3a9cdc3241 Fixing auto bloom test. (#2699) Nicolas Patry 2024-10-28 06:14:11 +0100
  • debd44b04e Fixing auto bloom test. Nicolas Patry 2024-10-28 06:13:32 +0100
  • 78ce618c70 Update poetry lock. (#2698) Nicolas Patry 2024-10-28 06:11:33 +0100
  • 5a62dbf7c6 Update poetry lock. Nicolas Patry 2024-10-28 05:12:09 +0100
  • 7bc2c97bd9 Check if allowed tokens is None (#2694) upgrade-outlines Alex Weston 2024-10-28 00:10:55 -0400
  • 44a9b2510d Merge branch 'upgrade-outlines' into upgrade-outlines Nicolas Patry 2024-10-28 05:10:45 +0100
  • b49cff3f07 Update for new API Nicolas Patry 2024-10-25 10:46:05 +0200
  • 0721649fc6 Upgrade outlines to 0.1.1 Alex Weston 2024-10-16 13:58:54 -0400
  • 90b226db29 We can have a tokenizer anywhere. (#2527) Nicolas Patry 2024-10-28 05:00:24 +0100
  • 0c9b6cdd76 Choosing input/total tokens automatically based on available VRAM? (#2673) Nicolas Patry 2024-10-28 04:59:49 +0100
  • 2e4f4ba1bb Green main (#2697) Nicolas Patry 2024-10-28 04:59:32 +0100
  • 4581fe7cbc Green main Nicolas Patry 2024-10-28 11:05:15 +0800
  • c23584f626 Merge branch 'habana-main' into 2.3.0 yuanwu2017 2024-10-28 04:37:07 +0800
  • 372e071135 Fix the issues of tgi-gaudi for v.2.3.1 yuanwu 2024-10-27 06:01:17 +0000
  • 7e282b4153 V2.3.1 Nicolas Patry 2024-10-03 14:49:40 +0200
  • 34e98b14ef New release 2.3.1 (#2604) Nicolas Patry 2024-10-03 14:43:49 +0200
  • 902f526d69 Unroll notify error into generate response (#2597) drbh 2024-10-02 11:34:57 -0400
  • 7664d2e2b3 CI (2592): Allow LoRA adapter revision in server launcher (#2602) drbh 2024-10-02 10:51:04 -0400
  • 967e67111d Max token capacity metric (#2595) Nicolas Patry 2024-10-02 16:32:36 +0200
  • 51506aa57a Mllama flash version (#2585) Nicolas Patry 2024-10-02 11:22:13 +0200
  • 8a8794a672 Avoiding timeout for bloom tests. (#2693) Nicolas Patry 2024-10-26 05:35:28 +0200
  • 0a655a0ab5 v2.4.0 v2.4.0 git_v2.4.0 OlivierDehaene 2024-10-25 23:12:49 +0200
  • a6b02da971 chore: prepare 2.4.0 release (#2695) OlivierDehaene 2024-10-25 23:10:49 +0200
  • 6f88bd9390 feat: add triton kernels to decrease latency of large batches (#2687) OlivierDehaene 2024-10-25 23:10:00 +0200
  • 9c7dedbb07 chore: prepare 2.4.0 release OlivierDehaene 2024-10-25 22:41:24 +0200
  • 50b394d401 add slots filtering kernel OlivierDehaene 2024-10-25 22:14:26 +0200
  • b4ebfa52f4 fix speculation OlivierDehaene 2024-10-25 21:13:24 +0200
  • b04ffd48f1 Check if allowed tokens is None Alex Weston 2024-10-25 11:21:17 -0400
  • 0f346a3296 Switch from fbgemm-gpu w8a8 scaled matmul to vLLM/marlin-kernels (#2688) Daniël de Kok 2024-10-25 16:40:47 +0200
  • c7f62d4302 No early exit. Nicolas Patry 2024-10-25 16:15:05 +0200
  • 14e8ca5236 Merge branch 'main' into feature/get-trace-id-from-req-headers Hyeongchan Kim 2024-10-25 20:37:36 +0900
  • 2f2594bafe Pulling ? Nicolas Patry 2024-10-25 11:46:11 +0200
  • 8bb48ea998 Update test snapshots Daniël de Kok 2024-10-25 09:45:44 +0000
  • 2b25e9a94e disable triton on rocm OlivierDehaene 2024-10-25 11:33:53 +0200
  • 3f60230928 Fail early. Nicolas Patry 2024-10-25 11:23:23 +0200
  • fa964f82d3 nix: experimental support for building a Docker container (#2470) Daniël de Kok 2024-10-01 18:02:06 +0200
  • 775e5f4c64 MoE Marlin: support desc_act for groupsize != -1 (#2590) Daniël de Kok 2024-09-30 19:40:25 +0200
  • 692f8ddb69 Move flake back to tgi-nix main (#2586) Daniël de Kok 2024-09-30 11:39:41 +0200
  • bdc47394d2 feat: support phi3.5 moe (#2479) drbh 2024-09-30 11:15:09 +0200
  • 288bcb0027 Add support for GPTQ-quantized MoE models using MoE Marlin (#2557) Daniël de Kok 2024-09-30 11:14:32 +0200
  • ff905aeff3 Update ROCM libs and improvements (#2579) Mohit Sharma 2024-09-30 14:24:32 +0530
  • 6808b2de7e Update architecture.md (#2577) Ikram Ul Haq 2024-09-30 09:56:20 +0300
  • 55fd2816ea Remove compute capability lazy cell (#2580) Daniël de Kok 2024-09-30 08:48:47 +0200
  • f82a3f5816 flashinfer: pass window size and dtype (#2574) Daniël de Kok 2024-09-28 18:41:41 +0200