Commit Graph

  • 359eb9d478 update to metrics 0.23.0 or could work with metrics-exporter-prometheus 0.15.1 Wang, Yi A 2024-07-05 01:39:16 -0700
  • b67d46336e Fix Starcoder2 after refactor (#2189) Daniël de Kok 2024-07-05 12:22:45 +0200
  • 11efc40db3 Fix Starcoder2 after refactor Daniël de Kok 2024-07-05 12:17:01 +0200
  • 853d4eb9cf Hotfixing after refactor. Nicolas Patry 2024-07-05 09:25:29 +0000
  • fff1d4f86f Add bucket for input seq len exactly same as --max-input-length (#178) Sun Choi 2024-07-05 01:30:26 -0700
  • fb2f74e2b9 Refactor dead code - Removing all flash_xxx.py files. (#2166) Nicolas Patry 2024-07-05 10:29:56 +0200
  • 1b4d80c03e Update docker image path in README (#181) Karol Damaszke 2024-07-05 10:29:02 +0200
  • c6bcadf883 Adding "longrope" for Phi-3 (#2172) (#2179) Aaron Mihalik 2024-07-05 03:46:41 -0400
  • 25c9611c04 Wrong default. Nicolas Patry 2024-07-04 17:18:26 +0200
  • 4aa0642f4d Default value for gemma/gemma2. Nicolas Patry 2024-07-04 17:17:46 +0200
  • c50397ca3a misc: update vllm dependecy to support attention size 160 Paolo Albano 2024-06-04 13:48:44 +0000
  • 425f348e48 Add default to Gemma Causality. Nicolas Patry 2024-07-04 16:36:16 +0200
  • fc5bfa070a Fixing docs + causal.lm. Nicolas Patry 2024-07-04 13:37:22 +0200
  • 8ecee7283c Fixing docs + causal_lm batch_class. Nicolas Patry 2024-07-04 12:59:59 +0200
  • e2edf2beb2 Finish removal. Nicolas Patry 2024-07-03 15:19:06 +0000
  • f5ff9b5742 Fuse back mistral into FlashCausalLM. Nicolas Patry 2024-07-03 15:08:44 +0000
  • fbf38c997c Removing the dead code. Nicolas Patry 2024-07-03 13:34:37 +0000
  • 9cc58d1cb3 Addresses comments. Nicolas Patry 2024-07-03 13:29:19 +0000
  • 2259d2f78a Stopping earlier because of <end_of_utterance> in idefics2. Nicolas Patry 2024-07-03 08:09:58 +0000
  • e8ff76fd18 Fixing config.n_head. Nicolas Patry 2024-07-02 17:01:25 +0000
  • 24bbd7b822 Removing more dead code. Nicolas Patry 2024-07-02 16:46:52 +0000
  • dbf9292afc Fixing santacoder (num_kv_heads hardcoded). Nicolas Patry 2024-07-02 16:35:08 +0000
  • 43ef5268fd Fixes for VLM. Nicolas Patry 2024-07-02 16:02:33 +0000
  • b2fb845923 Fixing sharding. Nicolas Patry 2024-07-02 15:37:27 +0000
  • 298500a08e Fixing the simple tests. Nicolas Patry 2024-07-02 15:13:24 +0000
  • db9acc4418 Fix Santacoder test. Nicolas Patry 2024-07-02 16:55:48 +0200
  • ce913b874b More cleanup. Nicolas Patry 2024-07-02 16:11:53 +0200
  • 7d96b1a103 More dead code. Nicolas Patry 2024-07-02 14:45:35 +0200
  • ed34cf0222 Remove a lot of duplicated code. Nicolas Patry 2024-07-02 14:17:57 +0200
  • 69cb084b5f First working step. Nicolas Patry 2024-07-02 11:25:18 +0000
  • b28946d695 Refactor dead code. Nicolas Patry 2024-07-02 11:13:51 +0000
  • d282470a3d Fix Dockerfile path (#180) regisss 2024-07-04 16:25:49 +0200
  • 7c582f6bfc Add new workflow to push Docker images (#179) regisss 2024-07-04 15:32:33 +0200
  • 5df20f88ff Fix to non-LLAMA models (#177) Jacek Czaja 2024-07-04 13:42:24 +0200
  • 4dfdb481fb Version 2.1.1 v2.1.1 git_v2.1.1 Nicolas Patry 2024-07-04 12:39:07 +0200
  • 245d3de948 Preparing patch release. (#2186) Nicolas Patry 2024-07-04 10:55:33 +0200
  • 16da963c51 Preparing patch release. Nicolas Patry 2024-07-04 10:53:29 +0200
  • 9ea900de49 simplify initialize_torch_distributed() ur4t 2024-07-04 14:50:12 +0800
  • 72426dabb6 add doc for intel gpus Wang, Yi A 2024-07-03 17:31:34 -0700
  • 29c7cb36e5 Remembering to check how we can detect support for chunked context Morgan Funtowicz 2024-07-03 21:38:17 +0000
  • f57f2a4521 First version loading engines and making it ready for inference Morgan Funtowicz 2024-07-03 21:12:24 +0000
  • 694e4dd9f3 Adding "longrope" for phi-3 Aaron Mihalik 2024-07-03 12:01:18 -0400
  • fdf3a58f5a fix: python deserialization Javier Martinez 2024-07-03 17:50:19 +0200
  • d36ab18006 change to CPU Runners Guillaume LEGENDRE 2024-07-03 17:08:32 +0200
  • 1b158e3b0d change to S3 cache Guillaume LEGENDRE 2024-07-03 15:53:45 +0200
  • 6168aa4100 Fix to perf regression caused by OH update (#164) Jacek Czaja 2024-07-03 15:20:45 +0200
  • c64b5b75e2 [TORCH COMPILE] Ignore HPU GRAPHS env var when eager mode is used (#165) Jacek Czaja 2024-07-03 15:17:27 +0200
  • 19e2c3b3cb Update build.yaml Guillaume LEGENDRE 2024-07-03 15:16:48 +0200
  • ab281113e1 Update build.yaml Guillaume LEGENDRE 2024-07-03 15:14:19 +0200
  • 5ad41aa2a6 Fixing missing object field for regular completions. (#2175) Nicolas Patry 2024-07-03 12:56:27 +0200
  • d71a7dc18a Fixing docs by re-adding missing Prompt. Nicolas Patry 2024-07-03 10:55:30 +0000
  • 2b3bd1e008 Fixing the dockerfile warnings. (#2173) Nicolas Patry 2024-07-03 12:48:45 +0200
  • af0f7ed405 Fixing missing object field for regular completions. Nicolas Patry 2024-07-03 10:40:22 +0000
  • be4a4c47f9 Revert "Fixing missing object field for regular completions." Nicolas Patry 2024-07-03 10:41:39 +0000
  • 2bbb7fa4b2 Fixing missing object field for regular completions. Nicolas Patry 2024-07-03 10:40:22 +0000
  • 7c475b6226 Update to SynapseAI 1.16.0 (#167) Karol Damaszke 2024-07-03 11:08:56 +0200
  • 535a35db17 Set unique request id during warmup (#170) Karol Damaszke 2024-07-03 10:58:20 +0200
  • 4b4382c6f8 Fix dtype mismatch in HeterogeneousFrequencyPenaltyLogitsProcessor (#163) Karol Damaszke 2024-07-03 10:57:41 +0200
  • 30342ca82d Fix Makefile commands (#161) Karol Damaszke 2024-07-03 10:57:09 +0200
  • 5e6ea1701b Fixing the dockerfile warnings. Nicolas Patry 2024-07-03 08:41:58 +0000
  • f8a1463915 Enable end to end CMake build Morgan Funtowicz 2024-07-03 10:27:53 +0200
  • 571530dd9a feat: improve update_docs for openapi schema (#2169) drbh 2024-07-03 03:53:35 -0400
  • fc2ce1cec1 fix: alllow trailing space in openapi schema diff drbh 2024-07-02 20:22:24 +0000
  • 2d0bd1ab6d fix: explicitly install protoc and python drbh 2024-07-02 18:58:14 +0000
  • 26eb7cea3a fix: adjust autodoc workflow drbh 2024-07-02 18:04:55 +0000
  • e4161a185f feat: improve update doc and add command to print router schema drbh 2024-07-02 18:01:29 +0000
  • fe3991e857 feat: add simple ttft load_test add-small-ttft-script drbh 2024-07-02 15:57:01 +0000
  • caa44012ad fix: install protoc before server drbh 2024-07-02 15:55:37 +0000
  • aaa899139b fix: adjust raise condition and install server in ci drbh 2024-07-02 15:42:20 +0000
  • a1c92bd0d6 fix: adjust timeout for CI drbh 2024-07-02 15:28:13 +0000
  • 007c8302e2 feat: improve workflow to check openapi schema too drbh 2024-07-02 15:24:14 +0000
  • bc74571cd8 fix: update workflow to use update_doc md command drbh 2024-07-02 15:22:13 +0000
  • ba87834f14 fix: adjust revert typo drbh 2024-07-02 15:19:04 +0000
  • f5f5c2363b fix: adjust typo drbh 2024-07-02 15:18:10 +0000
  • 36bd48c293 fix: prefer improved update_doc and start server and compare drbh 2024-07-02 15:17:43 +0000
  • 818162e0c2 Overall build TRTLLM and deps through CMake build system Morgan Funtowicz 2024-07-02 17:16:27 +0200
  • 7b34ba3408 feat: add pre commit step to force schema update when router changes drbh 2024-07-02 14:47:01 +0000
  • cb232a35a9 feat: add test to view batch speedup amount test-batch-speedup-amount drbh 2024-07-02 13:33:26 +0000
  • 29a416078c Merge branch 'main' into ci_amd3 fxmarty 2024-07-02 15:32:53 +0200
  • 0a5b19a3ed updated doc fp8_kvcache Mohit Sharma 2024-07-02 13:10:26 +0000
  • 6d6b0bdcc4 fix formatting Mohit Sharma 2024-07-02 13:08:56 +0000
  • add4d42cb3 do not use tunableop for non flash-causal-lm modezls Felix Marty 2024-07-02 12:52:55 +0000
  • f34560f74a updated docs Mohit Sharma 2024-07-02 12:50:39 +0000
  • 0759ec495e Hotfixing qwen2 and starcoder2 (which also get clamping). (#2167) Nicolas Patry 2024-07-02 14:26:47 +0200
  • 57541d5e88 Hotfixing qwen2 and starcoder2 (which also get clamping). Nicolas Patry 2024-07-02 14:26:16 +0200
  • 963b6c6f0f Ci test (#2124) Guillaume LEGENDRE 2024-07-02 12:45:38 +0200
  • 84e66274de Update .github/workflows/ci_build.yaml Nicolas Patry 2024-07-02 12:45:26 +0200
  • b141e9ad6f Update specification Aymeric 2024-07-02 12:07:23 +0200
  • dea9c0dc74 Fixing rocm. (#2164) add-chat-response-format Nicolas Patry 2024-07-02 12:01:08 +0200
  • a0e08f873d Fixing rocm. Nicolas Patry 2024-07-02 10:00:40 +0000
  • b966bc0d35 fix: use the base layers weight in mistral rocm (#2155) drbh 2024-07-02 05:56:25 -0400
  • 5d97e0c4a3 fix FlashDecoding change's regression in intel platform (#2161) Wang, Yi 2024-07-02 17:56:07 +0800
  • 022f6515a4 Fixing graph capture for flash decoding. (#2163) Nicolas Patry 2024-07-02 11:43:07 +0200
  • ee6d09eea4 Fixing graph capture for flash decoding. Nicolas Patry 2024-07-02 09:41:29 +0000
  • a414e276a5 fix registry url Guillaume LEGENDRE 2024-06-28 12:28:46 +0200
  • 11bd513b23 Move cache to push registry Guillaume LEGENDRE 2024-06-28 11:39:51 +0200
  • 91baf507bb remove comments Guillaume LEGENDRE 2024-06-27 13:57:26 +0200
  • 91abcfe0a4 change push registry Guillaume LEGENDRE 2024-06-26 11:11:35 +0200
  • 2670d2d4e8 first test with registry mirror Guillaume LEGENDRE 2024-06-25 17:12:50 +0200
  • c2f4b7f93e add vicuna Felix Marty 2024-07-02 08:25:12 +0000