Commit Graph

  • 0a01dde986 Trying to fix non chunking targets. Nicolas Patry 2024-10-23 15:02:52 +0800
  • 7f13fac132 Create test_structured_output_response_format_llama_json.json Sidharth Rajaram 2024-10-22 20:10:52 -0700
  • 9a18b75971 Create test_structured_output_response_format_llama.py Sidharth Rajaram 2024-10-22 20:07:21 -0700
  • a1803bb780 Add json_schema as an alias for JSON Grammar Sidharth Rajaram 2024-10-22 19:08:32 -0700
  • 5f81550aa6 feat(docker): add python3.10 dev to runtime deps Morgan Funtowicz 2024-10-22 23:05:55 +0200
  • ba2618eba2 feat(docker): build with-slurm ompi Morgan Funtowicz 2024-10-22 23:05:45 +0200
  • 56106b4c27 chore(router): minor refactorings Morgan Funtowicz 2024-10-22 23:05:10 +0200
  • 9c9ef37c56 Add impureWithCuda dev shell (#2677) Daniël de Kok 2024-10-22 11:02:55 +0200
  • 6d60aa97b9 Add cuDNN Daniël de Kok 2024-10-22 08:02:48 +0000
  • 7aa90c58dc Add impureWithCuda dev shell Daniël de Kok 2024-10-22 07:55:07 +0000
  • 84f3bf902a chore(trtllm): minor fix Morgan Funtowicz 2024-10-21 23:50:02 +0200
  • 47d8c53dda chore(trtllm): ensure max throughput scheduling policy is selected Morgan Funtowicz 2024-10-21 23:40:54 +0200
  • a6ac2741a3 chore(trtllm): validate there are enough GPus on the system for the desired model Morgan Funtowicz 2024-10-21 23:40:38 +0200
  • 848b8ad554 chore(trtllm): minor refactoring Morgan Funtowicz 2024-10-21 23:40:20 +0200
  • 60a08a283d chore(trtllm): use GetParallelConfig Morgan Funtowicz 2024-10-21 23:39:44 +0200
  • d5c8bdc53b chore(trtllm): define a macro for SizeType cast Morgan Funtowicz 2024-10-21 23:39:08 +0200
  • 7217cafadb chore(trtllm): create specific parallelconfig factory and logging init methods Morgan Funtowicz 2024-10-21 23:38:42 +0200
  • 421a17544e feat(trtllm): add stop words handling Morgan Funtowicz 2024-10-21 17:00:45 +0200
  • c1a43a6c3e chore(ffi):formatting Morgan Funtowicz 2024-10-21 16:59:30 +0200
  • 9ac26ed717 feat(post_processing): max_new_tokens is const evaluated now Morgan Funtowicz 2024-10-21 16:57:46 +0200
  • cdac4b0058 chore(looper): cleanup a bit more Morgan Funtowicz 2024-10-21 16:57:26 +0200
  • 04c6f51258 feat(trtllm): rewrite health to not account for current state Morgan Funtowicz 2024-10-21 15:55:38 +0200
  • 18b473b019 chore(router): add python dependency Morgan Funtowicz 2024-10-22 09:51:50 +0200
  • d73401ac73 chore(rebase): fix invalid references Morgan Funtowicz 2024-10-21 21:44:28 +0200
  • f5b9ee368a Revert "chore(trtllm): remove unused method" Morgan Funtowicz 2024-10-21 17:03:35 +0200
  • a31db04709 Remove generated files. Nicolas Patry 2024-10-21 15:24:38 +0200
  • 058d3061f7 break when there's nothing to read (#2582) Wang, Yi 2024-10-21 21:22:48 +0800
  • 8d1c3c8ad4 feat(trtllm): do not tokenize twice Morgan Funtowicz 2024-10-21 15:06:54 +0200
  • 79469f5f39 Update doc. Nicolas Patry 2024-10-21 14:57:24 +0200
  • a1aac7843b Choosing input/total tokens automatically based on available VRAM? Nicolas Patry 2024-10-21 13:02:04 +0200
  • 1a3da05f34 misc(router): remove SchedulingError Morgan Funtowicz 2024-10-21 14:57:19 +0200
  • e6da212431 feat(trtllm): cache maxNumTokens to avoid calling JSON everytime Morgan Funtowicz 2024-10-21 14:51:58 +0200
  • fe8d55dba9 Clean both threads. close_dl_thread Nicolas Patry 2024-10-21 14:49:07 +0200
  • 009c4e0b94 Fixing performance degradation on Intel. Nicolas Patry 2024-10-21 14:45:19 +0200
  • 31747163e7 chore(trtllm): remove unused method Morgan Funtowicz 2024-10-21 14:10:23 +0200
  • 7f54b7336a Test Marlin MoE with desc_act=true (#2622) Daniël de Kok 2024-10-21 12:50:35 +0200
  • fb00f985ae chore(trtllm): post-rebase commit Morgan Funtowicz 2024-10-21 12:31:24 +0200
  • 85c03e33a9 chore(trtllm): fmt Morgan Funtowicz 2024-10-21 09:38:51 +0200
  • e3bce407be chore(trtllm): disable tokenizer parallelism by default Morgan Funtowicz 2024-10-21 09:25:31 +0200
  • 62f33d7ecd chore(trtllm): move dockerfile to right place Morgan Funtowicz 2024-10-21 09:25:13 +0200
  • 6687c06a21 feat(looper): minor optimizations to avoid growing too much the containers Morgan Funtowicz 2024-10-18 00:09:45 +0200
  • 027756c52d chore(cmake): download timestamp should be before URL Morgan Funtowicz 2024-10-18 00:07:53 +0200
  • 629153b44b feat(looper): check engine and executorWorker paths exist before creating the backend Morgan Funtowicz 2024-10-17 13:13:34 +0200
  • f20ec28891 chore(cmake): use correct policy for download_timestamp Morgan Funtowicz 2024-10-17 13:12:34 +0200
  • 819c953771 misc(cuda): require 12.6 Morgan Funtowicz 2024-10-17 13:12:16 +0200
  • dd94ccc989 (fix): ore fixes for Dockerfile Morgan Funtowicz 2024-10-10 16:24:38 +0000
  • f9f10a6636 (misc): improve trtllm download script robustness Morgan Funtowicz 2024-10-10 14:11:41 +0000
  • 0c3ba932cc (misc): disable logging in release mode Morgan Funtowicz 2024-10-10 14:11:25 +0000
  • 437c2aa142 (misc): update dependency in trtllm dockerfile Morgan Funtowicz 2024-10-10 13:10:05 +0000
  • cb69c9a967 (misc): update dependency in trtllm dockerfile Morgan Funtowicz 2024-10-10 12:48:55 +0000
  • c8a99af6c9 (fix): do not recreate the stateful hashmap at every it Morgan Funtowicz 2024-10-10 12:41:46 +0000
  • eb13d8d1f3 (misc): increase verbosity of spdlog Morgan Funtowicz 2024-10-10 12:41:20 +0000
  • ce0cd1fce8 (misc): build with trtllm 0.13.0 Morgan Funtowicz 2024-10-10 12:40:49 +0000
  • 188e4dc64f (misc: build for sm_{75,80,86,89,90} by default Morgan Funtowicz 2024-10-10 12:40:32 +0000
  • 544c9d9dba (fix): HOPPER_SM_MAJOR is 9 not 8 Morgan Funtowicz 2024-10-10 12:36:32 +0000
  • 213acc6e34 (misc) move to latest trtllm Morgan Funtowicz 2024-09-25 10:08:45 +0000
  • 507ff66692 (misc) rerun-if-changed all the cmake modules Morgan Funtowicz 2024-09-25 10:01:21 +0000
  • b242f45c04 (misc) delete backend.rs Morgan Funtowicz 2024-09-03 21:19:41 +0000
  • 984ae9798f (post) impl postprocessing Morgan Funtowicz 2024-08-26 14:28:44 +0000
  • fa63db0d07 (scheduler) rework submit/pull logic Morgan Funtowicz 2024-08-26 13:39:20 +0000
  • 42ccf4e77c (misc) no need to move for uint32_t items Morgan Funtowicz 2024-08-26 13:38:49 +0000
  • b41875c139 (misc) simplify [make_]move_iterator by using c++20 type inference Morgan Funtowicz 2024-08-26 08:24:38 +0000
  • 0f50539b77 (Dockerfile.trtllm) delete for now Morgan Funtowicz 2024-08-11 14:10:51 +0200
  • b1846fb4e6 (backend) refactor & cleanup Morgan Funtowicz 2024-08-11 14:10:28 +0200
  • 483f172938 (ffi) do not use reference capture in lambda as we are not capturing anything Morgan Funtowicz 2024-08-11 14:10:12 +0200
  • 3d0e90b631 (ffi) missing namespace for tle::Response Morgan Funtowicz 2024-08-10 00:21:18 +0200
  • 8e648ce425 (ffi) fix usage of wrong vector constructor making a capacity fill call Morgan Funtowicz 2024-08-09 22:45:18 +0200
  • dddc9a44bd (build) fetchcontent use archives instead of git Morgan Funtowicz 2024-08-08 09:44:15 +0200
  • 089c5fe668 (server) forward auth_token to server::run Morgan Funtowicz 2024-08-08 10:53:25 +0000
  • 291eaa99fb use blocking_recv in looper to consume awaiting_requests at max before pulling in a single step Morgan Funtowicz 2024-08-08 00:33:10 +0200
  • 7bebc629af (misc) missing Result types for Rust Morgan Funtowicz 2024-08-05 13:39:14 +0000
  • c2e21d8725 (backend) implement the post_processor background thread Morgan Funtowicz 2024-08-05 13:27:18 +0000
  • 0dca168bcb (misc) change scope identifiers Morgan Funtowicz 2024-08-05 11:41:29 +0000
  • 933ab67aa1 (ffi) encode the provided user prompt within each request thread Morgan Funtowicz 2024-08-05 07:56:14 +0000
  • 0b0c30fe8b (ffi) remove narrowing type warning Morgan Funtowicz 2024-08-03 21:55:04 +0000
  • fb759bdd2a (looper) new looper initial implementation Morgan Funtowicz 2024-08-02 22:18:39 +0000
  • 5f7c0b67c3 (ffi) add template specialization to catch and convert to Rust Result<T, tensorrt_llm::common::TllmException> Morgan Funtowicz 2024-08-02 22:18:18 +0000
  • 33c962ef41 (ffi) add missing headers imports Morgan Funtowicz 2024-08-02 22:17:32 +0000
  • 2883c042ed (ffi) cleanup again Morgan Funtowicz 2024-08-02 22:17:02 +0000
  • f4a74be384 (backend) expose PullNewTokens Morgan Funtowicz 2024-08-02 22:16:28 +0000
  • b8a40a0af3 (backend) cleanup a bit Morgan Funtowicz 2024-08-02 22:14:03 +0000
  • 38b5263c61 (ffi) add max_new_tokens parameters Morgan Funtowicz 2024-08-02 22:11:41 +0000
  • f6f689f509 (build) setup ccache if available Morgan Funtowicz 2024-08-02 22:10:01 +0000
  • 2a339f99dd (trt) Morgan Funtowicz 2024-08-02 22:09:12 +0000
  • 169e1f452f (server) expose new SchedulingError Morgan Funtowicz 2024-08-01 11:59:14 +0000
  • 0cd7538a48 (ffi) use const for GetSamplingConfig Morgan Funtowicz 2024-08-01 07:49:37 +0000
  • cea64e234f (chore) fmt ... why? Morgan Funtowicz 2024-07-31 20:38:30 +0000
  • a3f7d76f7b (launcher) default new server::run parameters to false for now Morgan Funtowicz 2024-07-31 09:06:52 +0000
  • 25b20cba2a (backend) use parking_lot crate for RwLock fairness Morgan Funtowicz 2024-07-31 12:30:53 +0000
  • 5e0fb46821 Make handling of FP8 scales more consisent (#2666) Daniël de Kok 2024-10-19 09:05:01 +0200
  • c5e3881051 Enables Flash Attention in TGI for gemma models (#235) Thanaji Rao Thakkalapelli 2024-10-18 09:20:42 -0700
  • 9ae5ad5057 requirements name - cabelo@opensuse.org (#237) Alessandro de Oliveira Faria (A.K.A.CABELO) 2024-10-18 13:20:05 -0300
  • 153ff3740b CI job. Gpt awq 4 (#2665) Nicolas Patry 2024-10-18 17:55:53 +0200
  • 0229c71b21 Update server/text_generation_server/layers/gptq/__init__.py Nicolas Patry 2024-10-18 17:55:36 +0200
  • f5b09467f1 Make handling of FP8 scales more consisent Daniël de Kok 2024-10-18 14:16:06 +0000
  • 8673bb050d Upgrading the tests (TP>1 fix changes to use different kernels.) Nicolas Patry 2024-10-18 14:42:16 +0200
  • 5ca6da15f5 Revert change after rebase. Nicolas Patry 2024-10-18 13:02:50 +0200
  • 7b291355df Fix redundant import. Nicolas Patry 2024-10-18 12:29:33 +0200
  • ba7197c0db Unused import Nicolas Patry 2024-10-18 12:19:22 +0200
  • 3e12402a98 Simplifying conditionals + reverting integration tests values. Nicolas Patry 2024-10-18 12:13:48 +0200