Commit Graph

  • 731f890887 Update tensor_parallel.py Kaixiong Happy 2024-12-03 19:00:28 +0800
  • 874bc28d6c feat(backend): make backend_workspace_t::engines_folder constexpr Morgan Funtowicz 2024-12-03 09:41:58 +0100
  • 2f8634ec01 feat(backend): impl missing generation_step_t as return value of pull_tokens Morgan Funtowicz 2024-12-02 23:28:25 +0100
  • a7bad25c41 feat(backend): fix backend_exception_t -> backend_error_t naming Morgan Funtowicz 2024-12-02 15:16:42 +0100
  • 879e1a4178 feat(backend): allow overriding which Python to use Morgan Funtowicz 2024-12-02 15:08:34 +0100
  • 71e700a6ea feat(backend): use latest trtllm main version to have g++ >= 13 compatibility Morgan Funtowicz 2024-12-02 00:06:24 +0100
  • fd7e2b5bbd feat(backend): more impl Morgan Funtowicz 2024-12-02 00:05:59 +0100
  • df99164dc1 feat(backend): delete previous backend impl Morgan Funtowicz 2024-12-01 23:49:25 +0100
  • 25c6bbe142 feat(backend): introduce backend_workspace_t to store precomputed information from the engine folder Morgan Funtowicz 2024-12-01 00:35:04 +0100
  • 702dc9cd05 feat(backend): missing return statement Morgan Funtowicz 2024-11-30 23:16:46 +0100
  • 87272ffe39 feat(backend): enable compiler warning if support for RVO not applying Morgan Funtowicz 2024-11-30 23:16:35 +0100
  • 9bb6309712 feat(backend): added some logging Morgan Funtowicz 2024-11-30 23:04:57 +0100
  • 6d3565759a feat(backend): remove all the logs from hardware.hpp Morgan Funtowicz 2024-11-19 00:19:22 +0100
  • 3a2698fb79 feat(backend): initial rewrite of the backend for simplicity Morgan Funtowicz 2024-11-19 00:17:35 +0100
  • 1830fe8833 test(ctest) enable address sanitizer Morgan Funtowicz 2024-11-19 00:17:10 +0100
  • 7a81040d1a feat(hardware) enable new hardware.hpp and unittests Morgan Funtowicz 2024-11-18 21:51:44 +0100
  • 0f17415d54 misc(cmake) update dependencies Morgan Funtowicz 2024-11-18 21:50:44 +0100
  • 1352f70847 Fix prefix caching for chat completion since we removed logprobs. Nicolas Patry 2024-12-02 07:51:00 +0100
  • db1114955a chunking by default. Nicolas Patry 2024-12-02 07:00:03 +0100
  • 9fab7c6665 Updated the flops calculation (checked with fvcore). Nicolas Patry 2024-11-11 14:31:32 +0100
  • 3ec9259b69 Typo in h100. Nicolas Patry 2024-11-10 15:53:10 +0100
  • 3a53e8c288 Damn inflated sparse tflops. Nicolas Patry 2024-11-10 15:32:34 +0100
  • 748dce60cd h100 better name, and keep factor of 2 Nicolas Patry 2024-11-10 14:14:10 +0100
  • 96ad65b51a Logprobs cost too much. Nicolas Patry 2024-11-10 07:00:22 +0100
  • e85dc0a0bb Adding a few more cards. Nicolas Patry 2024-11-05 11:24:20 +0100
  • 5bcb3e6ad2 Adding A100 + H100 Nicolas Patry 2024-11-05 11:19:37 +0100
  • 23c0a20dc9 Adding more cards. Nicolas Patry 2024-11-05 09:47:55 +0100
  • fa912440b1 Taking into account number of shards. Nicolas Patry 2024-11-04 11:15:47 +0100
  • 54d3c8157c Attempt at automatic max batch prefill. Nicolas Patry 2024-11-04 10:59:07 +0100
  • b57f370386 Saving some VRAM. (#2790) Nicolas Patry 2024-12-03 08:34:21 +0530
  • 2003d8be0c Sync (most) server dependencies with Nix (#2782) Daniël de Kok 2024-12-03 04:04:06 +0100
  • 45eb84e4b6 Adding assertion. Nicolas Patry 2024-12-02 19:40:57 +0100
  • 83073b2069 Fmt. Nicolas Patry 2024-12-02 18:32:49 +0100
  • b02a258841 Upgrade eetq ? Nicolas Patry 2024-12-02 07:00:50 +0100
  • 2610733f3f Add a primitive script to generate Poetry commands to sync with Nix Daniël de Kok 2024-11-26 13:20:56 +0000
  • 15f9c1ca10 Sync (most) server dependencies with Nix Daniël de Kok 2024-11-26 13:18:12 +0000
  • 253a992447 Remove the CI workflows we don't currently support yuanwu 2024-12-02 08:45:36 +0000
  • 56ea3ce826 use oneapi 2024 docker image directly for xpu Wang, Yi A 2024-12-02 00:35:32 -0800
  • 535149d872 fix: only use eos_token_id as pad_token_id if int (#2774) Dmitry Rogozhkin 2024-12-01 21:26:37 -0800
  • 600d7e6ece Update server/text_generation_server/adapters/lora.py pr-2784-ci-branch Nicolas Patry 2024-12-02 06:02:02 +0100
  • 2c74c55637 fix: add merge-lora arg for model id (#2788) drbh 2024-12-01 23:52:02 -0500
  • a35d1e6fe5 Removing ../ that broke the link (#2789) Torsten Raudssus 2024-12-02 05:48:55 +0100
  • b4c5ca5a58 Saving some VRAM. Nicolas Patry 2024-11-30 18:53:33 +0100
  • 1d2cb356b9 Fix doc. (#2792) Nicolas Patry 2024-12-02 09:58:26 +0530
  • 3e9cfab897 Fix doc. Nicolas Patry 2024-12-02 05:19:30 +0100
  • 4ee4ebc03b set flashdecoding blocksize as 64 Wang, Yi A 2024-12-01 18:55:05 -0800
  • 0228bd0260 Doesn't run the prefill warmup when limit_hpu_graph=true yuanwu 2024-12-01 21:29:41 +0000
  • 4586325a34 Fix the starCode warmup issue yuanwu 2024-11-26 08:55:42 +0000
  • e0dda9b614 feat(backend): use c++ defined types for llama.cpp Morgan Funtowicz 2024-11-29 23:38:27 +0100
  • c9f6c3a8f7 feat(backend): better map exception throw on C++ side Morgan Funtowicz 2024-11-29 23:34:16 +0100
  • 5fdfe692dc Removing ../ that broke the link Torsten Raudssus 2024-11-29 17:52:48 +0100
  • ddc35e4eb8 fix: add merge-lora arg for model id drbh 2024-11-29 11:00:13 -0500
  • db41776a0e feat(backend): add mimalloc memory allocator to the container Morgan Funtowicz 2024-11-29 16:22:55 +0100
  • f5c4cee364 feat(backend): correctly link to all libraries Morgan Funtowicz 2024-11-29 16:22:43 +0100
  • 63b8c59d9f Add poetry-plugin-export and fix indentation Alvaro Bartolome 2024-11-29 13:14:05 +0000
  • bbde434c40 Remove commas from poetry export --extras ... Alvaro Bartolome 2024-11-29 13:12:06 +0000
  • a221dbb9a7 Fix COPY destination for requirements_poetry.txt Alvaro Bartolome 2024-11-29 13:46:01 +0100
  • d9347f2f72 Fix stage name in Dockerfile Alvaro Bartolome 2024-11-29 13:44:40 +0100
  • ebed60b8a9 Add poetry.lock export into requirements_poetry.txt Alvaro Bartolome 2024-11-29 12:53:42 +0100
  • 59b0ef3018 feat: Fix Cmakelist to allow building on Darwin platform (#2785) Hugo Larcher 2024-11-29 00:31:36 +0100
  • dfb4d5cd6a fix: Fix tokenizer in llama.cpp Dockerfile Hugo Larcher 2024-11-29 00:25:17 +0100
  • 26fca48a45 fix: Fix tokenizer in llama.cpp Dockerfile Hugo Larcher 2024-11-29 00:24:52 +0100
  • d4e65bfe7b feat: Fix Cmakelist to allow building on Darwin platform Hugo Larcher 2024-11-29 00:01:33 +0100
  • b10eaab9f3 feat(backend): use new batch API to generate tokens Morgan Funtowicz 2024-11-28 23:57:24 +0100
  • dc6435e3a5 feat(backend): create llama_context_params with default factory Morgan Funtowicz 2024-11-28 23:57:08 +0100
  • b1ebc8f73b feat(backend): update llama.cpp to 4215 Morgan Funtowicz 2024-11-28 23:56:57 +0100
  • 6c5a75b593 misc(offline): update model creation as std::shared_ptr Morgan Funtowicz 2024-11-28 17:45:22 +0100
  • 9d659f1e23 feat(backend): add missing temperature parameter Morgan Funtowicz 2024-11-28 16:49:29 +0100
  • df72c56b5b feat(backend): add guard in case top_k = 0 Morgan Funtowicz 2024-11-28 16:30:20 +0100
  • 929a2fc718 feat(backend): add some test to the backend for core allocation Morgan Funtowicz 2024-11-28 14:53:46 +0100
  • 298367cdfd feat(backend): fix when num_cores_per_instance is equals to zero with the size of the generated core allocation Morgan Funtowicz 2024-11-28 14:53:35 +0100
  • 8e89793514 feat(backend): use the new batch api from llama Morgan Funtowicz 2024-11-28 14:52:48 +0100
  • 274cfce435 feat(backend): remove core overriding in the Rust backend Morgan Funtowicz 2024-11-28 10:59:50 +0100
  • d918e6a159 Update Dockerfile.llamacpp as per review Funtowicz Morgan 2024-11-28 09:53:59 +0100
  • bbe95ca9e9 Update Dockerfile.llamacpp as per review Funtowicz Morgan 2024-11-28 09:53:15 +0100
  • 98d0093660 fix lora failure in platform which does not contain punica_kernels Wang, Yi A 2024-11-28 00:14:22 -0800
  • b83419a769 Merge branch 'habana-main' into 2.3.0 Yuan Wu 2024-11-28 12:38:36 +0800
  • d471805134 Support continue final message (#2733) drbh 2024-11-27 19:13:30 -0500
  • caff779dd4 Fix: docs typo (#2777) jp 2024-11-26 22:28:58 +0900
  • 892a26e549 upgrade ipex cpu to fix coredump in tiiuae/falcon-7b-instruct (pageat… (#2778) Wang, Yi 2024-11-26 21:28:11 +0800
  • 636cdb4c43 Fix startcode issue yuanwu 2024-11-26 08:55:42 +0000
  • 72ab60fdd5 Use FP8 KV cache when specified by compressed-tensors (#2761) Daniël de Kok 2024-11-26 08:27:41 +0100
  • 289aa48554 Move JSON grammar -> regex grammar conversion to the router (#2772) Daniël de Kok 2024-11-25 18:47:34 +0100
  • c637d68d74 feat: concat the adapter id to the model id in chat response (#2779) drbh 2024-11-25 12:36:31 -0500
  • 6082146c0d fix: updated to include only the adapter id in chat response drbh 2024-11-25 11:23:43 -0500
  • 438fd726ee Fix Rust test Daniël de Kok 2024-11-25 16:00:26 +0000
  • 8505341931 fix: replace expected output for continue test drbh 2024-11-25 10:10:33 -0500
  • 651a039dd3 feat: concat the adapter id to the model id in chat response drbh 2024-11-25 09:46:42 -0500
  • 594a6a7c22 fix: adjust continuation tests expected text drbh 2024-11-25 08:33:58 -0500
  • d5225d4196 One more snapshot update Daniël de Kok 2024-11-25 09:12:15 +0000
  • 639dc07088 Update tests/snapshots Daniël de Kok 2024-11-22 16:40:20 +0000
  • f5e55ad7d9 Can't format crate-hashes.json Daniël de Kok 2024-11-22 13:06:38 +0000
  • 85671e1a31 Formatting fixes Daniël de Kok 2024-11-22 12:51:26 +0000
  • 7e87e868e6 Move JSON grammar -> regex grammar conversion to the router Daniël de Kok 2024-11-22 12:21:32 +0000
  • b0c7658996 upgrade ipex cpu to fix coredump in tiiuae/falcon-7b-instruct (pageattention) Wang,Yi A 2024-11-25 07:57:54 +0000
  • d04c86c76c enable xpu flashdecoding Wang, Yi A 2024-11-24 21:40:00 -0800
  • d7c991b0d1 flash decoding Wang, Yi A 2024-11-05 00:48:23 -0800
  • a1dd3ffe25 Fix: typo in model loading code jp 2024-11-25 10:25:32 +0900
  • ba72c188d0 Merge branch 'huggingface:main' into feature/get-trace-id-from-req-headers Hyeongchan Kim 2024-11-23 15:21:18 +0900
  • 13a75acd76 fix: remove guideline tests drbh 2024-11-22 14:12:18 -0500