Commit Graph

  • 46775b1c03 fix: Fix to allow report for a full failed test Hugo Larcher 2024-07-17 16:44:52 +0200
  • 9220340ff7 compute the number of maximum new tokens for each request independently Morgan Funtowicz 2024-07-17 13:55:29 +0000
  • 3e303f21ff fix clippy erikkaum 2024-07-17 11:17:38 +0200
  • 3a1463c187 Update load_tests/benchmarks/templates/main.js.j2 Hugo Larcher 2024-07-17 10:54:01 +0200
  • 959b9dc25f Fixup constructor arguments feature/prefix Daniël de Kok 2024-07-17 07:42:24 +0000
  • d3155d6f41 Merge branch 'habana-main' into v2.0.4 yuanwu2017 2024-07-17 13:45:15 +0800
  • b34edc2ee9 Upgrade to 2.0.4 yuanwu 2024-07-17 05:08:52 +0000
  • 179336888e Modifing the version number. Nicolas Patry 2024-05-24 10:52:28 +0000
  • 42b0847a80 Fixing codellama loads by using purely AutoTokenizer. (#1947) Nicolas Patry 2024-05-24 12:40:39 +0200
  • 075092315e Improving the logging system. (#1938) Nicolas Patry 2024-05-23 15:40:40 +0200
  • 4239e4d327 Add completion route to client and add stop parameter where it's missing (#1869) Thomas Schillaci 2024-05-23 15:37:09 +0200
  • 7cf21294d1 Fixing some legacy behavior (big swapout of serverless on legacy stuff). (#1937) Nicolas Patry 2024-05-23 14:39:38 +0200
  • 42693c4021 reenable xpu for tgi (#1939) Wang, Yi 2024-05-23 20:11:08 +0800
  • a758d32c64 feat: add train medusa head tutorial (#1934) drbh 2024-05-23 05:34:18 -0400
  • 57ba035a61 fix: use path inside of speculator config (#1935) drbh 2024-05-22 14:46:29 -0400
  • b9469a1878 Creating doc automatically for supported models. (#1929) Nicolas Patry 2024-05-22 16:22:57 +0200
  • 3adbc4cc04 docs: Fix grafana dashboard url (#1925) Junlin Zhou 2024-05-22 01:12:14 +0800
  • f1976851d9 ROCm: make CK FA2 default instead of Triton (#1924) fxmarty 2024-05-20 02:44:48 +0200
  • ed2539510a Fixing the download strategy for ibm-fms (#1917) Nicolas Patry 2024-05-18 13:31:24 +0200
  • 14ed7c7b4a Fix TGI issues with ROCm (#1921) fxmarty 2024-05-17 19:50:52 +0200
  • 05600c55a5 Fix TunableOp bug (#1920) fxmarty 2024-05-17 18:21:51 +0200
  • 24317977a7 Update grafana template (#1918) fxmarty 2024-05-17 17:37:23 +0200
  • 3631347766 Add TGI monitoring guide through Grafana and Prometheus (#1908) fxmarty 2024-05-17 16:34:44 +0200
  • 166dc0b87d MI300 compatibility (#1764) fxmarty 2024-05-17 15:30:47 +0200
  • 398ad027c7 Removing some unused code. (#1915) Nicolas Patry 2024-05-17 11:35:49 +0200
  • 125c8a05c3 Fixing signals. (#1910) Nicolas Patry 2024-05-16 21:40:10 +0200
  • 313960a829 Types. (#1909) Nicolas Patry 2024-05-16 17:21:00 +0200
  • 1687e00bfb Fixing types. (#1906) Nicolas Patry 2024-05-16 16:59:05 +0200
  • f691a945aa OpenAI function calling compatible support (#1888) phangiabao98 2024-05-16 15:17:00 +0700
  • 62b2a8b67b Pali gemma modeling (#1895) drbh 2024-05-16 00:58:47 -0400
  • b1d370e062 Update torch import reference in bnb quantization (#1902) Dhruv Srikanth 2024-05-15 20:08:32 +0100
  • 7b11b1804b feat: add deprecation warning to clients (#1855) drbh 2024-05-15 09:40:07 -0400
  • 2573f3aed4 Removing accepted ids in the regular info logs, downgrade to debug. (#1898) Nicolas Patry 2024-05-15 13:56:07 +0200
  • 27a5d6b5f9 Add GPT-2 with flash attention (#1889) Daniël de Kok 2024-05-15 13:31:22 +0200
  • 4494b84b18 Correct 'using guidance' link (#1892) Brandon Lockaby 2024-05-14 14:23:39 -0400
  • 95d15b4bbe MLPSpeculator. (#1865) Nicolas Patry 2024-05-14 12:33:18 +0200
  • 330aa87f3e Add: Support for the Falcon2 11B architecture (#1886) Nilabhra Roy Chowdhury 2024-05-14 10:06:02 +0200
  • c395431999 Granite support? (#1882) Nicolas Patry 2024-05-13 13:46:29 +0200
  • e5c4a219b3 Refactor layers. (#1866) Nicolas Patry 2024-05-13 12:44:30 +0200
  • b726e4fa84 update xpu docker image and use public ipex whel (#1860) Wang, Yi 2024-05-06 22:05:43 +0800
  • 263732ef7a Upgrading to rust 1.78. (#1851) Nicolas Patry 2024-05-06 13:48:11 +0200
  • 227c7770c6 Add router name to /info endpoint (#1854) Lucain 2024-05-03 16:39:04 +0200
  • 6a164ce270 Updating Phi3 (long context). (#1849) Nicolas Patry 2024-05-02 19:07:10 +0200
  • ee002cad1e feat: prefer huggingface_hub in docs and show image api (#1844) drbh 2024-05-02 10:56:24 -0400
  • 6310e2454c Remove misleading warning (not that important nowadays anyway). (#1848) Nicolas Patry 2024-05-02 15:09:46 +0200
  • 224be709ce Adding scripts to prepare load data. (#1841) Nicolas Patry 2024-05-01 21:48:06 +0200
  • d9c668f8b2 Fix: "Fixing" double BOS for mistral too. (#1843) Nicolas Patry 2024-05-01 18:21:17 +0200
  • a01cd030d4 oops missing c++ backend definitions Morgan Funtowicz 2024-07-16 20:11:59 +0000
  • 7784a21d48 impl RwLock scenario for TensorRtLllmBackend Morgan Funtowicz 2024-07-16 20:08:10 +0000
  • d567280265 cargo fmt erikkaum 2024-07-16 17:58:18 +0200
  • 27ef5aa029 Sync allocator interfaces Daniël de Kok 2024-07-16 14:42:32 +0000
  • b6f713d77c add nvidia-smi details in docs erikkaum 2024-07-16 16:15:27 +0200
  • cfc1187048 fix errors erikkaum 2024-07-16 16:03:41 +0200
  • 47a4f1fd00 parse xpu more robustly erikkaum 2024-07-16 16:02:53 +0200
  • 2967b8168c fix post refactor ci_amd3 fxmarty 2024-07-16 15:16:27 +0200
  • 291453fe88 Merge branch 'main' into ci_amd3 fxmarty 2024-07-16 15:15:17 +0200
  • 4642fd27ad fix: Compute comparison table Hugo Larcher 2024-07-16 11:15:29 +0200
  • 0ca54b55f8 Do not schedule decode if max_new_tokens is equal to 1 (#183) bkowalskiINTEL 2024-07-16 14:53:24 +0200
  • aecbce351c more robust nvidia smi erikkaum 2024-07-16 14:53:18 +0200
  • 9fc54cd91c more robust way of checking if is in container erikkaum 2024-07-16 14:14:36 +0200
  • 713abb7073 delete json_output and ngrok erikkaum 2024-07-16 13:56:25 +0200
  • 796c5cbf38 on crash use anonymous error event Erik Kaunismäki 2024-07-16 13:52:00 +0200
  • a5f50c5b39 Update router/src/main.rs Erik Kaunismäki 2024-07-16 13:57:33 +0200
  • ef63e93554 Update router/src/main.rs Erik Kaunismäki 2024-07-16 13:57:13 +0200
  • 48b21eab7a Last accessed fixes Daniël de Kok 2024-07-16 11:55:46 +0000
  • dd2d6cfe40 Proper support for two allocations with overlapping prefixes Daniël de Kok 2024-07-16 11:40:35 +0000
  • d4ce5389ce Add another problematic case Daniël de Kok 2024-07-16 10:20:11 +0000
  • 0e6ff1293a Fixes Daniël de Kok 2024-07-16 10:10:10 +0000
  • 15e5df1cc4 BS round up to BUCKET_SIZE to prevent capture graph when graph input not change (#185) BaihuiJin 2024-07-16 15:42:46 +0800
  • da82c63a4f Remove stray quantize argument in get_weights_col_packed_qkv (#2237) Daniël de Kok 2024-07-16 09:30:57 +0200
  • 2cb1842852 server quantize: expose groupsize option (#2225) Daniël de Kok 2024-07-16 08:36:05 +0200
  • 6d28c381d2 Remove stray quantize argument in get_weights_col_packed_qkv Daniël de Kok 2024-07-16 06:33:00 +0000
  • 06d0e880e0 Add support for AWQ-quantized Idefics2 (#2233) Daniël de Kok 2024-07-16 07:58:25 +0200
  • 6a3ac789a3 Add support for AWQ-quantized Idefics2 Daniël de Kok 2024-07-15 16:06:16 +0000
  • 7c046c9190 First step towards cleaning up Daniël de Kok 2024-07-15 14:14:10 +0000
  • 01e61bf0a4 make nrns optional Nathan Brake 2024-07-15 13:58:34 +0000
  • 05611f6b40 Renaming, window size Daniël de Kok 2024-07-15 13:58:10 +0000
  • ae46faef4d update docs Nathan Brake 2024-07-15 13:55:43 +0000
  • 28e6a504c0 Add support for no_repeat_ngram_size Nathan Brake 2024-07-15 13:51:11 +0000
  • 083806aa42 Traitify the current allocator in preparation for swappable alloc Daniël de Kok 2024-07-15 13:44:22 +0000
  • 0ad7f6f87d fix: Remove bitsandbytes installation when running cpu-only install (#2216) Hugo Larcher 2024-07-15 15:34:20 +0200
  • 457fb0a188 fix custom cache dir (#2226) Erik Kaunismäki 2024-07-15 15:17:13 +0200
  • 5a65066922 feat: simple mistral lora integration tests (#2180) drbh 2024-07-15 09:16:15 -0400
  • f6ad3b3585 Some MoE exploration experiment/moe Daniël de Kok 2024-07-15 11:47:52 +0000
  • dc79d4fd24 run pre-commit locally ErikKaumk 2024-07-15 11:23:34 +0200
  • 0c16204b7e try again to update docs ErikKaumk 2024-07-15 10:44:26 +0200
  • 53ad84dac7 fix error in passing flags to router ErikKaumk 2024-07-15 10:14:16 +0200
  • 81c9ad7073 Merge branch 'main' into feature/usage-stats ErikKaumk 2024-07-15 10:08:10 +0200
  • af661fd788 cargo fmt ErikKaumk 2024-07-15 09:56:25 +0200
  • dc64f8a3a8 maybe fix trailing whitespace ErikKaumk 2024-07-15 09:53:11 +0200
  • 31d9f4d5dc expose shutdown function at ffi layer Morgan Funtowicz 2024-07-15 07:36:01 +0000
  • 8d358d9c61 feat: Add load tests Hugo Larcher 2024-07-11 11:45:24 +0200
  • 5b27307438 Don't error on OpenAI valid top_p values. fix/allow-top-p-0 Michael Conrad 2024-07-12 16:22:23 -0400
  • b291be64a0 impl the rust backend which currently cannot move the actual computation in background thread Morgan Funtowicz 2024-07-12 19:26:32 +0000
  • 518d9a9e0b make sure to track include/ffi.h to trigger rebuild from cargo Morgan Funtowicz 2024-07-12 19:26:04 +0000
  • 344f33f398 end to end ffi flow working Morgan Funtowicz 2024-07-12 19:25:40 +0000
  • b846ae2d9e use external fmt lib Morgan Funtowicz 2024-07-12 19:24:32 +0000
  • 1972669f49 remove fmt import Morgan Funtowicz 2024-07-12 19:23:59 +0000
  • 3b4754cd31 Better leaf tracking Daniël de Kok 2024-07-12 16:03:21 +0200
  • 30ce9e0426 delete newlines ErikKaumk 2024-07-12 14:47:17 +0200