Commit Graph

  • 4498d6bc47 run python3 udpate_doc.py erikkaum 2024-07-31 12:04:22 +0200
  • 4a0fdad1a7 changes based on feedback erikkaum 2024-07-31 11:55:09 +0200
  • 00478579e3 Update router/src/server.rs Erik Kaunismäki 2024-07-31 11:50:25 +0200
  • f4f0cb81f2 Update docs/source/usage_statistics.md Erik Kaunismäki 2024-07-31 11:43:43 +0200
  • 5310ba0119 refactor usage stats erikkaum 2024-07-31 11:29:19 +0200
  • 31ebfd0dd7 (launcher) default new server::run parameters to false for now Morgan Funtowicz 2024-07-31 09:06:52 +0000
  • 8989c585c6 (docker) build ompi with SLURM support Morgan Funtowicz 2024-07-31 09:06:24 +0000
  • ae66cf5593 (docker) let's put rust in the TRTLLM folder when building Morgan Funtowicz 2024-07-31 09:06:11 +0000
  • 2b19d671b4 Rebase TRT-llm (#2331) Nicolas Patry 2024-07-31 10:33:10 +0200
  • 2c890d4cdf enable HuggingFaceM4/idefics-9b in intel gpu Wang, Yi A 2024-07-31 01:31:02 -0700
  • f5f09ae9a8 Handle GPTQ-Marlin loading in GPTQMarlinWeightLoader Daniël de Kok 2024-07-24 14:36:52 +0000
  • 3d21c8f43a reable gemma2 in xpu Wang, Yi A 2024-07-30 22:20:49 -0700
  • 4d28e29236 hotfix: fix xpu crash brought by code refine. torch.xpu rely on import ipex Wang, Yi A 2024-07-30 19:28:59 -0700
  • 5123925101 fix: warn window_size_left when using flash attn 1 drbh 2024-07-30 20:24:48 +0000
  • 4b1005c7e1 fix: attempt forward on flash attn2 to check hardware support drbh 2024-07-30 17:20:40 +0000
  • 6e564a30a2 link against libtensorrt_llm and not libtensorrt-llm Morgan Funtowicz 2024-07-30 17:01:38 +0000
  • 98739b2035 provided None for api_key Morgan Funtowicz 2024-07-30 17:01:07 +0000
  • 579199f6f2 update TensorRT-LLM to latest version Morgan Funtowicz 2024-07-30 17:00:44 +0000
  • 5c81a1713c Fixing PB with default member backends/client Nicolas Patry 2024-07-30 18:45:17 +0200
  • dc2feb4e6f Remove PB from git. Nicolas Patry 2024-07-30 18:42:55 +0200
  • 9357fc162a Backporting 457fb0a1 Nicolas Patry 2024-07-30 18:21:11 +0200
  • bbdd26e2be Backporting telemetry. Nicolas Patry 2024-07-30 18:18:39 +0200
  • b2edffabb9 Remove both check + clippy ? Nicolas Patry 2024-07-30 17:02:41 +0200
  • f9d4a08f21 Tmp. Nicolas Patry 2024-07-30 17:00:05 +0200
  • 3e19ce117c ? Nicolas Patry 2024-07-30 16:52:31 +0200
  • 2641c853ad Remove cargo fmt temporarily. Nicolas Patry 2024-07-30 16:38:38 +0200
  • 1dbcf7532e Adding pb files ? Nicolas Patry 2024-07-30 16:22:10 +0200
  • db17050c22 Fix trtllm lint. Nicolas Patry 2024-07-30 16:21:07 +0200
  • e3418c3340 Updating the schema thing + redocly. Nicolas Patry 2024-07-30 16:14:52 +0200
  • fa687dd340 Fix makefile + autodocs. Nicolas Patry 2024-07-30 15:40:02 +0200
  • 2611c1a55f Fixing client. Nicolas Patry 2024-07-30 15:27:57 +0200
  • 53aec27328 server quantize: store quantizer config in standard format (#2299) Daniël de Kok 2024-07-30 15:16:20 +0200
  • ad7d8b3432 Ignore backends/v3 by default. Nicolas Patry 2024-07-30 12:47:34 +0200
  • f6b60bab73 Let's try to enable trtllm backend. Nicolas Patry 2024-07-30 12:37:12 +0200
  • 33c4b0d8c3 Fix autodocs. Nicolas Patry 2024-07-30 12:35:12 +0200
  • bc0a33e1c9 Rebase. Nicolas Patry 2024-07-30 12:22:24 +0200
  • 05c13c89de Remove useless modification yuanwu 2024-07-30 10:05:38 +0000
  • ddbbf6b50c wip OlivierDehaene 2024-06-26 12:08:56 +0200
  • 8fad7ae5a2 add some more basic info in README.md backends/trtllm-executor Morgan Funtowicz 2024-07-30 08:45:29 +0000
  • b665e2fa0a look for cuda 12.5 Morgan Funtowicz 2024-07-30 08:45:20 +0000
  • 67c0b5eb6d add numa to improve cpu inference perf Wang, Yi A 2024-07-30 00:03:23 -0700
  • 3f0f0e0825 Add the habana profiler yuanwu 2024-07-30 03:53:46 +0000
  • db0b6567e1 Remove log yuanwu 2024-07-29 22:02:42 +0000
  • 588a014551 Enable llava-next yuanwu 2024-07-28 09:05:49 +0000
  • 0986835548 fix: remove global model id drbh 2024-07-29 16:58:42 +0000
  • a014b220e1 MODEL_ID propagation fix root 2024-07-24 03:35:53 +0000
  • 2ce1476e52 fix: remove global model id drbh 2024-07-29 16:51:53 +0000
  • c2413a0153 MODEL_ID propagation fix root 2024-07-24 03:35:53 +0000
  • 95ff267043 fix: remove global model id drbh 2024-07-29 16:30:53 +0000
  • 2aa9e3c23d Merge commit 'refs/pull/2290/head' of github.com:huggingface/text-generation-inference into main drbh 2024-07-29 16:28:42 +0000
  • 3fb74e4626 fix: remove global model id drbh 2024-07-29 16:22:38 +0000
  • 1246e2193f Merge 592ea3f2f8 into 0b95693fb8 Edwin Hernandez 2024-07-29 11:22:36 -0500
  • c954a5c92a Merge branch 'pr-2290' into pr-2290-ci drbh 2024-07-29 16:19:28 +0000
  • 0b95693fb8 fix: adjust test snapshots and small refactors (#2323) pr-2290-ci-runner drbh 2024-07-29 11:38:38 -0400
  • 3d7f4f41bb patch-error-on-invalid-grammar (#2282) Erik Kaunismäki 2024-07-29 16:09:25 +0200
  • f15e808d4c fix: reject grammars without properties (#2309) drbh 2024-07-29 10:07:25 -0400
  • b5f61e92b5 fix: revert non snapshot changes drbh 2024-07-29 14:05:55 +0000
  • 922732b255 Install Marlin from standalone package (#2320) Daniël de Kok 2024-07-29 15:37:10 +0200
  • 8e4a3b8dd7 Install Marlin from standalone package Daniël de Kok 2024-07-26 14:22:13 +0000
  • 583d37a2f8 Run ci api key (#2315) Erik Kaunismäki 2024-07-29 11:14:17 +0200
  • 4f69d04c3a hotfix: increase precision of GPTQ/AWQ-Marlin Daniël de Kok 2024-07-29 08:40:17 +0000
  • fd2e06316d fix: fix buildkit config in ci Adrien 2024-07-29 09:25:56 +0200
  • 68854d11ef fix: adjust test snapshots and small refactors drbh 2024-07-28 22:58:12 +0000
  • bab02ff2bc feat: add ruff and resolve issue (#2262) drbh 2024-07-26 10:29:09 -0400
  • b97684536f explicit todo that this is only short term erikkaum 2024-07-26 15:50:35 +0200
  • 4b49c50f4c Support tied embeddings in 0.5B and 1.5B Qwen2 models (#2313) Daniël de Kok 2024-07-26 14:57:24 +0200
  • 6609feec64 revert wrong update erikkaum 2024-07-26 11:39:22 +0200
  • 12381b0b0e delete the last no repeat processor from warpers feature/no_repeat_ngram_size erikkaum 2024-07-25 17:31:04 +0200
  • 135be1f5c7 update docs again erikkaum 2024-07-26 11:09:11 +0200
  • fcb2dfb683 fixes and update docs erikkaum 2024-07-26 11:00:27 +0200
  • 21efb19c13 changes from original branch erikkaum 2024-07-26 10:59:09 +0200
  • 169c8c2cf5 token.to_str() returns result add_api_key erikkaum 2024-07-26 10:52:55 +0200
  • b890c8c47d Update vLLM dependency to 0.5.3.post1 Daniël de Kok 2024-07-26 08:51:32 +0000
  • cd01adcdee token.to_str() returns result erikkaum 2024-07-26 10:27:43 +0200
  • cd2508c19f Support tied embeddings in 0.5B and 1.5B Qwen2 models Daniël de Kok 2024-07-26 08:18:30 +0000
  • 28ae96b28b fix: reject grammars without properties drbh 2024-07-25 17:12:38 +0000
  • 80d1868ecf Fix registry name (#2307) Adrien 2024-07-25 16:06:00 +0200
  • 22d2341fb8 Fixing idefics on g6 tests. (#2306) Nicolas Patry 2024-07-25 14:44:21 +0200
  • 70465b23e2 Some small fixes for the Torch 2.4.0 update (#2304) Daniël de Kok 2024-07-25 13:34:44 +0200
  • 3fe117f492 Using g6 instead of g5. (#2281) Nicolas Patry 2024-07-25 11:21:17 +0200
  • ec4054487e fix: refactor adapter weight loading and mapping (#2193) drbh 2024-07-24 15:32:14 -0400
  • 8ed01b16dc Split up layers.marlin into several files (#2292) Daniël de Kok 2024-07-24 16:33:26 +0200
  • 7c7a0e9897 fix of use of unquantized weights in cohere GQA loading, also enable … (#2291) Wang, Yi 2024-07-24 16:44:02 +0800
  • 9e7c515489 fix crash in multi-modal (#2245) Wang, Yi 2024-07-24 16:39:08 +0800
  • 57ecf0b78a hotfix: update nccl OlivierDehaene 2024-07-23 23:31:28 +0200
  • bc076dceb6 chore: update to torch 2.4 (#2259) OlivierDehaene 2024-07-23 20:39:43 +0000
  • 0eca032d04 hotfix: pin numpy (#2289) Daniël de Kok 2024-07-23 17:53:19 +0200
  • cb3b8fddba Add support for Llama 3 rotary embeddings (#2286) Daniël de Kok 2024-07-23 17:18:54 +0200
  • 6dd74a3321 Preparing for release. (#2285) Nicolas Patry 2024-07-23 16:20:17 +0200
  • 7c874e5c4f [WIP] Add support for Mistral-Nemo by supporting head_dim through config (#2254) shaltielshmid 2024-07-23 16:00:07 +0300
  • 9ae43a1bee Add support for repacking AWQ weights for GPTQ-Marlin (#2278) Daniël de Kok 2024-07-23 13:08:20 +0200
  • 6b6dce2a4e fix(l4): fix fp8 logic on l4 (#2277) OlivierDehaene 2024-07-23 09:24:29 +0000
  • d3ebcdc424 Fixing mistral nemo. (#2276) Nicolas Patry 2024-07-23 11:16:03 +0200
  • 4bf3e5971b use proper name for ci (#2274) Adrien 2024-07-22 21:50:53 +0200
  • 800d8de688 Softcapping for gemma2. (#2273) Nicolas Patry 2024-07-22 18:27:10 +0200
  • 9c0c652b4c fix(server): fix fp8 weight loading (#2268) OlivierDehaene 2024-07-22 15:51:32 +0000
  • bca1d8669a fix(ci): test new instances (#2272) Adrien 2024-07-22 14:41:30 +0200
  • f3b0c2f3d9 legacy warning on text_generation client (#2271) Erik Kaunismäki 2024-07-22 12:00:17 +0200
  • 2ca2fd6f56 Hotfix: fix of use of unquantized weights in Mixtral GQA loading (#2269) icyboy™ 2024-07-22 17:31:00 +0800
  • 3a5f11ebb4 fix(server): fix deepseekv2 loading (#2266) OlivierDehaene 2024-07-21 16:48:04 +0000