Commit Graph

  • 3b1b049b32
    Enable KQV offload by default Adrien Gallouët 2025-02-06 18:33:30 +0000
  • a68aefa86d
    Intel extension fix. Nicolas Patry 2025-02-06 19:26:57 +0100
  • 072082774e
    Attempt to fix intel CPU. Nicolas Patry 2025-02-06 16:53:40 +0100
  • dc2e3e5ded
    Forgot the integration snapshot. Nicolas Patry 2025-02-06 16:47:29 +0100
  • 856709d5c3
    [Backend] Bump TRTLLM to v.0.17.0 (#2991) Funtowicz Morgan 2025-02-06 16:45:03 +0100
  • 809e288b5a
    Fix fmt Adrien Gallouët 2025-02-06 14:58:44 +0000
  • 5367d94f34
    Fix requirements.txt Adrien Gallouët 2025-02-06 14:45:55 +0000
  • 595f2b6fce
    Town instead village. Nicolas Patry 2025-02-06 15:28:53 +0100
  • df723e646b
    Bump llama.cpp & cuda Adrien Gallouët 2025-02-06 13:24:36 +0000
  • 7bff88bba9
    Do not use HOSTNAME env Adrien Gallouët 2025-02-06 13:17:17 +0000
  • a91127c24b
    Updating mllama after strftime. Nicolas Patry 2025-02-05 12:11:18 +0100
  • 36223f834e
    Triton fix (#2995) Wang, Yi 2025-02-06 19:28:41 +0800
  • 0ef8c8a97a
    Using the "lockfile". (#2992) Nicolas Patry 2025-02-06 12:28:24 +0100
  • 8bc10d37ee
    Update docs Adrien Gallouët 2025-02-06 10:31:05 +0000
  • 2b0d99c1cf
    Thanks cargo fmt Adrien Gallouët 2025-02-06 10:08:18 +0000
  • fb81c0d1c4
    Thanks clippy Adrien Gallouët 2025-02-06 10:53:57 +0100
  • e4d5fa7eaf
    Update docs Adrien Gallouët 2025-02-06 09:46:24 +0000
  • 408663e61a
    fix triton to 3.1.0 to fix ipex import issue triton_fix Wang, Yi A 2025-02-06 00:51:17 -0800
  • 8a3c9fb79a
    Applying to other builds. Nicolas Patry 2025-02-06 09:46:25 +0100
  • 393a7efc9e
    backend(trtllm): link against CUDA 12.8 Morgan Funtowicz 2025-02-06 09:38:06 +0100
  • a3b05fc943
    Mode max. Nicolas Patry 2025-02-06 01:49:24 +0100
  • 2f7b023f39
    Don't break all other builds. Nicolas Patry 2025-02-06 01:13:31 +0100
  • c8b0eddf79
    Attempt #42 Nicolas Patry 2025-02-06 00:57:22 +0100
  • 3514d2dc8c
    .. Nicolas Patry 2025-02-05 23:14:51 +0100
  • 0d382c4508
    . Nicolas Patry 2025-02-05 23:11:01 +0100
  • d5fc0577b8
    backend(trtllm): make sure we escalade all warnings as errors on the backend impl in debug mode Morgan Funtowicz 2025-02-05 23:01:09 +0100
  • 117d27849c
    backend(trtllm): use return value optimization flag as as error if available Morgan Funtowicz 2025-02-05 23:00:42 +0100
  • 1641c22af8
    Add doc Adrien Gallouët 2025-02-05 21:14:30 +0000
  • 7f00c37905
    backend(trtllm): build against gcc-14 with cuda12.8 Morgan Funtowicz 2025-02-05 22:05:46 +0100
  • e3326e6b0b
    We need the launcher still. Nicolas Patry 2025-02-05 20:58:10 +0100
  • b3e40c4b66
    Improve default settings Adrien Gallouët 2025-02-05 16:38:52 +0000
  • 9258fa6a24
    How in the world... Nicolas Patry 2025-02-05 17:38:41 +0100
  • f22e2fb550
    Cleanup Adrien Gallouët 2025-02-05 16:12:34 +0000
  • 19ea893956
    The good old monkey. Nicolas Patry 2025-02-05 16:53:36 +0100
  • 0f62401b8e
    Initialize penalty_last_n with llamacpp default value Adrien Gallouët 2025-02-05 15:44:46 +0000
  • 695b1292e9
    Ensure all samplers are freed on error Adrien Gallouët 2025-02-05 15:42:59 +0000
  • 07c0080970
    fix: add transformer overlay for processor support drbh 2025-02-05 15:42:22 +0000
  • 830c25dd5a
    Bad cache hits. Nicolas Patry 2025-02-05 16:21:57 +0100
  • d299b52cb5
    backend(trtllm): link against decoder_attention_{0|1} Morgan Funtowicz 2025-02-05 16:15:31 +0100
  • 5caf5401ff
    .. Nicolas Patry 2025-02-05 15:38:53 +0100
  • 11c9acab42
    backend(trtllm): use correct library reference decoder_attention_src Morgan Funtowicz 2025-02-05 15:33:36 +0100
  • 027931d262
    Another attempt. Nicolas Patry 2025-02-05 15:31:50 +0100
  • 5b777877b1
    Make max_batch_total_tokens optional Adrien Gallouët 2025-02-05 11:40:20 +0000
  • 09a745f1b8
    Remove n_ctx Adrien Gallouët 2025-02-05 11:31:58 +0000
  • 76c458a8a2
    Lock on python 3.11 Nicolas Patry 2025-02-05 12:28:19 +0100
  • 051ff2d5ce
    Rename bindings Adrien Gallouët 2025-02-05 11:13:17 +0000
  • c52f08351f
    Set TGI_LLAMA_PKG_CUDA from CUDA_VERSION Adrien Gallouët 2025-02-05 10:57:50 +0000
  • a1c78adc19
    Revert dummy modifications. Nicolas Patry 2025-02-05 11:55:17 +0100
  • 951eb62b56
    Using the "lockfile". Nicolas Patry 2025-02-05 11:48:40 +0100
  • dbee804129
    Simplify batching logic Adrien Gallouët 2025-02-05 10:12:39 +0000
  • d3a772a8dd
    Update args Adrien Gallouët 2025-02-05 10:10:38 +0000
  • 9f6f1e905d
    backend(trtllm): use arg instead of env Morgan Funtowicz 2025-02-05 10:30:02 +0100
  • 4c44de4ee7
    backend(trtllm): forget to bump dockerfile Morgan Funtowicz 2025-02-05 10:27:47 +0100
  • 6168ffc23f
    backend(trtllm): bump TRTLLM to v.0.17.0 Morgan Funtowicz 2025-02-05 10:14:20 +0100
  • c837843264
    Merge 4e1c68e6f8 into c1cf36c0dc Funtowicz Morgan 2025-02-05 09:24:02 +0100
  • 76d526d931
    feat: check before rope type adjustment and small refactors drbh 2025-02-05 02:27:29 +0000
  • 1f585775b8
    fix: bump support models doc drbh 2025-01-31 12:47:04 -0500
  • 10aa62f87f
    feat: support qwen2.5 vl model drbh 2025-01-31 12:36:03 -0500
  • e007529590
    Update Cargo.lock Adrien Gallouët 2025-02-04 17:54:53 +0000
  • 906c265aef
    Cleanup Dockerfile Adrien Gallouët 2025-02-04 17:53:47 +0000
  • c1cf36c0dc
    Improve qwen vl impl (#2943) drbh 2025-02-04 12:44:18 -0500
  • dd2bd5fdb3
    impureWithCuda: fix gcc version (#2990) Daniël de Kok 2025-02-04 17:01:59 +0100
  • df2a4fbb8a
    Update Dockerfile_llamacpp Adrien Gallouët 2025-02-04 12:34:02 +0000
  • d883109df6
    Disable graceful shutdown in debug mode Adrien Gallouët 2025-02-03 20:58:33 +0000
  • 207041a977
    Bump llamacpp to b4623 Adrien Gallouët 2025-02-03 13:38:42 +0000
  • 38b33e9698
    Add --type-v & --type-k Adrien Gallouët 2025-02-03 12:39:28 +0000
  • bfb8e03e9f
    Add specific args for batch Adrien Gallouët 2025-02-03 11:03:47 +0000
  • e6a8d33902
    backend(llama): add CUDA architectures build argument for Dockerfile Morgan Funtowicz 2025-02-03 11:36:44 +0100
  • ea28332bb3
    Cleanup Adrien Gallouët 2025-02-01 20:40:59 +0000
  • 104a968d01
    Remove warmup Adrien Gallouët 2025-02-01 20:27:31 +0000
  • 8ed362d03a
    Clear request cache after completion Adrien Gallouët 2025-02-01 20:20:43 +0000
  • c8505fb300
    Auto-detect n_threads when not provided Adrien Gallouët 2025-02-01 18:33:26 +0000
  • 27534d8ee4
    Fix seq iterations Adrien Gallouët 2025-02-01 17:55:00 +0000
  • 96434a1e7e
    Fix batching Adrien Gallouët 2025-02-01 16:09:51 +0000
  • 2a51e415ff
    Output real logprobs Adrien Gallouët 2025-02-01 11:37:14 +0000
  • 161280f313
    Only export the latest logits Adrien Gallouët 2025-02-01 10:51:44 +0000
  • 960c12bd6e
    backend(llama): add CUDA Dockerfile_llamacpp for now Morgan Funtowicz 2025-01-31 22:13:59 +0100
  • f38c34aeb7
    Fix batch_pos Adrien Gallouët 2025-01-31 18:20:45 +0000
  • e88a527fcf
    Add --offload-kqv Adrien Gallouët 2025-01-31 16:23:22 +0000
  • ae5bb789c2
    Enable flash attention by default Adrien Gallouët 2025-01-31 16:07:10 +0000
  • 3f199134f0
    Fix args Adrien Gallouët 2025-01-31 15:51:28 +0000
  • 7a3ed4171e
    Add --numa Adrien Gallouët 2025-01-31 15:09:29 +0000
  • 390f0ec061
    Cleanup Adrien Gallouët 2025-01-31 15:00:23 +0000
  • d6ded897a8
    Add a stupid batch mechanism Adrien Gallouët 2025-01-31 12:44:09 +0000
  • e07835c5b5
    Add --defrag-threshold Adrien Gallouët 2025-01-31 10:38:34 +0000
  • f388747985
    Add GPU args Adrien Gallouët 2025-01-31 09:50:57 +0000
  • 8d2dfdf668
    Handle ctx args & fix sampling Adrien Gallouët 2025-01-30 22:41:26 +0000
  • a7b4b04cb5
    Add some input validation checks Adrien Gallouët 2025-01-30 20:21:37 +0000
  • e7facf692f
    Handle max_batch_size Adrien Gallouët 2025-01-30 19:50:09 +0000
  • 3eb4823f3e
    Use max_batch_total_tokens Adrien Gallouët 2025-01-30 15:12:55 +0000
  • bd0cc9905c
    Get rid of llama_batch_get_one() Adrien Gallouët 2025-01-30 13:41:35 +0000
  • 95e221eece
    Add llamacpp backend Adrien Gallouët 2025-01-24 09:05:37 +0000
  • b3436da43d
    trufflehog: do not fail on unverified results Daniël de Kok 2025-02-04 12:23:54 +0000
  • 06abe3f50e
    impureWithCuda: fix gcc version Daniël de Kok 2025-02-04 10:24:33 +0000
  • cfc6cbc4d6
    fix: Functioncall is actually a bit different than the deprecated function definition type Nicolas Casademont 2025-02-04 11:09:55 +0100
  • 45d3a3a253
    fix: Allow back arguments in function definition and the corresponding test Nicolas Casademont 2025-02-04 11:07:42 +0100
  • 58f5f2ee27
    fix: adjust signatures with types drbh 2025-02-04 00:30:47 +0000
  • 6cb0cb68b4
    fix: improve and simplify get_cos_sin, refactors and cleanup get_position_ids drbh 2025-02-04 00:25:59 +0000
  • 88fd56f549
    Add strftime_now callable function for minijinja chat templates (#2983) Alvaro Bartolome 2025-02-03 15:30:48 +0100
  • 8ae92e5d70
    Merge branch 'huggingface:main' into fix/dockerfile-triton Yaser Jaradeh 2025-02-03 11:48:01 +0100