Commit Graph

  • 0b710f9671
    Update the doc. Nicolas Patry 2025-03-07 20:17:23 +0100
  • 207a70e7be
    Fixing the tool calling convention. Nicolas Patry 2025-03-07 19:42:36 +0100
  • b447f7e821
    Fix qwen vl (#3096) Nicolas Patry 2025-03-11 11:00:41 +0100
  • 925954c34c
    Fixing the CI. Nicolas Patry 2025-03-11 10:59:20 +0100
  • 911910c5d2
    Fixing qwen2.5 VL. Nicolas Patry 2025-03-11 10:57:58 +0100
  • 094975c3a8
    Update the llamacpp backend (#3022) Adrien Gallouët 2025-03-11 09:19:01 +0100
  • 6e8af50d2b
    Update README.md celsowm 2025-03-10 18:55:02 -0300
  • dc5f05f8e6
    Pr 3003 ci branch (#3007) drbh 2025-03-10 12:56:19 -0400
  • 124398fa57
    hotfix: qwen2 formatting (#3093) Daniël de Kok 2025-03-10 16:19:50 +0100
  • 751708aa70
    cargo fmt Daniël de Kok 2025-03-10 15:09:59 +0000
  • d0cb06af4a
    hotfix: qwen2 formatting Daniël de Kok 2025-03-10 15:01:23 +0000
  • c5ecc7a4de
    Small test and typing fixes (#3078) Daniël de Kok 2025-03-10 15:08:23 +0100
  • 0a9f0dc53a
    More typing fixes Daniël de Kok 2025-03-06 16:48:34 +0000
  • ed5bfe4241
    test_weights: add modules_to_not_convert Daniël de Kok 2025-03-06 16:44:10 +0000
  • cae0cbe87d
    Add modules_to_not_convert in quantized model (#3053) jiqing-feng 2025-03-10 22:03:51 +0800
  • bbe218a4f7
    Add qwen2 multi lora layers support (#3089) EachSheep 2025-03-10 19:42:59 +0800
  • 58a65f7914
    Add request parameters to OTel span for /v1/chat/completions endpoint (#3000) Alex Weston 2025-03-10 07:26:57 -0400
  • 976eae216f
    Nix: the launcher needs a Python env with Torch for GPU detection (#3085) Daniël de Kok 2025-03-10 12:11:10 +0100
  • e2dba5c0ad
    feat(upstream): add deprecation message for the tgi-gaudi fork due to upstream of gaudi baptiste 2025-03-10 10:43:09 +0000
  • 4d07b773e0
    add qwen2 multi lora layers support to solve problem like https://github.com/huggingface/text-generation-inference/issues/2881, the similar PR are at https://github.com/huggingface/text-generation-inference/pull/2883 hjs 2025-03-09 13:21:56 +0800
  • 3598fcdf59
    Update the snapshot a bit. Nicolas Patry 2025-03-07 19:55:02 +0100
  • 6f4d496376
    Ruff. Nicolas Patry 2025-03-07 11:17:40 +0100
  • e2f4eed6d6
    Tweak for multi prompt. Nicolas Patry 2025-03-07 10:34:33 +0100
  • 9aa71d61fb
    Clippy. Nicolas Patry 2025-03-06 16:31:20 +0100
  • 818c8db29a
    change ChatCompletionChunk to align with "OpenAI Chat Completions streaming API" Nicolas Patry 2025-03-06 16:24:11 +0100
  • 622908deab
    Fix tool call2 (#3076) Nicolas Patry 2025-03-07 19:45:57 +0100
  • 55a6618434
    Update --max-batch-total-tokens description (#3083) Alvaro Bartolome 2025-03-07 14:24:26 +0100
  • 902156ca6f
    Nix: the launcher needs a Python env with Torch for GPU detection Daniël de Kok 2025-03-07 12:50:50 +0000
  • a1b3887846
    Update docstring in launcher/src/main.rs instead Alvaro Bartolome 2025-03-07 13:25:24 +0100
  • 7a40844734
    Update --max-batch-total-tokens description Alvaro Bartolome 2025-03-07 13:15:41 +0100
  • 036d802b62
    Nix: add openai to impure shell for integration tests (#3081) Daniël de Kok 2025-03-07 13:04:21 +0100
  • b53d1eb0d4
    Nix: add openai to impure shell for integration tests Daniël de Kok 2025-03-07 11:09:04 +0000
  • e2846f76fa
    No root user TGI. no_root_user2 Nicolas Patry 2025-03-07 11:23:02 +0100
  • aba419a0cc
    Fix crash issue of llava-next fp8 (#286) Yuan Wu 2025-03-07 17:31:58 +0800
  • 17a9bb962e
    Update the old test. Nicolas Patry 2025-03-07 10:11:13 +0100
  • b9467b95a0
    wip: comment out prepend full_text John Castronuovo 2025-03-06 19:50:47 -0500
  • 5a5a51217e
    Stop being root in the docker. no_root_user Nicolas Patry 2025-03-06 16:45:55 +0100
  • 062be12812
    Clippy. Nicolas Patry 2025-03-06 16:25:21 +0100
  • 0e0844ce00
    Upgrade other tests. Nicolas Patry 2025-03-06 16:19:37 +0100
  • ad904be5fc
    Add the requirements. Nicolas Patry 2025-03-06 13:25:28 +0100
  • a9ac7d7f61
    Update all the integration tests. Nicolas Patry 2025-03-06 13:24:29 +0100
  • cd57fea11b
    Fix Llava next crash issue (#285) Yuan Wu 2025-03-06 17:12:21 +0800
  • 3350aa7125
    Arguments output is a string. Nicolas Patry 2025-03-05 15:38:33 +0100
  • b22eace4b3
    Making tool_calls a vector. Nicolas Patry 2025-03-05 15:13:25 +0100
  • 8e92942a18
    Making tool_calls a vector. (#3075) Nicolas Patry 2025-03-05 22:32:31 +0100
  • 5208d3f93e
    Less spammy logs too. Nicolas Patry 2025-03-05 20:58:27 +0100
  • a495ee5342
    Trying to reduce the logs in the case of errors. Nicolas Patry 2025-03-05 20:50:43 +0100
  • 3208d1cd1d
    Revert "Trying to reduce the logs in the case of errors." Nicolas Patry 2025-03-05 20:52:38 +0100
  • cdf70d6a28
    Trying to reduce the logs in the case of errors. Nicolas Patry 2025-03-05 20:50:43 +0100
  • dc024a9432
    Updating the old tests. Nicolas Patry 2025-03-05 18:29:29 +0100
  • ab9dafc68f
    Making sure Olmo (transformers backend) works. (#3074) Nicolas Patry 2025-03-05 17:46:47 +0100
  • 20ea73c6d4
    Fix mistralai/Mistral-7B-Instruct failed issue (#284) Yuan Wu 2025-03-06 00:01:23 +0800
  • 3f7369d1c1
    Add --build-arg llamacpp_native & llamacpp_cpu_arm_arch Adrien Gallouët 2025-03-05 15:49:35 +0000
  • e2c2fc0a49
    Add openai dependency. Nicolas Patry 2025-03-05 16:23:18 +0100
  • d57a91b9e5
    Fixing the nix overlay with updated version. Nicolas Patry 2025-03-05 16:04:40 +0100
  • 74c406d561
    Update doc. Nicolas Patry 2025-03-05 15:39:52 +0100
  • 175a059061
    Making tool_calls a vector. Nicolas Patry 2025-03-05 15:13:25 +0100
  • d8e93a1baa
    Making sure Olmo (transformers backend) works. Nicolas Patry 2025-03-05 12:24:25 +0100
  • 31766dad77
    Force upgrade transformers version for olmo. Nicolas Patry 2025-03-05 12:17:09 +0100
  • 8a79cfd077
    Bump llama.cpp Adrien Gallouët 2025-03-05 11:07:11 +0000
  • 8fe851209c
    Support HF_HUB_USER_AGENT_ORIGIN Adrien Gallouët 2025-03-05 10:59:01 +0000
  • aadd624933
    Update Cargo.lock Adrien Gallouët 2025-02-22 15:49:35 +0000
  • 46feaf6296
    Remove make-gguf.sh Adrien Gallouët 2025-02-22 12:54:46 +0000
  • 3849223340
    Bump llama.cpp and switch to ggml-org Adrien Gallouët 2025-02-20 15:57:45 +0000
  • 0a55bd3db9
    Quantize without llama-quantize Adrien Gallouët 2025-02-20 15:40:40 +0000
  • 6223b6e264
    Fix build with Mach-O Adrien Gallouët 2025-02-20 13:44:41 +0100
  • d41183a0b4
    Save gguf in models/MODEL_ID/model.gguf Adrien Gallouët 2025-02-19 16:13:50 +0000
  • 961a133d4b
    Update installed packages Adrien Gallouët 2025-02-19 16:47:42 +0100
  • 7388468e26
    Update doc Adrien Gallouët 2025-02-14 18:09:09 +0000
  • 0d01a89f0f
    Better error message Adrien Gallouët 2025-02-14 17:56:35 +0000
  • 2242d1a67c
    Update doc Adrien Gallouët 2025-02-14 13:36:54 +0000
  • 30cd3cf510
    Enable mmap, offload_kqv & flash_attention by default Adrien Gallouët 2025-02-14 13:18:28 +0000
  • 46bc8e6bc7
    Bump llama.cpp Adrien Gallouët 2025-02-14 13:11:20 +0000
  • 2d4aa25b9c
    Make --model-gguf optional Adrien Gallouët 2025-02-14 12:59:45 +0000
  • bda39e42c2
    Build faster Adrien Gallouët 2025-02-14 11:54:59 +0000
  • ec35976f82
    Only add token when it is defined. (#3073) Nicolas Patry 2025-03-05 11:59:52 +0100
  • cb42b3ad83
    fix(neuron): explicitly install toolchain (#3072) David Corvoysier 2025-03-05 11:46:58 +0100
  • 9b34930ba1
    Update router/src/server.rs Nicolas Patry 2025-03-05 11:45:53 +0100
  • 6777b1075d
    Only add token when it is defined. Nicolas Patry 2025-03-05 11:16:55 +0100
  • 7a9250fd5c
    ci(neuron): trigger CI when Dockerfile is modified David Corvoysier 2025-03-05 09:34:03 +0000
  • 51dc01474a
    fix(neuron): explicitly install toolchain David Corvoysier 2025-03-05 09:15:59 +0000
  • c34bd9d8d9
    3.1.1 Release. v3.1.1 git_3.1.1 Nicolas Patry 2025-03-04 18:11:30 +0100
  • 491ed9e11d
    Patch rust release. (#3069) Nicolas Patry 2025-03-04 18:07:33 +0100
  • 144d99c147
    Fix a tiny typo in monitoring.md tutorial (#3056) Sadra Barikbin 2025-03-04 19:36:26 +0330
  • e50e47eb92
    Update clippy. Nicolas Patry 2025-03-04 17:03:51 +0100
  • 1a8c4a773c
    Fixing the github action. Nicolas Patry 2025-03-04 16:56:05 +0100
  • bbc68748b7
    Merge branch 'main' into patch_rust Nicolas Patry 2025-03-04 16:50:57 +0100
  • 1b4ecd41e0
    Fixing docker llamacpp. Nicolas Patry 2025-03-04 16:48:39 +0100
  • 08bbfa16a1
    Preparing for release. (#3060) Nicolas Patry 2025-03-04 16:47:10 +0100
  • d8ff7f2623
    feat: add support for HF_HUB_USER_AGENT_ORIGIN to add user-agent Origin field in Hub requests. (#3061) Hugo Larcher 2025-03-04 16:43:50 +0100
  • 7b1c585019
    Fixing the github action. Nicolas Patry 2025-03-04 15:41:03 +0000
  • 23d9b8aec5
    Typo. Nicolas Patry 2025-03-04 15:28:28 +0000
  • d688786a64
    1.85 since the GH action doesn't respect the override. Nicolas Patry 2025-03-04 15:25:46 +0000
  • 91dda6ae59
    Move to the proper version of Rust. Nicolas Patry 2025-03-04 15:21:26 +0000
  • 2b8580a34a
    Fix neuron dockerfile. Nicolas Patry 2025-03-04 15:15:31 +0000
  • e88f6f6ee9
    Add property-based testing for RadixAllocator (#3068) Daniël de Kok 2025-03-04 15:09:46 +0100
  • db0ac03603
    Put back the toolchain ? Nicolas Patry 2025-03-04 12:51:15 +0000
  • 262ab01bd5
    Upgrade rust toolchain. Nicolas Patry 2025-03-04 13:44:18 +0100
  • a81dab55ee
    Trying to remove the rust-toolchain hardcoded in action. Nicolas Patry 2025-03-04 13:39:59 +0100
  • ddf0b02240
    All the assertions. tmp_invariants Nicolas Patry 2025-02-28 17:41:22 +0100