Commit Graph

  • 39de46129e fix: Adapt function call response to return a json string for arguments Nicolas Casademont 2025-01-24 11:47:01 +0100
  • 38a1987475 Use eetq kernel from the hub Daniël de Kok 2025-02-17 13:07:09 +0000
  • f866e9853c fix: trufflehog Hugo Larcher 2025-02-17 16:29:28 +0100
  • 95d1172347 fix: bump ci build yaml pr-3018-ci-branch drbh 2025-02-17 15:24:25 +0000
  • 9501956383 fix(neuron): increase ulimit when building image David Corvoysier 2025-02-17 15:22:36 +0000
  • 252f5468cc feat: add neuron case to build ci drbh 2025-02-17 15:22:02 +0000
  • 728cbfa4c6 feat: Parse the HF_HUB_USER_AGENT_ORIGIN environment variable to add info about the environment running TGI; useful to track usage, e.g. for collaborations. Hugo Larcher 2025-02-17 11:15:42 +0100
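    The commit above describes reading an environment variable to tag Hub requests with their origin. A minimal Python sketch of the idea, assuming a hypothetical `build_user_agent` helper (TGI's actual wiring differs):

    ```python
    import os

    def build_user_agent(base: str = "text-generation-inference") -> str:
        # Hypothetical helper: append the origin reported via
        # HF_HUB_USER_AGENT_ORIGIN, if set, to the user-agent string.
        origin = os.environ.get("HF_HUB_USER_AGENT_ORIGIN", "").strip()
        return f"{base}; origin/{origin}" if origin else base
    ```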
  • 7fa960ec38 Update README.md celsowm 2025-02-14 12:22:50 -0300
  • cfd4fbb479 [Backend] Add Llamacpp backend (#2975) Adrien Gallouët 2025-02-14 13:40:57 +0100
  • 6df0fc0b55 Support sigmoid scoring function in GPTQ-MoE (#3017) Daniël de Kok 2025-02-14 11:33:49 +0100
  • d6881c37ab Putting back the NCCL forced upgrade. (#2999) Nicolas Patry 2025-02-14 11:31:59 +0100
  • f0b404ba78 add tool_calls field to Message struct sailesh duddupudi 2025-02-13 20:24:08 +0000
  • dd42bf97fb make content field optional in chat request sailesh duddupudi 2025-02-13 19:33:38 +0000
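    The two commits above reshape the chat `Message` struct: `tool_calls` is added and `content` becomes optional (a tool-call turn may carry no text). A Python analogue of the resulting shape — field names come from the commit titles, everything else is assumed:

    ```python
    from dataclasses import dataclass, field
    from typing import Any, List, Optional

    @dataclass
    class Message:
        # Sketch of the Rust Message struct described in the commits above.
        role: str
        content: Optional[str] = None  # now optional: tool-call turns may omit it
        tool_calls: List[Any] = field(default_factory=list)  # newly added field
    ```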
  • 57d6d13f68 Slightly more reproducible build and not as scary. Nicolas Patry 2025-02-13 18:11:05 +0100
  • 8a211dc7fc Preventing single user hugging the server to death by asking (#3016) Nicolas Patry 2025-02-13 11:23:17 +0100
  • 17f0d57581 Unpin rustc version and set it to 'stable' (#269) Tomasz Thaddey 2025-02-13 10:49:09 +0100
  • 5398594077 Merge 5452c1294c into 4cccce4b44 Funtowicz Morgan 2025-02-13 09:20:01 +0800
  • eb01342541 Put back nccl latest (override torch). Nicolas Patry 2025-02-12 15:26:47 +0100
  • 4cccce4b44 Update the flaky mllama test. (#3015) Nicolas Patry 2025-02-12 12:26:52 +0100
  • 59ef177d5f Torch 2.6, fork of rotary, eetq updated. Nicolas Patry 2025-02-12 12:26:15 +0100
  • 856d7682cf feat(neuron): add server and integration tests David Corvoysier 2025-02-12 09:10:47 +0000
  • 337329fff3 feat(neuron): add server standalone installation David Corvoysier 2025-02-11 15:51:09 +0000
  • 9c25afb832 feat: add neuron backend David Corvoysier 2025-02-11 09:53:16 +0000
  • 13decd6d44 Patching flash v1. Nicolas Patry 2025-02-12 12:01:18 +0100
  • 1ea803cc80 Support sigmoid scoring function in GPTQ-MoE Daniël de Kok 2025-02-12 10:49:05 +0000
  • 76bcb4948d fix Qwen VL break in intel platform (#3002) Wang, Yi 2025-02-12 18:31:34 +0800
  • 3217134791 Actually stay on flash v1. Nicolas Patry 2025-02-12 09:09:11 +0100
  • 412f605e32 Preventing single user hugging the server to death by asking for way too many tokens. Nicolas Patry 2025-02-12 08:29:06 +0100
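    The commit above caps how many tokens a single request may ask for. A hypothetical sketch of that kind of guard (the function name and parameters are illustrative; TGI's real validation lives in its router):

    ```python
    def validate_max_new_tokens(requested: int, input_len: int, max_total_tokens: int) -> int:
        # Reject generation requests whose budget exceeds what the
        # server can serve given the prompt length and context window.
        if requested <= 0:
            raise ValueError("max_new_tokens must be positive")
        budget = max_total_tokens - input_len
        if requested > budget:
            raise ValueError(
                f"requested {requested} new tokens but only {budget} fit in the context"
            )
        return requested
    ```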
  • a31641c1b6 dockerfile change to ipex cpu/xpu Wang, Yi A 2025-02-11 19:51:05 +0000
  • bcf98a8b81 Fix flash attention ? Nicolas Patry 2025-02-11 17:48:59 +0100
  • d4cac1a1ff Update the flaky mllama test. Nicolas Patry 2025-02-11 17:10:36 +0100
  • b7250f0473 Revert "fix: expand logic for different hardware" pr-3002-ci-branch drbh 2025-02-11 17:14:02 +0100
  • b86c3947ab Revert "Update the flaky mllama test." Nicolas Patry 2025-02-11 17:13:06 +0100
  • 8a870b31b9 Update the flaky mllama test. Nicolas Patry 2025-02-11 17:10:36 +0100
  • 5b736e6d48 Reverting the EETQ modification. Nicolas Patry 2025-02-11 17:00:36 +0100
  • 87cde07e21 Rolling back torch version. Nicolas Patry 2025-02-11 16:56:02 +0100
  • 09631bc8a2 fix: bump prompt adjust-mllama-test-output drbh 2025-02-11 16:15:29 +0100
  • f9235ed0fc Cache min. Nicolas Patry 2025-02-11 11:47:51 +0100
  • 8863f3728c Fix CPU and memory affinity under external resource management Antti Kervinen 2025-02-07 15:59:41 +0200
  • 38b06b90b9 fix: update expected output from town to village drbh 2025-02-11 11:11:09 +0100
  • 40b50d853f Merge 8ae92e5d70 into 571ac9b507 Yaser Jaradeh 2025-02-11 12:35:26 +0300
  • 1714840604 Dropping conda from the build system + torch 2.6 Nicolas Patry 2025-02-10 15:42:24 +0100
  • 351253ae1b Ignoring conda. Nicolas Patry 2025-02-07 17:07:25 +0100
  • d743252d44 ... Nicolas Patry 2025-02-07 17:01:48 +0100
  • 0cd364f313 . Nicolas Patry 2025-02-07 16:54:30 +0100
  • afbd82e6b5 Putting back the NCCL forced upgrade. Nicolas Patry 2025-02-07 16:39:42 +0100
  • 571ac9b507 Use kernels from the kernel hub (#2988) Daniël de Kok 2025-02-10 19:19:25 +0100
  • 89e83f6a9d Merge branch 'main' into handle-break-in-tool-template Alvaro Bartolome 2025-02-10 17:24:31 +0100
  • 7c09eae0a0 fix: expand logic for different hardware drbh 2025-02-10 15:45:29 +0000
  • ab92e153e1 Merge remote-tracking branch 'origin/main' into fix-issue-2864 Nicolas Casademont 2025-02-10 11:15:04 +0100
  • eb0194a9c1 fix qwen2 vl crash in continuous batching pr-3004-ci-branch Wang, Yi A 2025-02-10 01:54:45 -0800
  • ecbd956a4c add in Buffering.. Wang, Yi A 2025-02-09 19:28:37 -0800
  • efeef0bed6 change ChatCompletionChunk to align with the "OpenAI Chat Completions streaming API": when stream_options: {"include_usage": true} is included, choices is None only for the last chunk, and usage is always None except for the last chunk. Wang, Yi A 2025-02-06 21:11:53 -0800
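    The streaming semantics described in that commit can be sketched in Python as follows — a toy generator illustrating the chunk shapes, not TGI's actual code:

    ```python
    def stream_chunks(deltas, usage):
        # Per the commit: every content chunk carries usage=None; one final
        # chunk carries the usage stats and has choices=None.
        for d in deltas:
            yield {"choices": [{"delta": {"content": d}}], "usage": None}
        yield {"choices": None, "usage": usage}
    ```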
  • 9d7f257d60 use the PositionRotaryEmbedding impl so rocm and ipex both work Wang, Yi A 2025-02-09 18:32:31 -0800
  • 57385c5463 fix Qwen VL break in intel platform Wang, Yi A 2025-02-09 18:01:20 -0800
  • b7d86e8e53 On some machines, using hf_hub::api::sync::Api to download the config fails, which makes warmup fail since attributes like max_position_embeddings cannot be read; updating hf-hub to the latest version fixes it. Wang, Yi A 2025-02-08 13:56:58 +0000
  • fc3ac8075b Remove outdated TODO Daniël de Kok 2025-02-07 20:13:50 +0000
  • df582a1842 ipex fix Daniël de Kok 2025-02-07 20:12:39 +0000
  • b0e66983be Record request parameters in OTel span for /v1/chat/completions endpoint Alex Weston 2025-02-07 13:16:49 -0500
  • 5fb4afbf5e Update doc Adrien Gallouët 2025-02-07 17:41:14 +0000
  • d96a77705d Update doc Adrien Gallouët 2025-02-07 16:48:28 +0000
  • 8ae7bc384c Update hf-kernels, fixup Docker Daniël de Kok 2025-02-07 16:06:07 +0000
  • 219b8b1613 EOF fix Daniël de Kok 2025-02-05 15:46:42 +0000
  • 96a4d4d083 attention -> paged-attention Daniël de Kok 2025-02-05 15:39:04 +0000
  • 8ad383c7cb marlin-kernels -> quantization Daniël de Kok 2025-02-05 15:10:41 +0000
  • 8aecc59eb0 Hoist another case of kernel loading out of a somewhat hot function Daniël de Kok 2025-02-05 13:09:23 +0000
  • f74a50d41b Take load_kernel out of a frequently-called function Daniël de Kok 2025-02-05 12:58:44 +0000
  • 875ce6d521 Fix EOF Daniël de Kok 2025-02-05 11:04:30 +0000
  • 371668ee88 Update tgi-nix flake for hf-kernels Daniël de Kok 2025-02-05 10:49:02 +0000
  • 520420a2dd Update kernels Daniël de Kok 2025-02-05 10:42:35 +0000
  • e038497478 Update hf-kernels to 0.1.5 Daniël de Kok 2025-02-05 10:40:28 +0000
  • 4c8ced2826 Nix: add attention/moe/quantization kernels Daniël de Kok 2025-02-05 10:39:49 +0000
  • ca1067f9db Fix unused imports Daniël de Kok 2025-02-04 15:11:22 +0000
  • 00af6ef70c CI: activate venv Daniël de Kok 2025-02-04 13:30:01 +0000
  • f25a7aad89 Fixup some imports Daniël de Kok 2025-02-04 13:22:24 +0000
  • a60d1e614f CI: download locked kernels for server tests Daniël de Kok 2025-02-04 12:45:34 +0000
  • dcb37316ae Update to moe 0.1.1 Daniël de Kok 2025-02-04 12:27:58 +0000
  • c1a564e738 Support latest moe kernels Daniël de Kok 2025-02-04 11:23:25 +0000
  • d39f896c5c Support loading local kernels for development Daniël de Kok 2025-02-04 11:20:56 +0000
  • b35ab54fd4 Update moe kernels Daniël de Kok 2025-02-03 18:25:19 +0000
  • c9191f3f2b Cache the kernels in the Docker image Daniël de Kok 2025-02-03 11:57:36 +0000
  • b267caa537 Use attention kernels from the Hub Daniël de Kok 2025-01-31 15:36:15 +0000
  • 758ff3c598 Use hub kernels for MoE/GPTQ-Marlin MoE Daniël de Kok 2025-01-28 12:51:45 +0000
  • aab6141b92 Use Hub kernels for Marlin and cutlass quantization kernels Daniël de Kok 2025-01-27 14:13:48 +0000
  • b77d05d3af Fix bool args Adrien Gallouët 2025-02-07 15:29:05 +0000
  • 1401418243 Add HF transfer Adrien Gallouët 2025-02-07 14:45:53 +0000
  • 20603881e3 Add test_chat_template_loop_controls to test break Alvaro Bartolome 2025-02-07 13:35:26 +0100
  • 508d47f80f Add README.md Adrien Gallouët 2025-02-07 12:12:13 +0000
  • 0702e0bfda Cleanup Adrien Gallouët 2025-02-07 12:08:34 +0000
  • 6bdb644f2c Handle custom llama.cpp dir Adrien Gallouët 2025-02-07 12:08:02 +0000
  • b6cfa0fbc0 Add missing cuda prefix Adrien Gallouët 2025-02-07 11:48:16 +0000
  • 4841f71a0e Fix Dockerfile Adrien Gallouët 2025-02-07 12:26:28 +0100
  • 4b8cda684b Updating mllama after strftime. (#2993) Nicolas Patry 2025-02-07 10:38:13 +0100
  • 5691c91350 Add loop_controls feature to minijinja Alvaro Bartolome 2025-02-07 10:36:49 +0100
  • 0d27ee74de Remove .cargo/config.toml Adrien Gallouët 2025-02-07 08:51:32 +0000
  • f66a31d6f3 Triton version. Nicolas Patry 2025-02-07 09:02:32 +0100
  • 539097a158 Bailing out of reproducible python env. Nicolas Patry 2025-02-07 08:23:12 +0100
  • cbbb9dee69 Move workdir up a bit. Nicolas Patry 2025-02-07 00:07:27 +0100
  • 90e54a19e0 Missed a step. Nicolas Patry 2025-02-07 00:01:10 +0100
  • 04462263dc Fixing the docker environment hopefully. Nicolas Patry 2025-02-06 23:56:12 +0100