Commit Graph

  • 783473d200 Put back trufflehog with proper extension. Nicolas Patry 2024-06-24 15:46:40 +0200
  • a7556ba800 fix: refactors and helpful comments multi-lora drbh 2024-06-24 13:39:56 +0000
  • 161bead116 Network host. Nicolas Patry 2024-06-24 15:30:49 +0200
  • 2657a5df09 New runner. Manual squash. Nicolas Patry 2024-06-24 15:19:44 +0200
  • 8a0bb53ef3 add docs Mohit Sharma 2024-06-24 11:09:17 +0000
  • f0d95b0f4b fixrs Mohit Sharma 2024-06-24 11:07:32 +0000
  • fb83e3416b fix Mohit Sharma 2024-06-24 08:25:59 +0000
  • 81fd601c44 rebase and update Mohit Sharma 2024-06-24 08:15:36 +0000
  • 084de9907c Merge branch 'main' into fp8_kvcache Mohit Sharma 2024-06-24 07:53:33 +0000
  • 2413db8b52 Update README.md AS ABIR HASAN 2024-06-23 00:03:46 +0300
  • 07b939fc99 Merge branch 'main' into fix-from-source-installation Youssef Ali 2024-06-21 15:51:04 -0400
  • 811a9381b1 feat: sort cuda graphs in descending order (#2104) drbh 2024-06-21 14:28:26 -0400
  • 98e9be7221 feat: sort cuda graphs in descending order drbh 2024-06-21 16:07:30 +0000
  • 197c47a302 Fix text-generation-server quantize (#2103) Daniël de Kok 2024-06-21 15:28:51 +0200
  • dc58c339fc Fix text-generation-server quantize Daniël de Kok 2024-06-21 14:26:33 +0200
  • e2ecafa025 [WIP] Dev guide Lysandre 2024-06-21 12:18:10 +0200
  • 4f75783ae0 Fix Dockerfile_amd and Dockerfile_intel ur4t 2024-06-21 16:30:04 +0800
  • 340450d765 Fix cargo-chef prepare ur4t 2024-06-21 15:58:06 +0800
  • e850cea85d fix: tweak shapes drbh 2024-06-21 03:25:18 +0000
  • b16109966d fix: add missing tests and renames drbh 2024-06-21 02:59:18 +0000
  • 29e922d3d4 feat: improve weight tests drbh 2024-06-20 20:38:19 +0000
  • 313d29f1f9 fix: adjust so all tests pass drbh 2024-06-20 19:28:29 +0000
  • 7ee217475e fix: adjust types and add tests drbh 2024-06-20 14:56:54 -0400
  • 62a1ddbf8a poetry actually can't handle the conflict between torch and nccl fxmarty 2024-06-20 18:35:00 +0000
  • 27a3792626 use v2.22.3 that also fixes @samsamoa's repro fxmarty 2024-06-20 18:12:48 +0000
  • a76b6f4413 add note in dockerfile fxmarty 2024-06-20 18:03:33 +0000
  • 2502ce4865 fix nccl issue fxmarty 2024-06-20 17:57:45 +0000
  • 4fe871ffaa Adjust max_new_tokens in warmup (#160) Karol Damaszke 2024-06-20 19:48:37 +0200
  • 65506e19bf update dockerfile debug-torch-23 fxmarty 2024-06-20 15:36:46 +0000
  • 56b16614de continue refactoring feat/backend_feature OlivierDehaene 2024-06-20 16:59:38 +0200
  • e7b1d5e422 Fix LLaVA-NeXT handling of non-square images Daniël de Kok 2024-06-20 15:05:16 +0200
  • abf56b75a4 refactor schedulers OlivierDehaene 2024-06-20 12:40:36 +0200
  • c27ae68d50 Update Launcher Docs Removing Option Kevin Duffy 2024-06-20 09:43:39 +0100
  • bcb3faa1c2 Factor out sharding of packed tensors (#2059) Daniël de Kok 2024-06-20 09:56:04 +0200
  • 9ce4552bae Idefics2: sync added image tokens with transformers Daniël de Kok 2024-06-20 09:21:58 +0200
  • 8fc4b84d77 Factor out sharding of packed tensors Daniël de Kok 2024-06-12 16:20:51 +0200
  • f5a9837592 Support exl2-quantized Qwen2 models (#2085) Daniël de Kok 2024-06-20 07:56:16 +0200
  • f1a6fdb900 corrected Pydantic warning. jeff 2024-06-19 21:33:39 -0400
  • 48010f14b5 fix: re update the docs pr-2076-ci-run drbh 2024-06-20 01:05:47 +0000
  • b85c045b7f fix: run python update doc drbh 2024-06-20 00:59:33 +0000
  • abfc581030 fix: adjust types drbh 2024-06-20 00:56:34 +0000
  • 75dccec40d Merge commit 'refs/pull/2076/head' of github.com:huggingface/text-generation-inference into main drbh 2024-06-20 00:41:54 +0000
  • 9854f20225 fix: set sharded true if WORLD_SIZE is set drbh 2024-06-12 17:12:18 +0000
  • 1dc1d5b3c5 ipex distributed ops support Wang, Yi A 2024-06-18 07:12:32 -0700
  • 397731b272 Merge branch 'main' of https://github.com/sywangyi/text-generation-inference into cpu_tgi Wang, Yi A 2024-06-19 17:22:28 -0700
  • 1033d3b503 Fixing packages in Dockerfile (#162) Alexey Fadeev 2024-06-19 23:44:47 +0200
  • 70e1982ab2 feat: add simple tests for weights drbh 2024-06-19 20:57:20 +0000
  • cdbf802860 feat: rotate tests ci token (#2091) drbh 2024-06-19 17:02:58 -0400
  • 7b8aaa57e5 feat: rotate tests ci token drbh 2024-06-19 17:49:26 +0000
  • a07b612989 fix: revert skips and prefer updated ci token for tests drbh 2024-06-19 17:31:13 +0000
  • c9e4526b9d fix: skip llama test CI (temp) 2 drbh 2024-06-19 17:19:40 +0000
  • ce70fce925 fix: skip llama test due to CI issue (temp) drbh 2024-06-19 17:03:13 +0000
  • 4f1543d3c7 fix: refactors and adjust flash llama lora logic drbh 2024-06-19 16:13:42 +0000
  • 695c577b7a fix ChatCompletion and ChatCompletionChunk object string not compatible with standard openai api sunxichen 2024-06-19 16:40:28 +0800
  • 01ea0ab2f6 Redirect or first request? Daniël de Kok 2024-06-19 09:52:05 +0200
  • c1125781e0 Try something maintenance/docker-network Daniël de Kok 2024-06-19 09:33:45 +0200
  • 9fb7790928 fix: update docker auth step ci-patch drbh 2024-06-18 11:49:43 -0400
  • 9ce3f046b8 CI Login to dockerhub Guillaume LEGENDRE 2024-06-18 16:21:17 +0200
  • fe9abad1a9 mirror docker feat/page_re_alloc OlivierDehaene 2024-06-18 15:58:59 +0200
  • 224455f389 Merge branch 'main' into lora-internal drbh 2024-06-18 09:50:41 -0400
  • e5c27364be avoid join_all OlivierDehaene 2024-06-18 15:44:28 +0200
  • b21ed583ac fix logic OlivierDehaene 2024-06-18 13:56:16 +0200
  • fe6a2756f1 Merge branch 'main' into feat/page_re_alloc OlivierDehaene 2024-06-18 13:13:49 +0200
  • 7ed1044585 added padded blocks and logs everywhere OlivierDehaene 2024-06-18 12:18:05 +0200
  • 14d83c176f Support exl2-quantized Qwen2 models Daniël de Kok 2024-06-18 10:55:43 +0200
  • 2b95c0d991 Merge branch 'main' into cpu_tgi Funtowicz Morgan 2024-06-18 10:18:29 +0200
  • 11ea9ce002 CI: pass pre-commit hooks again (#2084) Daniël de Kok 2024-06-18 09:38:21 +0200
  • 729df451cf CI: pass pre-commit hooks again Daniël de Kok 2024-06-18 09:36:47 +0200
  • 4f25c67d63 CI: Tailscale improvements (#2079) Guillaume LEGENDRE 2024-06-18 09:13:04 +0200
  • dce37faeef change step order Guillaume LEGENDRE 2024-06-17 18:46:16 +0200
  • a7f8b61e6a network host Guillaume LEGENDRE 2024-06-17 18:22:02 +0200
  • 58da5e3f3c wait for ssh Guillaume LEGENDRE 2024-06-17 18:11:39 +0200
  • c8c7ccd31e Set maximum grpc message receive size to 2GiB (#2075) Daniël de Kok 2024-06-17 16:40:44 +0200
  • cf38bb7529 Update build.yaml Guillaume LEGENDRE 2024-06-17 15:48:50 +0200
  • cbdeafb3fd Update build.yaml Guillaume LEGENDRE 2024-06-17 15:47:56 +0200
  • 651d7e2d19 Update build.yaml Guillaume LEGENDRE 2024-06-17 15:41:02 +0200
  • a0b5ab2a89 Update build.yaml Guillaume LEGENDRE 2024-06-17 15:40:02 +0200
  • 6139c6fc18 test local tailscale Guillaume LEGENDRE 2024-06-17 15:32:23 +0200
  • 55227eb0c4 Fixup formatting to make PR pass Daniël de Kok 2024-06-17 15:06:03 +0200
  • 2128c5d09f Update to Rust 1.79.0 Daniël de Kok 2024-06-17 14:41:26 +0200
  • 991a1cbb3b Set maximum grpc message receive size to 2GiB Daniël de Kok 2024-06-17 12:26:31 +0200
  • 036a5a02a7 Fix missing rope scaling option for YaRN calycekr 2024-06-17 21:52:52 +0900
  • cb8f999edc Update Launcher Docs Kevin Duffy 2024-06-17 12:24:25 +0100
  • 3c8fa90ee7 Update README.md Kevin Duffy 2024-06-17 12:15:26 +0100
  • 5673e5aad6 Update Docs Kevin Duffy 2024-06-17 12:12:55 +0100
  • 6e93482c46 Adding Service Name Environment variable for https://github.com/huggingface/text-generation-inference/issues/2069 Kevin Duffy 2024-06-17 11:24:51 +0100
  • 0f7d38e774 fix build.rs watch files (#2072) Ziru Niu 2024-06-17 18:10:01 +0800
  • 131838919e Contributing guide & Code of Conduct (#2074) Lysandre Debut 2024-06-17 12:09:31 +0200
  • e903770897 Support different image sizes in prefill in VLMs (#2065) Daniël de Kok 2024-06-17 10:49:41 +0200
  • 01fbb90cd4 Redirect to GitHub's tutorial on PRs Lysandre 2024-06-17 09:28:37 +0200
  • 716deccf71 fix build.rs watch files Ziru Niu 2024-06-17 02:36:05 +0000
  • bafee950af fix: str has no attribute logits Youssef Ali 2024-06-16 23:27:54 +0300
  • 4b96c6319e edit: remove the constrain of having an old version of transformers Youssef Ali 2024-06-16 23:27:35 +0300
  • 2e241e655b edit: upgrade the rust version to support inline const Youssef Ali 2024-06-16 23:26:45 +0300
  • ca1b2f4994 Updated kv cache for starcoder (#128) Vidya Galli 2024-06-14 13:36:44 -0700
  • 1104885f00 Merge branch 'main' into lora-internal drbh 2024-06-14 10:06:15 -0400
  • 0e1c28cafd fix: merge 'main' into lora-internal to resolve conflicts drbh 2024-06-14 14:02:33 +0000
  • 06c3254cc5 fix: avoid dockerfile conflict drbh 2024-06-14 13:58:38 +0000
  • ef86232c94 [Torch.compile] Enable llama-2-7b (#157) Jacek Czaja 2024-06-14 15:56:23 +0200
  • 445f313504 Adding architecture document (#2044) Alvaro Moran 2024-06-14 15:28:34 +0200