Commit Graph

  • dfb801ff0f fix FlashDecoding change's regression in intel platform install triton because GPTQParams needs it. Wang, Yi A 2024-07-01 21:37:40 -0700
  • 4327210e6b [Major Change][Undecided yet] Move to FlashDecoding instead of PagedAttention kernel. (#1940) Nicolas Patry 2024-07-01 23:28:00 +0200
  • 4f55f15840 Fixing baichuan override. (#2158) Nicolas Patry 2024-07-01 23:25:54 +0200
  • 83f61d6d7d Fixing baichuan override. Nicolas Patry 2024-07-01 21:05:19 +0000
  • 88e2a6a23a fix: avoid loading mistral adapters in mixtral fix-mixtral-adapter-loading drbh 2024-07-01 19:49:05 +0000
  • d9c7f69888 Add support for manually triggering a release build Daniël de Kok 2024-07-01 14:24:48 +0200
  • d8c459ecc0 fix: use the base layers weight in mistral rocm drbh 2024-07-01 16:42:45 +0000
  • 1c7c21d596 No need to recreate anything actually. Nicolas Patry 2024-07-01 16:37:36 +0000
  • ef8bce0b41 Fixup mistral clamping (had issues with cuda graphs). Nicolas Patry 2024-07-01 16:31:22 +0000
  • b686f66727 Fixing Mi{s,x}tral (non functional in Flash Decoding mode though). Nicolas Patry 2024-07-01 16:16:21 +0000
  • 6dc98abe46 Remove unused parameters annd force tokenizer name to be set Morgan Funtowicz 2024-07-01 16:11:59 +0200
  • 9895e8db99 Add more representative Llama GPTQ test Daniël de Kok 2024-07-01 14:08:44 +0200
  • 47ac5c654d Working FFI call for TGI and TRTLLM backend Morgan Funtowicz 2024-07-01 15:53:23 +0200
  • 1bd52157d8 Update mistral past. Nicolas Patry 2024-07-01 13:19:26 +0000
  • 8fa8cda660 Changing return everywhere. Nicolas Patry 2024-07-01 12:08:59 +0000
  • a26e57f9f3 Fixing non flash tests/imports. Nicolas Patry 2024-07-01 11:54:34 +0000
  • 4b1364da92 Factoring cu_seqlen_qk for better abstracting over every model. Nicolas Patry 2024-07-01 10:55:00 +0000
  • 65980ed75a These do not belong. Nicolas Patry 2024-06-25 15:06:52 +0000
  • 5f38d79719 "ipex" -> "cpu" Nicolas Patry 2024-06-25 14:24:28 +0000
  • 212a59544b Update? Nicolas Patry 2024-06-25 13:10:20 +0000
  • fcbc6876c0 No need for cache_manager anymore. Nicolas Patry 2024-06-25 12:24:45 +0000
  • 4f1b1a277c Rebased. Nicolas Patry 2024-06-25 12:20:50 +0000
  • 988aa34f3d Fix non decoding paths. Nicolas Patry 2024-05-31 22:56:31 +0000
  • b98b94d695 Fix Cohere. Nicolas Patry 2024-05-31 22:54:43 +0000
  • 66081e6ae7 Making it work on non flash decoding. Nicolas Patry 2024-05-31 21:41:19 +0000
  • 4293a12863 Using flash decoding Nicolas Patry 2024-05-17 08:43:33 +0000
  • d0225b1015 GH router. (#2153) Nicolas Patry 2024-07-01 15:42:26 +0200
  • 466d4cef48 GH router. Nicolas Patry 2024-07-01 13:28:01 +0000
  • 17cebc4506 Fixing test. (#2152) Nicolas Patry 2024-07-01 15:24:17 +0200
  • b85bb02b86 Fixing test. Nicolas Patry 2024-07-01 13:23:17 +0000
  • 9eefb2f672 fix: prefer serde structs over custom functions (#2127) drbh 2024-07-01 09:08:05 -0400
  • 5da4cfab1c refine get xpu free memory/enable Qwen2/gemma2/gemma/phi in intel platform (#2132) Wang, Yi 2024-07-01 20:32:54 +0800
  • e0bfe4e7f0 fix Felix Marty 2024-07-01 12:31:56 +0000
  • afe9d74337 Fixing the post processor. Nicolas Patry 2024-07-01 12:29:20 +0000
  • 750ef7bc23 Merge branch 'ci_amd3' of github.com:huggingface/text-generation-inference into ci_amd3 Felix Marty 2024-07-01 12:20:40 +0000
  • 00cc73b7b7 fix post merge Felix Marty 2024-07-01 12:20:29 +0000
  • 9d0ca503a8 fix AttributeError: 'MixtralLayer' object has no attribute 'mlp' (#2123) icyboy™ 2024-07-01 20:17:22 +0800
  • 59849777de Merge branch 'main' into ci_amd3 fxmarty 2024-07-01 14:14:46 +0200
  • 9fd395fae4 fix tests Felix Marty 2024-07-01 12:12:26 +0000
  • 153c8ae60f Merge branch 'main' into prefer-chat-object-enum Nicolas Patry 2024-07-01 14:10:45 +0200
  • 2ce8019480 Use GPTQ-Marlin for supported GPTQ configurations (#2111) Daniël de Kok 2024-07-01 12:59:12 +0200
  • 0d97a93c1e feat: download lora adapter weights from launcher (#2140) drbh 2024-07-01 06:58:49 -0400
  • 25f57e2e98 fix: use weights from base_layer (#2141) drbh 2024-07-01 06:58:40 -0400
  • b4552f9de9 Fixing clippy. (#2149) Nicolas Patry 2024-07-01 12:02:19 +0200
  • 50da0ce75f Fixing clippy. Nicolas Patry 2024-07-01 12:01:22 +0200
  • 6ea570ddfe fix microsoft/Phi-3-mini-4k-instruct crash in batch.slots[batch.slot_… (#2148) Wang, Yi 2024-07-01 17:27:53 +0800
  • 18d978ba0f Apply suggestions from code review Nicolas Patry 2024-07-01 11:27:42 +0200
  • 81d0def84a fix microsoft/Phi-3-mini-4k-instruct crash in batch.slots[batch.slot_indices] Wang, Yi A 2024-07-01 00:38:20 -0700
  • dc402dc9ac Initial setup for CXX binding to TRTLLM Morgan Funtowicz 2024-06-30 23:37:20 +0200
  • 45da4460a3 change name to info routes Kevin Duffy 2024-06-28 18:46:07 +0100
  • b3e21ed42e Add API_Key for Auth and conditionally add authorisation for non info/health endpoints. Kevin Duffy 2024-06-28 18:41:21 +0100
  • 05d1011b4f fix xpu build Felix Marty 2024-06-28 16:08:27 +0000
  • a00db1b474 fix: adjust unwrap syntax in template drbh 2024-06-28 15:22:15 +0000
  • 68583d3240 working memory leak fix in tunableop Felix Marty 2024-06-28 15:15:12 +0000
  • 8885688630 fix: update create_post_processor logic for token type drbh 2024-06-28 15:07:50 +0000
  • c4feb9854c fix: use weights from base_layer drbh 2024-06-28 14:49:41 +0000
  • c326ffdac0 fix: adjust HubTokenizerConfig after rebase drbh 2024-06-27 11:33:29 -0400
  • d759a7f492 feat: leverage serde for conditional deser drbh 2024-06-27 15:11:14 +0000
  • 4ba5e74efc fix: adjust typo drbh 2024-06-27 13:11:19 +0000
  • ae14f8931e fix: enum CompletionType not ObjectType drbh 2024-06-27 13:07:05 +0000
  • 39c6d10b5a fix: adjust typo drbh 2024-06-26 22:56:36 +0000
  • f98f498473 fix: prefer enum for chat object drbh 2024-06-26 22:54:00 +0000
  • 1c0b916e63 feat: download lora adapter weights from launcher drbh 2024-06-28 14:26:03 +0000
  • 9815feb2e3 Revert "Update devcontainer to use correct update content command path" backends/trtllm Morgan Funtowicz 2024-06-28 15:26:45 +0200
  • b67073df41 Update devcontainer to use correct update content command path Morgan Funtowicz 2024-06-28 15:22:54 +0200
  • 8e25428713 Update devcontainer to remove clang and base image on PyTorch Morgan Funtowicz 2024-06-28 15:16:10 +0200
  • 3d50ff71b7 bump torch to more recent version Felix Marty 2024-06-28 13:10:43 +0000
  • f3e729a6d6 Add devcontainer to ease backend development Morgan Funtowicz 2024-06-28 14:39:19 +0200
  • 87db820627 fix rm Felix Marty 2024-06-28 09:49:20 +0000
  • fb98ab273f Fixing the CI to also run in release when it's a tag ? (#2138) Nicolas Patry 2024-06-28 09:31:09 +0200
  • 488ddee64d Fixing the CI to also run in release when it's a tag ? Nicolas Patry 2024-06-28 08:53:14 +0200
  • 192d49af0b 2.1.0 names for release. v2.1.0 git_v2.1.0 Nicolas Patry 2024-06-28 08:20:59 +0200
  • 36077d8ff9 enable gemma/gemma2/phi in intel platform Wang, Yi A 2024-06-27 19:33:17 -0700
  • af16320e66 Merge branch 'main' into mem_refine Wang, Yi A 2024-06-27 19:12:42 -0700
  • 74b0231b19 fix: refactor post_processor logic and add test (#2137) drbh 2024-06-27 17:16:19 -0400
  • a921854d92 fix: adjust when post_processor is overridden and improve create_post_processor drbh 2024-06-27 20:47:06 +0000
  • 74535ce80f fix: remove dev comment drbh 2024-06-27 18:42:41 +0000
  • f85cd58e2c fix: refactor post_processor logic and add test drbh 2024-06-27 18:34:26 +0000
  • eaa6890b3c remove hidden Felix Marty 2024-06-27 15:24:14 +0000
  • 0a5485d8a0 avoid permissions issues Felix Marty 2024-06-27 14:51:11 +0000
  • 3ea8259af1 Fixing gemma2. (#2135) Nicolas Patry 2024-06-27 16:04:20 +0200
  • 0e4ab6d31c Fixing malformed rust tokenizers (#2134) Nicolas Patry 2024-06-27 16:04:03 +0200
  • aeeb291ffa Fix for deepseek too. Nicolas Patry 2024-06-27 13:56:20 +0000
  • dd2d91b043 Idefics2: sync added image tokens with transformers (#2080) Daniël de Kok 2024-06-27 15:54:35 +0200
  • bbc949ff74 trigger ci Felix Marty 2024-06-27 13:47:21 +0000
  • 80b448c2bb Idefics2: sync added image tokens with transformers Daniël de Kok 2024-06-20 09:21:58 +0200
  • ded50f900d Adding new model. Nicolas Patry 2024-06-27 13:39:20 +0000
  • 02ac45131f some cleaning automodel-supports-flash-paged-attention Felix Marty 2024-06-27 13:33:35 +0000
  • 3760102077 add missing files Felix Marty 2024-06-27 13:30:40 +0000
  • aa87939774 Fixing malformed rust tokenizers Nicolas Patry 2024-06-27 13:30:32 +0000
  • 770975fa81 refactor Felix Marty 2024-06-27 13:24:58 +0000
  • 6982f9bcb1 enable qwen2 in xpu Wang, Yi A 2024-06-27 06:01:07 -0700
  • cb37c551ab working flash + paged through transformers Felix Marty 2024-06-27 12:39:36 +0000
  • 886bfab23d refine get xpu free memory Wang, Yi A 2024-06-27 05:18:57 -0700
  • 91423771be Missing dependency Morgan Funtowicz 2024-06-27 12:43:40 +0200
  • 4335a39f92 First definition of binding trtllm to rust Morgan Funtowicz 2024-06-27 12:41:49 +0200
  • c6537df493 enable build cmake binding Morgan Funtowicz 2024-06-27 12:41:36 +0200
  • b53b21c63a Bumping to 2.1 (#2131) Nicolas Patry 2024-06-27 12:34:43 +0200
  • d8185ad942 Bumping to 2.1 Nicolas Patry 2024-06-27 11:56:21 +0200
  • 2e763d12ad Use GPTQ-Marlin for supported GPTQ configurations Daniël de Kok 2024-06-24 15:11:49 +0200