Commit Graph

  • b1aff577a0 Worst invention ever. Nicolas Patry 2024-02-14 10:09:00 +0000
  • 0523031ffb ... Nicolas Patry 2024-02-14 10:05:29 +0000
  • 69d1d3cde6 Bash in yaml is not our friend. Nicolas Patry 2024-02-14 10:02:53 +0000
  • e36887cbf5 Install docker manually. Nicolas Patry 2024-02-14 10:00:33 +0000
  • 05aef4dd1a Upgrade install buildx. Nicolas Patry 2024-02-14 09:57:15 +0000
  • 85bf172653 Our runner docker in docker. Nicolas Patry 2024-02-14 09:52:34 +0000
  • 524e06066b Small cleanup. Nicolas Patry 2024-02-14 09:22:38 +0000
  • d6b0fb9e25 Improving mamba runtime by using updates (#1552) Nicolas Patry 2024-02-14 09:54:10 +0100
  • 1ffc3a03c8 Typo. Nicolas Patry 2024-02-13 21:52:02 +0000
  • 7671a419a0 Upgrade intermediary layer for nvidia too. (#1557) Nicolas Patry 2024-02-13 22:46:16 +0100
  • b9ac720d1e Generous snapshot for load because of accumulation errors in the logprobs. Nicolas Patry 2024-02-13 18:15:14 +0000
  • 2e44f082c8 Upgrade intermediary layer for nvidia too. Nicolas Patry 2024-02-13 18:00:00 +0100
  • c54b5c7f04 Remove tailscale. Nicolas Patry 2024-02-13 17:51:12 +0100
  • 6f68bb14c7 Fixing glibc version in the runtime. (#1556) Nicolas Patry 2024-02-13 17:43:47 +0100
  • a83772c87b Self hosted for nvidia too. Nicolas Patry 2024-02-13 17:31:39 +0100
  • c804182300 Fixing glibc version in the runtime. Nicolas Patry 2024-02-13 17:29:03 +0100
  • 31d965bf17 Our runner. Nicolas Patry 2024-02-13 17:15:45 +0100
  • 246ad39d04 feat: add deserialize_with that handles strings or objects with content (#1550) drbh 2024-02-13 10:01:02 -0500
  • d9000a2bcb Update load. Nicolas Patry 2024-02-13 12:11:34 +0000
  • 755ed82d25 Improving mamba runtime by using updates Nicolas Patry 2024-02-13 11:07:25 +0000
  • 91dcfe83db fix: cargo fmt tweak drbh 2024-02-12 11:00:21 -0500
  • a86e726079 fix: remove dev test that relies on local file drbh 2024-02-12 10:53:13 -0500
  • 3db6f0bb39 feat: add deserialize_with that handles strings or objects with content drbh 2024-02-12 09:48:03 -0500
  • 0d794af6a5 feat: experimental support for cuda graphs (#1428) OlivierDehaene 2024-02-12 10:09:29 +0100
  • c85f737454 Fixing AMD dockerfile. Nicolas Patry 2024-02-09 19:27:53 +0000
  • 9a5d97235b Going from earlier release (newer ones have bugs in shape it seems). Nicolas Patry 2024-02-09 16:26:34 +0000
  • 8f93b47395 Upgrade the ubuntu version too. Nicolas Patry 2024-02-09 13:53:07 +0000
  • 72f74bcbc4 Fix for AWQ. Nicolas Patry 2024-02-09 13:09:40 +0000
  • 7143130ba4 Update docs after rebase. Nicolas Patry 2024-02-09 11:44:42 +0000
  • 4b06f318cb Update dockerfile. Nicolas Patry 2024-02-09 11:39:46 +0000
  • 903fbec604 Fixing AWQ. Nicolas Patry 2024-02-09 11:29:34 +0000
  • 3ce42ba7ec Fixing all quantization kernels. Nicolas Patry 2024-02-09 11:12:12 +0000
  • 4b524a305c Update the doc. Nicolas Patry 2024-02-08 09:26:38 +0000
  • bc95292eb8 Disable cuda graph with speculation (for now) and update the docs. Nicolas Patry 2024-02-08 09:22:19 +0000
  • 4fd6e62655 fix OlivierDehaene 2024-01-15 18:24:22 +0100
  • 33e94379c8 fix speculate OlivierDehaene 2024-01-10 17:40:37 +0100
  • ca20c304b3 add log OlivierDehaene 2024-01-10 17:17:48 +0100
  • 8260dc00d8 fix env var OlivierDehaene 2024-01-10 17:17:19 +0100
  • 9904f66966 fix value OlivierDehaene 2024-01-10 16:49:00 +0100
  • 15fdd40587 feat: experimental support for cuda graphs OlivierDehaene 2024-01-10 16:34:39 +0100
  • 1d929a243a fix: use TORCH_NCCL_AVOID_RECORD_STREAMS=1 OlivierDehaene 2024-01-09 17:59:16 +0100
  • 532146338b feat(router): add max_batch_size (#1542) OlivierDehaene 2024-02-09 12:38:41 +0100
  • a4e5801684 ROCm AWQ support (#1514) Ilyas Moutawwakil 2024-02-09 10:45:16 +0100
  • 326f8e30ac Better error message on non rocm. Nicolas Patry 2024-02-09 09:44:53 +0000
  • c5ef81bed5 chore: bump ci rust version (#1543) drbh 2024-02-09 04:32:04 -0500
  • d9ee73eea5 chore: bump ci rust version drbh 2024-02-08 13:07:13 -0500
  • 09b7c26bbd feat(server): add frequency penalty (#1541) OlivierDehaene 2024-02-08 18:41:25 +0100
  • 55e29c9564 my b OlivierDehaene 2024-02-08 17:28:54 +0100
  • 2af011a1c0 use max_size in the batch task OlivierDehaene 2024-02-08 17:26:55 +0100
  • 9e042bd117 update doc OlivierDehaene 2024-02-08 17:12:14 +0100
  • 01e61bb8f6 fix rust test OlivierDehaene 2024-02-08 17:10:36 +0100
  • faaa9dfe0a feat(router): add max_batch_size OlivierDehaene 2024-02-08 17:01:20 +0100
  • a76821e0b2 Update llama gptq. Nicolas Patry 2024-02-08 15:42:33 +0000
  • 81fa53f37b Fix tests. add_batch_dimension Nicolas Patry 2024-02-08 15:25:34 +0000
  • bc157af9b0 generate g_idx only for triton kernel IlyasMoutawwakil 2024-02-08 16:05:09 +0100
  • 40f693b6b9 Fix PR. Nicolas Patry 2024-02-08 15:04:27 +0000
  • e29fb799cb Merge branch 'rocm-awq-support' of https://github.com/huggingface/text-generation-inference into rocm-awq-support IlyasMoutawwakil 2024-02-08 16:03:17 +0100
  • 04d38a83be Updating the tests. Nicolas Patry 2024-02-08 14:59:35 +0000
  • cfacf91af8 fix logits processor OlivierDehaene 2024-02-08 12:49:24 +0100
  • 75b492d720 feat(server): add frequency penalty OlivierDehaene 2024-02-08 12:46:39 +0100
  • 2629193efa log message IlyasMoutawwakil 2024-02-05 09:26:47 +0100
  • 76834c9989 none g_idx IlyasMoutawwakil 2024-02-02 14:42:42 +0100
  • bbe5bedea5 pass g_idx instead of changing triton kernel IlyasMoutawwakil 2024-02-02 14:34:15 +0100
  • 646ab28285 typing IlyasMoutawwakil 2024-02-01 19:37:02 +0000
  • 8074c40473 adapt awq weights to exllama/gptq kernels IlyasMoutawwakil 2024-02-01 18:35:41 +0000
  • 212fdfffad revert changes IlyasMoutawwakil 2024-02-01 18:35:04 +0000
  • 3ceeb85842 fix missing g_idx and eventual overflow in triton kernel IlyasMoutawwakil 2024-02-01 13:30:43 +0000
  • 3963074ceb add triton fallback to awq IlyasMoutawwakil 2024-02-01 13:30:13 +0000
  • aa2014fc79 post process exllama model IlyasMoutawwakil 2024-02-01 12:48:17 +0100
  • 75086526d3 awq fallback to exllama IlyasMoutawwakil 2024-02-01 12:06:02 +0100
  • 461dd6f1c7 fix exllama overflows IlyasMoutawwakil 2024-02-01 12:05:36 +0100
  • 39af000cb9 Update to peft 0.8.2 (#1537) Jason Stillerman 2024-02-08 06:44:04 -0500
  • bd405e035b Impl simple mamba model (#1480) drbh 2024-02-08 04:19:45 -0500
  • b99f784cb3 feat: conditionally include mamba drbh 2024-02-08 00:34:13 +0000
  • 1734540211 feat: use existing add_generation_prompt variable from config in temp… (#1533) drbh 2024-02-07 03:35:53 -0500
  • 2c6ef7c93a fix: add missing accepted_ids to batch_top_tokens drbh 2024-02-07 03:57:35 +0000
  • 48624fee25 Merge branch 'impl-simple-mamba-model' of github.com:huggingface/text-generation-inference into impl-simple-mamba-model drbh 2024-02-07 03:24:32 +0000
  • deed8e8154 fix: adjust typos and docker build drbh 2024-02-07 03:24:28 +0000
  • 9146ba00a7 Merge branch 'main' into impl-simple-mamba-model drbh 2024-02-06 18:38:20 -0500
  • 5b30a425f6 fix: update selective state Makefile drbh 2024-02-06 23:37:00 +0000
  • 50ca04b052 feat: update docker for mamba drbh 2024-02-06 21:15:16 +0000
  • 36a4853c4e fix: rename tests and snapshots drbh 2024-02-06 20:39:52 +0000
  • 5e102183d8 feat: prefer triton ops and batch conv drbh 2024-02-06 20:38:28 +0000
  • e10530d4f3 update to peft 0.8.2 update_peft Jason Stillerman 2024-02-06 14:41:15 -0500
  • 8319e854c8 Fix mamba load. Nicolas Patry 2024-02-06 18:57:24 +0000
  • 53b6b8bd08 feat: update and add tests for add_generation_prompt drbh 2024-02-06 11:43:43 -0500
  • ff0428a351 feat: defaults add_generation_prompt true drbh 2024-02-06 09:08:35 -0500
  • 3caa9b9cb7 feat: support batching drbh 2024-02-06 01:22:25 +0000
  • 63bc4c59d4 fix: improve step to use batch drbh 2024-02-06 00:17:04 +0000
  • a4f1916a56 feat: avoid triton selective_state_update drbh 2024-02-05 21:34:28 +0000
  • 76093c79ac feat: use existing add_generation_prompt variable from config in template drbh 2024-02-05 10:01:40 -0500
  • 29a8d5a3a1 Clippy. Nicolas Patry 2024-02-05 14:39:45 +0100
  • e1dc168188 Adding batch_dimension_flag (to be used for Neuron and other forced padding targets). Nicolas Patry 2024-02-05 14:29:32 +0100
  • cda5751b41 log message IlyasMoutawwakil 2024-02-05 09:26:47 +0100
  • 0f124cbc52 fix: revise non batching tests drbh 2024-02-03 05:04:00 +0000
  • 3a42765cab feat: use cache when decoding drbh 2024-02-02 21:50:51 +0000
  • 0da00be52c feat: add ie update to message docs (#1523) drbh 2024-02-02 10:31:11 -0500
  • 58ddedec16 Update docs/source/messages_api.md drbh 2024-02-02 09:58:29 -0500
  • af2c589cef none g_idx IlyasMoutawwakil 2024-02-02 14:42:42 +0100
  • 994ed8e10d pass g_idx instead of changing triton kernel IlyasMoutawwakil 2024-02-02 14:34:15 +0100