Commit Graph

  • b404b1dffe
    Attempt #28 Nicolas Patry 2024-09-10 15:41:10 +0200
  • 393e5f4b30
    Debugging CIs is fun. Nicolas Patry 2024-09-10 15:35:09 +0200
  • b9bdd5ca90
    Get me a debug branch Nicolas Patry 2024-09-10 15:13:26 +0200
  • 99fd425ed2
    Mounting on the job. Nicolas Patry 2024-09-10 14:47:56 +0200
  • ed51bb94ce
    Add comment for why slot 0 is OK. Nicolas Patry 2024-09-10 11:46:45 +0200
  • 2881edb3d7
    Remove 1 log and put back the other. Nicolas Patry 2024-09-10 10:59:58 +0200
  • 29d3601457
    Minor fixup Nicolas Patry 2024-09-10 10:52:02 +0200
  • d57b7091aa
    Are we done yet ? Nicolas Patry 2024-09-10 10:24:56 +0200
  • 8857b68cfc fix ci failure Wang, Yi A 2024-09-09 23:54:55 -0700
  • 8c3859d153 Merge branch 'main' into gpt_awq_4 Wang, Yi A 2024-09-09 23:19:08 -0700
  • 3f2dc61500 fix style Mohit Sharma 2024-09-09 10:13:59 +0000
  • e128bc540b
    Upgrading some stuff. Nicolas Patry 2024-09-09 11:37:06 +0200
  • af7d9f7b7d
    Upgraded flashinfer. Nicolas Patry 2024-09-09 10:03:43 +0200
  • 528cdb51f8 bump ratatui all the way with options Alex Strick van Linschoten 2024-09-08 16:37:57 +0200
  • 8bcd432892
    use ratatui not archived tui Alex Strick van Linschoten 2024-09-08 16:13:01 +0200
  • 0deebe7012 Update README with Docker image v2.0.5 regisss 2024-09-07 17:56:52 +0000
  • bf9865e956
    Upgrade to Optimum Habana v1.13.2 (#222) regisss 2024-09-07 19:52:59 +0200
  • a4f39a1cae
    Update README.md with changes related to LLava-next multi card support (#221) Thanaji Rao Thakkalapelli 2024-09-07 08:46:21 -0700
  • eabbbbda23
    Add Directory Check to Prevent Redundant Cloning in Build Process (#2486) (branch: nix_test2) Vallepu Vamsi Krishna 2024-09-07 16:49:43 +0530
  • efd7cec4a3
    Adding numpy to diff. Nicolas Patry 2024-09-07 11:39:18 +0200
  • 1d0847a90e
    Revert the max prefix hit. Nicolas Patry 2024-09-07 01:19:16 +0200
  • c67bec168e
    Remove some comments. Nicolas Patry 2024-09-07 00:53:12 +0200
  • 37790de5ca
    Is it really flashinfer version ? Nicolas Patry 2024-09-07 00:34:31 +0200
  • ad7c620f0f
    Llava-next: Added flash_attention_recompute option (#220) Thanaji Rao Thakkalapelli 2024-09-06 13:20:07 -0700
  • 2299b739fe
    Only Apply the TP in language_model (#219) yuanwu2017 2024-09-07 04:19:24 +0800
  • 69c168d1e0
    Fix parsing Nicolas Patry 2024-09-06 18:50:52 +0200
  • 785c6e4893
    Fixed the radix tree. Nicolas Patry 2024-09-06 17:31:32 +0200
  • f952024533
    Remove other tensor creation. Nicolas Patry 2024-09-06 16:59:11 +0200
  • d45408e935
    [WIP] tmp dump of integration load tests. Nicolas Patry 2024-09-05 14:23:06 +0200
  • 3669d078e0
    Adding prefix test. Nicolas Patry 2024-09-05 10:33:50 +0200
  • c1fe28d694
    Fixing more correctly the invalid drop of the batch. (#2498) Nicolas Patry 2024-09-06 17:35:49 +0200
  • aaea212d0f
    Add links to Adyen blogpost (#2500) Martin Iglesias Goyanes 2024-09-06 17:00:54 +0200
  • 5d9d3717b2
    Update _toctree.yml Martin Iglesias Goyanes 2024-09-06 15:47:02 +0200
  • 0dfb012ffe
    Update external.md Martin Iglesias Goyanes 2024-09-06 15:46:40 +0200
  • 4f41db604a
    Adding to toctree. Nicolas Patry 2024-09-06 15:33:37 +0200
  • 88e2997b9c style Mohit Sharma 2024-09-06 12:23:18 +0000
  • 7dee5e359e Add links to Adyen blogpost martini 2024-09-06 14:06:08 +0200
  • 778883655a
    Fixing more correctly the invalid drop of the batch. Nicolas Patry 2024-09-06 11:41:47 +0200
  • a3c9c62dc0
    hotfix: add syrupy to the right subproject (#2499) Daniël de Kok 2024-09-06 12:47:06 +0200
  • 379472c4c2
    radix trie: add assertions (#2491) Daniël de Kok 2024-09-06 11:55:23 +0200
  • b9b5df7492 hotfix: add syrupy to the right subproject Daniël de Kok 2024-09-06 09:53:06 +0000
  • 2eb57a15ec
    Fix incompatibility with latest syrupy and update in Poetry (#2497) Daniël de Kok 2024-09-06 11:00:52 +0200
  • 0424e27f65
    nix: add pyright/ruff for proper LSP in the impure devshell (#2496) Daniël de Kok 2024-09-06 10:19:04 +0200
  • 8d62b831c1 Fix incompatibility with latest syrupy and update in Poetry Daniël de Kok 2024-09-06 08:16:39 +0000
  • e79f627d8f nix: add pyright/ruff for proper LSP in the impure devshell Daniël de Kok 2024-09-06 08:07:41 +0000
  • 5cd8025f18
    hotfix: fix regression of attention api change in intel platform (#2439) Wang, Yi 2024-09-05 23:41:39 +0800
  • e279b38aca
    Add two handy gitignores for Nix environments (#2484) Daniël de Kok 2024-09-05 17:06:54 +0200
  • 8b96a18265
    Adding links to Adyen blogpost. (#2492) Nicolas Patry 2024-09-05 16:11:52 +0200
  • e98b726a37
    Adding links to Adyen blogpost. Nicolas Patry 2024-09-05 16:06:43 +0200
  • deec30f893
    hotfix: avoid non-prefilled block use when using prefix caching (#2489) Daniël de Kok 2024-09-05 15:09:29 +0200
  • 082fe5eb67 radix trie: add assertions Daniël de Kok 2024-09-05 12:11:18 +0000
  • 02f0083c7a hotfix: avoid non-prefilled block use when using prefix caching Daniël de Kok 2024-09-05 12:04:29 +0000
  • 0c7fddcfef
    Update Makefile-fbgemm Vallepu Vamsi Krishna 2024-09-04 12:01:56 +0530
  • ff0505e7f9 added custom PA Mohit Sharma 2024-09-04 05:46:28 +0000
  • b5e25e30ca Add two handy gitignores for Nix environments Daniël de Kok 2024-09-03 12:59:20 +0000
  • 69dd51069f unique hash for each image token (branch: feature/vlm-prefix-caching) Daniël de Kok 2024-09-03 12:56:02 +0000
  • 8c74ee4498 Simplify image token lookup Daniël de Kok 2024-09-03 11:46:23 +0000
  • bac2cf7655 Idefics Daniël de Kok 2024-09-03 11:36:11 +0000
  • 029d2719c1 vlm fixes Daniël de Kok 2024-09-03 10:30:15 +0000
  • 6cb42f49ae
    feat: support lora revisions and qkv_proj weights (#2482) drbh 2024-09-02 13:09:06 -0400
  • e6c524c66b WIP Daniël de Kok 2024-09-02 14:48:16 +0000
  • 47d7e34458
    fix: enable chat requests in vertex endpoint (#2481) drbh 2024-09-02 10:00:52 -0400
  • 85b5ce6539 fix: add qkv_proj weights to weight test drbh 2024-09-02 13:57:37 +0000
  • de2cdeca53
    nix: add punica-kernels (#2477) Daniël de Kok 2024-09-02 11:31:36 +0200
  • a258e8f66a
    fix: Fix PR comments (branch: feat/add-load-test) Hugo Larcher 2024-08-30 15:41:43 +0200
  • e4ab855480
    nix: improve impure devshell (#2478) Daniël de Kok 2024-09-02 09:27:10 +0200
  • 8666df68d6 feat: support lora revisions and qkv_proj weights drbh 2024-08-30 22:24:01 +0000
  • ed8c7726ba feat: avoid unwrap and pre allocate future vec drbh 2024-08-30 16:47:24 +0000
  • b5dd58f73b fix: enable chat requests in vertex endpoint drbh 2024-08-30 16:19:30 +0000
  • 345d47362f
    Merge branch 'main' into feat/add-load-test Hugo Larcher 2024-08-30 15:31:55 +0200
  • 4e78b00677
    fix: Fix PR comments (remove Jinja) Hugo Larcher 2024-08-30 15:29:30 +0200
  • 7499629a97 nix: improve impure devshell Daniël de Kok 2024-08-30 10:07:17 +0000
  • 1d9404d008 nix: add punica-kernels Daniël de Kok 2024-08-30 09:56:36 +0000
  • bb803bb3bd fix regression caused by attention api change. ipex.varlen_attention does not support paged-cache format kv input now. Wang, Yi A 2024-08-29 19:02:06 -0700
  • d9fbbaafb0
    Tied embeddings in MLP speculator. (#2473) Nicolas Patry 2024-08-29 17:44:54 +0200
  • 9883f3b40e
    update doc with intel cpu part (#2420) Wang, Yi 2024-08-29 23:42:02 +0800
  • 765129a345
    Apply suggestions from code review Nicolas Patry 2024-08-29 17:41:44 +0200
  • 9f036684ef
    Adding scaling support + optimize some ops. Nicolas Patry 2024-08-29 17:31:41 +0200
  • 09a1de5cd1
    Fixing the scale_weight when users decide to not use the speculation as much as defined in the config. Nicolas Patry 2024-08-29 12:33:45 +0200
  • 62a8343153
    Tied embeddings in MLP speculator. Nicolas Patry 2024-08-29 12:30:26 +0200
  • d5202c46f7
    feat: add /v1/models endpoint (#2433) drbh 2024-08-29 10:32:38 -0400
  • e415b690a6
    Lots of improvements (Still 2 allocators) (#2449) Nicolas Patry 2024-08-29 16:29:01 +0200
  • 4e821c003a
    nix: build Torch against MKL and various other improvements (#2469) Daniël de Kok 2024-08-29 16:25:25 +0200
  • b4126793a5
    Fmt. Nicolas Patry 2024-08-29 12:37:48 +0200
  • 0c00b9495d
    Revert the Cohere tokenizer change (for now using a revision instead). Nicolas Patry 2024-08-29 11:35:18 +0200
  • 9bfdac237d
    Fix disabling prefix caching - Fix windowing checks. Nicolas Patry 2024-08-29 11:34:13 +0200
  • 3d5f10701d
    Update _toctree.yml Omar Sanseviero 2024-08-29 12:32:44 +0200
  • 5838f2139f
    Tied embeddings in MLP speculator. (branch: upgrade_mlp_speculator) Nicolas Patry 2024-08-29 12:30:26 +0200
  • d77f5f2eff
    Update server/text_generation_server/layers/attention/common.py Nicolas Patry 2024-08-29 11:59:31 +0200
  • 4b375004c9
    Apply suggestions from code review Nicolas Patry 2024-08-29 11:58:57 +0200
  • 5e2932552c
    Revert the Cohere tokenizer change (for now using a revision instead). Nicolas Patry 2024-08-29 11:35:18 +0200
  • fc7ea202c2
    Fix disabling prefix caching - Fix windowing checks. Nicolas Patry 2024-08-29 11:34:13 +0200
  • e1e8b6d9c0
    Small improvements for docs osanseviero 2024-08-29 11:24:00 +0200
  • bef2f6bdaa
    Fixing the free algorithm to handle times where the common prefix is smaller. Nicolas Patry 2024-08-29 09:17:00 +0200
  • 9c839ca5df
    Adding error message when assert is violated. Nicolas Patry 2024-08-28 21:22:36 +0200
  • f4f75b2644 nix: build Torch against MKL and various other improvements Daniël de Kok 2024-08-28 19:16:58 +0000
  • 8f99f165ce
    fix: improve regex expression (#2468) drbh 2024-08-28 13:44:44 -0400
  • e7e036389e
    Revert the integrationt tests change (seem linked to head_size modification). Nicolas Patry 2024-08-28 19:38:51 +0200
  • 3d6aefe201 fix: improve regex expression drbh 2024-08-28 17:33:32 +0000
  • 73d93bdd93
    Downgrade sympy to match synapaseAI 1.18 base image (#215) Thanaji Rao Thakkalapelli 2024-08-28 08:45:44 -0700