Commit Graph

  • 93e7ba54c0 fix tests OlivierDehaene 2024-04-10 17:20:07 +0200
  • 07a3050b20 fixed OlivierDehaene 2024-04-10 16:47:41 +0200
  • ae6215fcea Enable server UT: test_causal_lm.py::test_batch_from_pb (#121) Jacek Czaja 2024-04-10 16:33:56 +0200
  • 2e7f6e8012 freaking rotary OlivierDehaene 2024-04-10 15:18:51 +0200
  • 424e1b41a2 update vllm version OlivierDehaene 2024-04-10 11:20:53 +0200
  • 87505bf28a fix OlivierDehaene 2024-04-10 08:45:56 +0200
  • 4634b00c2a Adding Llava-Next (Llava 1.6) with full support. (#1709) Nicolas Patry 2024-04-09 21:32:00 +0200
  • f4f1e206db remove imports OlivierDehaene 2024-04-09 19:32:26 +0200
  • 26da6bfb2d fix mistral OlivierDehaene 2024-04-09 19:31:16 +0200
  • d4da0d4d97 use custom vllm with kv_head_mapping OlivierDehaene 2024-04-09 19:04:44 +0200
  • 0604c5cb83 add py-cpuinfo OlivierDehaene 2024-04-08 14:56:51 +0200
  • 946bf44242 fix cohere OlivierDehaene 2024-04-05 18:42:33 +0200
  • 91d76a65f5 remove log_level from python shard OlivierDehaene 2024-04-05 17:29:38 +0200
  • 0c88cb6327 remove log_level from python shard OlivierDehaene 2024-04-05 16:50:42 +0200
  • d7497f55cf update dockerfile OlivierDehaene 2024-04-05 15:36:27 +0200
  • 847df6099a update dockerfile OlivierDehaene 2024-04-05 11:41:56 +0200
  • 58a7719e02 fix OlivierDehaene 2024-04-04 19:11:50 +0200
  • 4a02d3505f add contiguous OlivierDehaene 2024-04-04 18:48:58 +0200
  • 5088005908 fix: fix CohereForAI/c4ai-command-r-plus OlivierDehaene 2024-04-04 18:46:51 +0200
  • 4217ddb842 Move import up. Nicolas Patry 2024-04-09 17:19:26 +0000
  • 8c114e5fc4 Fixing select_best_resolution. Nicolas Patry 2024-04-09 17:16:15 +0000
  • 61821f410a Update mt0 (not more truncating). Nicolas Patry 2024-04-09 13:11:25 +0000
  • 30cc78773e Skip server tests of not enabled models (#125) Karol Damaszke 2024-04-09 14:15:41 +0200
  • 2283562bfc Created all the logic server side (with image download on the fly too). Nicolas Patry 2024-04-09 11:26:30 +0000
  • c6739526c6 Fix test_watermark (#124) Karol Damaszke 2024-04-09 11:29:21 +0200
  • 106d8ee818 Automatic quantization config. (#1719) Nicolas Patry 2024-04-09 10:27:57 +0200
  • 757c12dbac Fix test_pass_through_tokenizer (#117) Sylwester Fraczek 2024-04-09 09:30:47 +0200
  • fd536f2017 Automatic quantization config. Nicolas Patry 2024-04-09 05:40:52 +0000
  • 215030ac88 Tmp dump (sending real image for real memory offset to be computed. Nicolas Patry 2024-04-09 05:15:09 +0000
  • d0bc603fe6 feat: explore compiled MLP bench op-compilation-benchmarking drbh 2024-04-09 02:36:09 +0000
  • 2762e6883e fix: include fsm_grammar_states in FlashMistralBatch from_pb fix-grammar-fsm-batching drbh 2024-04-08 17:23:46 +0000
  • d957e32601 Add Habana copyright header (#122) Karol Damaszke 2024-04-08 18:06:21 +0200
  • 204d2d8a2f docker image: text-generation-launcher wrapper as entrypoint Raphael Glon 2024-04-08 16:48:22 +0200
  • 274b68ad7d More GPUs for more VRAM. Nicolas Patry 2024-04-08 14:52:34 +0000
  • b65beb43d3 Revert "Regenerate ld.so.cache (#1708)" Raphael Glon 2024-04-08 16:45:34 +0200
  • 635701ca29 feat: add async context manager for AsyncClient Sabidao 2024-04-08 17:16:19 +0300
  • a7ac9877c2 Force the actual upgrade. Nicolas Patry 2024-04-08 14:13:26 +0000
  • 39620ce29f Fixed load test. Bad sanitation on the router meant CUDA OOM. Nicolas Patry 2024-04-08 14:08:02 +0000
  • ff42d33e99 Revert license to Apache 2.0 (#1714) OlivierDehaene 2024-04-08 15:06:16 +0200
  • 314f1363a4 Empty commit Julien Chaumond 2024-04-08 15:01:01 +0200
  • b00cdc5140 Revert "chore: update license to HFOIL (#725)" OlivierDehaene 2024-04-08 14:59:16 +0200
  • 99771cfad5 Upgrade tests (still missing load tests for some reason). Nicolas Patry 2024-04-08 09:56:37 +0000
  • 0bd7ef5d7f Update libraries abhishek thakur 2024-04-08 11:39:34 +0200
  • 53c2c3dbc7 Regenerate ld.so.cache (#1708) oOraph 2024-04-08 08:52:10 +0200
  • ccbfc05db5 Fixing integration tests ? (Failures locally). Nicolas Patry 2024-04-05 18:06:04 +0000
  • 8dca3b04f8 Force weights_only (before fully breaking pickle files anyway). (#1710) Nicolas Patry 2024-04-05 19:23:57 +0200
  • 7852a85b57 Adding docs. Nicolas Patry 2024-04-05 16:06:01 +0000
  • 6c350f2f75 Working for TP, Llama + Mistral Nicolas Patry 2024-04-05 15:27:29 +0000
  • df4c700828 Tmp dump (running on images hardcoded size.) Nicolas Patry 2024-04-04 21:42:57 +0000
  • 5f4b395480 More work on the CLIP Side. Nicolas Patry 2024-04-04 18:08:38 +0000
  • b8be0d1ae7 Update by abstracting away text model. Nicolas Patry 2024-04-03 16:41:01 +0000
  • b68fc4deb1 Llava next dump. Nicolas Patry 2024-04-02 09:40:27 +0000
  • 422f23be74 Force weights_only (before fully breaking pickle files anyway). Nicolas Patry 2024-04-05 16:17:16 +0000
  • 4fb19f25be Regenerate ld.so.cache Raphael Glon 2024-04-05 17:56:50 +0200
  • f9958ee191 Fixing cohere tokenizer. (#1697) Nicolas Patry 2024-04-05 16:44:19 +0200
  • 5062fda4ff Push users to streaming in the readme. (#1698) Nicolas Patry 2024-04-05 16:44:10 +0200
  • c7e570e59d Pickle conversion now requires --trust-remote-code. (#1704) Nicolas Patry 2024-04-05 13:32:53 +0200
  • b0f460a74c Make warning visible in the logs. Nicolas Patry 2024-04-05 11:31:51 +0000
  • e2c870c216 Dummy modification. Nicolas Patry 2024-04-05 08:32:25 +0000
  • 96846f633a Soft deprecation with clear text explaining the rationale. Nicolas Patry 2024-04-05 08:21:23 +0000
  • 99874eae74 Add cuda graphs sizes and make it default. (#1703) Nicolas Patry 2024-04-04 23:01:56 +0200
  • ac118a5ad0 Pickle conversion now requires --trust-remote-code. Nicolas Patry 2024-04-04 13:16:32 +0000
  • d67633a0c8 Fix disabling. Nicolas Patry 2024-04-04 13:01:27 +0000
  • 6951962ffd Clarify disabling. Nicolas Patry 2024-04-04 12:59:29 +0000
  • edcbc0890c Move to cuda graphs by default (with possibility to choose graph sizes). Nicolas Patry 2024-04-04 12:46:28 +0000
  • 06227f7b5e Fix router tests (#119) Karol Damaszke 2024-04-04 11:10:11 +0200
  • e210e15e27 Update Cargo.lock file (#118) Karol Damaszke 2024-04-03 17:55:54 +0200
  • 638685ea94 Push users to streaming in the readme. Nicolas Patry 2024-04-02 19:27:17 +0000
  • 9b86418e21 Fixing cohere tokenizer. Nicolas Patry 2024-04-02 19:25:01 +0000
  • b0de25a285 Don't set rope_scaling for unsupported models (#115) Karol Damaszke 2024-04-02 12:12:02 +0200
  • 3e28d7aa42 Align the default value with server's (#111) yuanwu2017 2024-04-01 18:44:20 +0800
  • 4ee0a0c401 v1.4.5 (#1686) v1.4.5 OlivierDehaene 2024-03-29 19:17:24 +0100
  • 93fd4fd2fe v1.4.5 OlivierDehaene 2024-03-29 19:07:35 +0100
  • f04255c694 feat: Add dbrx support (#1685) OlivierDehaene 2024-03-29 18:49:36 +0100
  • 275a61aae6 use GPT2TokenizerFast by default OlivierDehaene 2024-03-29 18:46:28 +0100
  • dcfefc425a feat(models): Add DBRX OlivierDehaene 2024-03-29 18:41:35 +0100
  • 7342baa2eb Add support for rope_scaling and remove is_optimized_for_gaudi (#112) Karol Damaszke 2024-03-29 15:07:32 +0100
  • 2c83d09d3b wip OlivierDehaene 2024-03-28 18:28:09 +0100
  • 762dbf3f19 fix: handle batches with and without grammars (#1676) drbh 2024-03-28 12:02:01 -0400
  • 818aee37e5 fix: adjust logprob response logic (#1682) drbh 2024-03-28 12:01:46 -0400
  • 01ebb77d12 fix: adjust logprob response logic drbh 2024-03-28 00:00:55 +0000
  • bf5263b88b Disable watermark with FP8 quantization (#114) Karol Damaszke 2024-03-27 13:32:20 +0100
  • 56f00a552b Adjust warmup to all possible bucket sizes and decode batch size = 1 (#113) jkaniecki 2024-03-27 11:59:51 +0100
  • 9796b0e10d Add simple continuous batching benchmark (#108) Karol Damaszke 2024-03-26 09:17:55 +0100
  • 0cd04fe4f7 fix: handle batches with and without grammars drbh 2024-03-25 23:18:50 +0000
  • 7f58680999 Add docker pull command in README (#110) regisss 2024-03-25 15:44:54 +0100
  • 2b1581edac Warmup greedy search in next token chooser (#109) jkaniecki 2024-03-22 23:43:20 +0100
  • 6c4496a1a3 v1.4.4 (#1668) v1.4.4 OlivierDehaene 2024-03-22 18:44:05 +0100
  • 57915957ab v1.4.4 OlivierDehaene 2024-03-22 18:05:21 +0100
  • 1e9bcd9dd8 feat: cohere (#1660) OlivierDehaene 2024-03-22 17:59:25 +0100
  • f39cb899d9 remove torch from requirements OlivierDehaene 2024-03-22 17:21:34 +0100
  • f171bdc823 Inline images for multimodal models. (#1666) Nicolas Patry 2024-03-22 17:14:54 +0100
  • 66914f7b19 fix: LlamaTokenizerFast to AutoTokenizer at flash_mistral.py (#1637) SeongBeomLEE 2024-03-23 01:13:13 +0900
  • 08e9181418 feat: update client to 0.7 (#1667) OlivierDehaene 2024-03-22 17:10:56 +0100
  • bd73076761 feat: update client to 0.7 OlivierDehaene 2024-03-22 17:08:58 +0100
  • b775027422 update requirements OlivierDehaene 2024-03-22 16:55:35 +0100
  • cfc89bb396 faster OlivierDehaene 2024-03-21 09:49:58 +0100
  • 56296cc43c feat: add cohere OlivierDehaene 2024-03-14 11:21:06 +0000
  • e2f9856a88 Inline images for multimodal models. Nicolas Patry 2024-03-22 09:01:49 +0000
  • deb440b3a2 Repair idefics integration tests. (#1663) Nicolas Patry 2024-03-21 22:21:03 +0100