Commit Graph

  • 27ecef5153 Using kernel like Makefile instead. Nicolas Patry 2023-09-25 08:56:13 +0000
  • 757cf1783d Declare torch as build dep. Nicolas Patry 2023-09-25 08:24:34 +0000
  • e08f3ac555 Add git to docker. Nicolas Patry 2023-09-25 08:17:49 +0000
  • 8ee9307618 Finishing nits + integration test Nicolas Patry 2023-09-25 10:07:45 +0200
  • c35f39cf83 Add AWQ quantization inference support (#1019) dev Abhinav M Kulkarni 2023-09-25 13:28:02 +0530
  • 649d9754b1 fix discard_names in safetensors convertion zhangsibo1129 2023-09-25 10:43:28 +0800
  • de5098e013 Added AWQ in REDME Abhinav Kulkarni 2023-09-23 10:36:44 +0000
  • 054930fbbe Minor refactor Abhinav Kulkarni 2023-09-23 10:12:26 +0000
  • 5d0973f484 Refactored WQLinear Abhinav Kulkarni 2023-09-23 09:48:27 +0000
  • ba22ef54d4 pass max_total_tokens info through warmup, python could get max_total_tokens as truncate+max_new_tokens in warmup Wang, Yi A 2023-09-21 20:50:50 -0700
  • dac2348ab0 added my fork fix Merve Noyan 2023-09-21 12:57:15 +0200
  • a44a82b1b6 added my fork Merve Noyan 2023-09-21 12:56:40 +0200
  • 0c77c75ad0 added my fork Merve Noyan 2023-09-21 12:55:12 +0200
  • a9753a5b70 removed main.rs change condition to test Merve Noyan 2023-09-21 12:51:55 +0200
  • c6590fd1bb added git push back Merve Noyan 2023-09-21 12:47:57 +0200
  • 94065cd81a added print and removed launcher.md to test Merve Noyan 2023-09-21 12:46:10 +0200
  • e40b942389 added if merged Merve Noyan 2023-09-21 12:43:45 +0200
  • 7206fe3163 changed actions since it runs on repo level, added cargo build Merve Noyan 2023-09-21 12:23:52 +0200
  • 2bddc78dca fixes Merve Noyan 2023-09-21 12:17:47 +0200
  • 86736cb93a added codeblock, include removal of md file Merve Noyan 2023-09-21 12:06:15 +0200
  • 4cab978343 removed GH token env var Merve Noyan 2023-09-21 11:47:26 +0200
  • 1da6e241cf minor fix and removed credentials Merve Noyan 2023-09-21 11:46:06 +0200
  • c2eaa28e4b autodocs Merve Noyan 2023-09-21 11:41:57 +0200
  • e19d0e7867 Fixing t5 loading. Nicolas Patry 2023-09-21 08:29:48 +0200
  • 123749a3c9 Fix missing arguments in Galactica's from_pb (#1022) Vincent Brouwers 2023-09-21 08:15:59 +0200
  • eeaa22ab04 enable bfloat16 for cpu (#1034) Wang, Yi 2023-09-19 23:19:28 +0800
  • c44fce6c09 enable bfloat16 for cpu Wang, Yi A 2023-09-18 03:14:50 -0700
  • f85a6f853e Merge branch 'huggingface:main' into abhinavkulkarni/add-awq-support Abhinav M Kulkarni 2023-09-18 23:39:26 +0530
  • c58953398a Install curl within base image to be able to perform more advanced healthchecks Raphael 2023-09-15 17:41:34 +0200
  • 00359fcdc5 fix indent issue bangoz 2023-09-13 16:33:12 +0000
  • 2a16b4101f Fix top_n_tokens returning non-log probs for some models Vincent Brouwers 2023-09-14 08:49:35 +0000
  • 9b4545f279 Fix missing arguments in Galactica's from_pb Vincent Brouwers 2023-09-14 08:40:19 +0000
  • acb7e1d465 Added quantize_config.json support for AWQ Abhinav Kulkarni 2023-09-13 11:12:32 +0000
  • 00dede8a63 Added AWQ support for FlashLlama models Abhinav Kulkarni 2023-09-13 11:08:22 +0000
  • 8ce40f9d25 fix code snippet Merve Noyan 2023-09-12 16:33:35 +0200
  • c8a01d7591 Unsupported model serving docs (#906) Merve Noyan 2023-09-12 15:55:14 +0200
  • e9ae678699 Quantization docs (#911) Merve Noyan 2023-09-12 15:52:46 +0200
  • 0966704dd6 Update docs/source/basic_tutorials/non_core_models.md Merve Noyan 2023-09-12 15:52:30 +0200
  • 62b05f2ccf Update docs/source/supported_models.md Merve Noyan 2023-09-12 12:27:16 +0200
  • 6473cf852e Merge branch 'main' into quantization_docs Merve Noyan 2023-09-12 12:12:00 +0200
  • 1f69fb9ed4 Tensor Parallelism conceptual guide (#886) Merve Noyan 2023-09-12 12:11:20 +0200
  • 6703fd3009 Update installation.md albertodepaola 2023-09-11 16:29:03 -0300
  • 33958e0989 Start. speculative Nicolas Patry 2023-09-11 18:25:49 +0000
  • 4cce84301b fit for baichuan models (#981) xiaobin 2023-09-08 22:51:34 +0800
  • e349f57d10 Update solution to account for GPTQ. Nicolas Patry 2023-09-08 14:36:49 +0000
  • 0357de7022 Merge branch 'main' into tp-docs Merve Noyan 2023-09-08 14:20:00 +0200
  • 9381797626 Merge branch 'main' into quantization_docs Merve Noyan 2023-09-08 15:19:11 +0300
  • 30a93a0dec Paged Attention Conceptual Guide (#901) Merve Noyan 2023-09-08 15:18:42 +0300
  • 704cd18402 Iterated on Pedro's comments Merve Noyan 2023-09-08 13:01:58 +0200
  • e82259106c Update docs/source/conceptual/quantization.md Merve Noyan 2023-09-08 12:55:45 +0200
  • 6cb066eb01 raise exception on invalid images Leo Tronchon 2023-09-08 12:35:13 +0200
  • 8acd649c56 Merge branch 'main' into quantization_docs Merve Noyan 2023-09-07 19:47:53 +0300
  • 2faf396128 Merge branch 'main' into paged-attention-docs Merve Noyan 2023-09-07 19:47:26 +0300
  • 53e89e7ae7 Merge branch 'main' into tp-docs Merve Noyan 2023-09-07 19:46:49 +0300
  • 5a5b4ef954 Clarified flag Merve Noyan 2023-09-07 18:42:33 +0200
  • 0a63e9ab68 Fix __call__ vs forward. (#993) Nicolas Patry 2023-09-07 17:36:30 +0200
  • 7f48a61bce Update docs/source/conceptual/quantization.md Merve Noyan 2023-09-07 16:49:33 +0200
  • 47db26298a Update docs/source/conceptual/quantization.md Merve Noyan 2023-09-07 16:49:22 +0200
  • 12d9a67752 Fix inline latex Merve Noyan 2023-09-07 16:46:05 +0200
  • 9a0a4d926c nit Merve Noyan 2023-09-07 17:24:14 +0300
  • af1ed38f39 Safetensors conceptual guide (#905) Merve Noyan 2023-09-07 17:22:06 +0300
  • eb8f59083d Added note on weight-cache-override Merve Noyan 2023-09-07 16:20:56 +0200
  • 873573150f Update non_core_models.md Merve Noyan 2023-09-07 16:06:18 +0200
  • 07bc903d6e Fix __call__ vs forward. Nicolas Patry 2023-09-07 14:02:34 +0000
  • 4d12840986 Update docs/source/basic_tutorials/non_core_models.md Merve Noyan 2023-09-07 15:25:54 +0200
  • 061b6a9c21 Update docs/source/basic_tutorials/non_core_models.md Merve Noyan 2023-09-07 15:24:00 +0200
  • ecaa9d6f8e Update docs/source/conceptual/tensor_parallelism.md Merve Noyan 2023-09-07 14:55:43 +0200
  • b23ad5d1e4 Update docs/source/conceptual/tensor_parallelism.md Merve Noyan 2023-09-07 14:54:03 +0200
  • 099291a061 Update docs/source/conceptual/tensor_parallelism.md Merve Noyan 2023-09-07 14:53:30 +0200
  • 0ef535e77e Merge branch 'main' into safetensors_docs Merve Noyan 2023-09-07 15:47:55 +0300
  • 73d4f92e0e Merge branch 'main' into paged-attention-docs Merve Noyan 2023-09-07 15:47:15 +0300
  • 0fcd2b4727 Update docs/source/conceptual/paged_attention.md Merve Noyan 2023-09-07 14:46:49 +0200
  • 9973f4041c Update docs/source/conceptual/paged_attention.md Merve Noyan 2023-09-07 14:46:39 +0200
  • 5d27a467eb Update docs/source/conceptual/paged_attention.md Merve Noyan 2023-09-07 14:46:29 +0200
  • 41cd2e350c Update docs/source/conceptual/paged_attention.md Merve Noyan 2023-09-07 14:46:21 +0200
  • 90930a537c Update docs/source/_toctree.yml Merve Noyan 2023-09-07 14:46:09 +0200
  • 5ec7b1a2af Update docs/source/conceptual/paged_attention.md Merve Noyan 2023-09-07 14:46:03 +0200
  • b03d2621a7 add transformers gptq support (#963) Florian Zimmermeister 2023-09-07 10:19:42 +0200
  • 935a77fb74 Fix exllama wronfully loading (#990) Maxime Laboissonnière 2023-09-07 03:17:22 -0400
  • 9f9cb924e0 Merge branch 'fix_exllama_wronfully_loading' of github.com:maximelaboisson/text-generation-inference into fix_exllama_wronfully_loading Maxime Laboissonniere 2023-09-06 20:02:32 -0400
  • afe9c07476 fixing condition Maxime Laboissonniere 2023-09-06 20:02:23 -0400
  • 06a3d19142 fixing condition Maxime Laboissonniere 2023-09-06 19:41:42 -0400
  • 7c8f0a0546 Merge branch 'main' into remove_readme Omar Sanseviero 2023-09-06 22:22:00 +0200
  • a9fdfb2464 docs: Remove redundant content from stream guide (#884) Omar Sanseviero 2023-09-06 18:42:42 +0200
  • 433cc0f4d9 Update README.md Omar Sanseviero 2023-09-06 16:56:40 +0200
  • 4a21912edf Update README.md Omar Sanseviero 2023-09-06 16:48:56 +0200
  • 915f2e909c Update docs/source/conceptual/streaming.md Omar Sanseviero 2023-09-06 16:43:21 +0200
  • f260eb72f9 docs: Flash Attention Conceptual Guide (#892) Merve Noyan 2023-09-06 16:36:49 +0300
  • 059bb5cf83 chore: sync text-generation version from 0.3.0 to 0.6.0 with pyproject.toml (#950) 王佳欣 2023-09-06 21:20:32 +0800
  • 211e7b7e35 Disabling exllama on old compute. (#986) Nicolas Patry 2023-09-06 15:01:00 +0200
  • 3ed4c0f33f docs: typo in streaming.js (#971) Julien Bouquillon 2023-09-06 14:57:59 +0200
  • 14bbd311c1 Dummy workaround for CPU. Nicolas Patry 2023-09-06 14:35:02 +0200
  • 1987d37603 Disabling exllama on old compute. Nicolas Patry 2023-09-06 14:20:03 +0200
  • c8bbbd8129 chore(client): Support Pydantic 2 (#900) Jelle Zijlstra 2023-09-06 05:12:08 -0700
  • 47fbf4495e Merge branch 'huggingface:main' into main Marcus Dunn 2023-09-05 13:24:20 -0700
  • 2a1f306e26 fit for baichuan models xiaoyuze 2023-09-05 15:57:32 +0800
  • 033230ae66 Backport https://github.com/vllm-project/vllm/pull/936 (#977) Nicolas Patry 2023-09-04 15:00:19 +0200
  • 9aaa184675 Going bacjk on Olivier fork. Nicolas Patry 2023-09-04 14:15:39 +0200
  • e181a3a761 Backport https://github.com/vllm-project/vllm/pull/936 Nicolas Patry 2023-09-04 12:29:39 +0200
  • 1700d11905 updated rsnm2 2023-09-01 18:29:36 +0000