Commit Graph

  • f2ea123ccd Apply suggestions from code review Nicolas Patry 2024-04-22 16:43:28 +0200
  • 0e91e131b8 Update server/text_generation_server/models/custom_modeling/llava_next.py Nicolas Patry 2024-04-22 16:43:09 +0200
  • af08e359af Dummy changes. Nicolas Patry 2024-04-22 14:10:30 +0000
  • b564adc057 Delete server/text_generation_server/models/custom_modeling/idefics2_modeling.py Nicolas Patry 2024-04-22 16:00:36 +0200
  • f2d8c2e76f Fixing features for llava_next. Still issues with warmup and truncation atm. Nicolas Patry 2024-04-22 09:54:51 +0000
  • ae2b4e1c23 Operational. Nicolas Patry 2024-04-19 22:39:30 +0000
  • 613dc93617 Idefics2 in working state. Nicolas Patry 2024-04-19 16:30:16 +0000
  • f68ccfd023 Temporary dump of idefics2. Nicolas Patry 2024-04-17 16:18:08 +0000
  • 9f3ce55ce2 Idefics2. Nicolas Patry 2024-03-26 09:34:13 +0000
  • 9f44af470c Temporary implem of torch.compile on our stuff. Nicolas Patry 2024-03-21 18:56:40 +0000
  • c6fafff7f2 Update server/text_generation_server/utils/logits_process.py drbh 2024-04-22 13:42:15 -0400
  • d969151a1e fix: avoid frequency and repetition penalty on padding tokens drbh 2024-04-19 03:23:30 +0000
  • 87c4828c4e feat: allow null eos and bos tokens in config drbh 2024-04-22 13:58:46 -0400
  • ed72e92126 fix typos in docs and add small clarifications (#1790) Moritz Laurer 2024-04-22 18:15:48 +0200
  • 26b3916612 Make --cuda-graphs work as expected (bis) (#1768) fxmarty 2024-04-22 16:09:19 +0200
  • 22d0c9bba2 add guidance to landing page list of features moritzlaurer 2024-04-22 16:05:36 +0200
  • f04b1b343d Update docs/source/conceptual/guidance.md drbh 2024-04-22 10:01:56 -0400
  • 5a8cabf904 fix small typos in streaming docs moritzlaurer 2024-04-22 15:56:43 +0200
  • f9f23aaf2c add clarifying example for n-gram speculation moritzlaurer 2024-04-22 15:47:49 +0200
  • e041c78de4 remove chat template tokens for consistency with examples above (or add them to all examples) moritzlaurer 2024-04-22 15:12:17 +0200
  • bc4f42ad36 small typos moritzlaurer 2024-04-22 15:10:40 +0200
  • efd4b97d15 v1.4.0 (#1494) OlivierDehaene 2024-01-26 19:04:57 +0100
  • ac580f515b feat: add tokenizer-config-path to launcher args (#1495) drbh 2024-01-26 12:01:33 -0500
  • 4b376b30f1 GPTQ support on ROCm (#1489) fxmarty 2024-01-26 16:27:44 +0100
  • 5d663fb85d Update the docs to include newer models. (#1492) Nicolas Patry 2024-01-26 16:07:31 +0100
  • 5134d9ccc3 fix: launcher doc typos (#1462) Andrés Restrepo 2024-01-26 08:10:07 -0500
  • 9fd5f5150c Trying to fix that flaky test. (#1491) Nicolas Patry 2024-01-26 14:06:27 +0100
  • b064b33e8b Add sealion mpt support (#1477) Nicolas Patry 2024-01-26 14:05:02 +0100
  • ea2aa53805 Reinstate exl2 with tp (#1490) Nicolas Patry 2024-01-26 14:00:29 +0100
  • 82f20c4788 fix: launcher doc typos (#1473) Nicolas Patry 2024-01-26 10:41:58 +0100
  • 41fbf5c254 fix: show warning with tokenizer config parsing error (#1488) drbh 2024-01-26 04:41:39 -0500
  • a1124f7b8b Update the docs Nicolas Patry 2024-01-26 10:13:23 +0100
  • ac0be8a6a4 fix: read stderr in download (#1486) OlivierDehaene 2024-01-25 18:16:03 +0100
  • b2fc097b2b feat: adds phi model (#1442) drbh 2024-01-25 09:37:53 -0500
  • be9bfae18c Add a new /tokenize route to get the tokenized input (#1471) Nicolas Patry 2024-01-25 14:19:03 +0100
  • ae222cce6e Add messages api compatibility docs (#1478) drbh 2024-01-24 11:41:28 -0500
  • 2a3a9c526b Fixing non divisible embeddings. (#1476) Nicolas Patry 2024-01-24 13:08:41 +0100
  • 1b99d4c0b6 Disable decoder_input_details on OpenAI-compatible chat streaming, pass temp and top-k from API (#1470) Jacob Keisling 2024-01-23 08:55:05 -0600
  • 5836a1cc69 feat: conditionally toggle chat on invocations route (#1454) drbh 2024-01-22 10:29:01 -0500
  • 935ee00749 chore: bump rust version and annotate/fix all clippy warnings (#1455) drbh 2024-01-22 09:22:54 -0500
  • 77afb882dc feat: support raise_exception, bos and eos tokens (#1450) drbh 2024-01-18 06:31:56 -0500
  • 76b226b00c feat: supports openai chat completions API (#1427) drbh 2024-01-16 05:07:41 -0500
  • 12cfc7930b Return prompt vs generated tokens. (#1436) Nicolas Patry 2024-01-11 19:01:43 +0100
  • e930ad9cec Fix local load for Medusa (#1420) PYNing 2024-01-11 01:36:20 +0800
  • af63e3273f fix: follow base model for tokenizer in router (#1424) OlivierDehaene 2024-01-10 16:35:54 +0100
  • 92ddb41d95 Fix missing make target platform for local install: 'install-flash-attention-v2' (#1414) R. P. Ruiz 2024-01-09 10:19:31 -0500
  • 118344b99d fix: fix local loading for .bin models (#1419) OlivierDehaene 2024-01-09 15:21:00 +0100
  • fc9173aa59 docs: update required CUDA version to 12.2 OlivierDehaene 2024-01-09 14:28:55 +0100
  • 62646c2a54 v1.3.4 OlivierDehaene 2023-12-22 15:46:04 +0100
  • 8cc4306f72 Fix local load for peft (#1373) Nicolas Patry 2023-12-21 17:29:23 +0100
  • 7eeabb9cda feat: update exllamav2 kernels (#1370) OlivierDehaene 2023-12-21 17:25:22 +0100
  • 3e22ad985e docs: Change URL for Habana Gaudi support in doc (#1343) regisss 2023-12-21 11:05:35 +0100
  • be05972911 Peft safetensors. (#1364) Nicolas Patry 2023-12-20 15:37:14 +0100
  • 75f954df6c ensure aiohttp session exists Sabidao 2024-04-21 18:10:28 +0300
  • 3116fb5113 Merge branch 'huggingface:main' into main Sabidao 2024-04-21 17:53:17 +0300
  • aef931ea5d fix fa2 triton kernel not working with MQA/GQA fxmarty 2024-04-20 21:16:11 +0000
  • 325f9774fe reenable _custom_C.LLMM1 as the culprit was FA2 triton fxmarty 2024-04-19 16:19:47 +0000
  • 81c27ba9c2 disable _custom_C.LLMM1 as it is broken for TP>=2 fxmarty 2024-04-19 15:59:31 +0000
  • 562cd4b06e fix fxmarty 2024-04-19 15:44:32 +0000
  • 6d59eb2e70 revert dev only changes fxmarty 2024-04-19 15:43:28 +0000
  • 885ce3354f User argument should be gospel and never ignored. fix_default_arg Nicolas Patry 2024-04-19 16:47:08 +0200
  • 26ba2d50a3 feat: add how it works section drbh 2024-04-19 14:42:34 +0000
  • 8eacae014f add missing files fxmarty 2024-04-19 13:46:54 +0000
  • b7299e1b7f fix: fix gpt-q with groupsize = -1 (#1358) OlivierDehaene 2023-12-18 16:07:05 +0100
  • ec5343ec5e cleaning fxmarty 2024-04-19 11:57:16 +0000
  • 5ff9e81952 fix: fix offline (#1341) (#1347) OlivierDehaene 2023-12-18 10:20:08 +0100
  • ecb0db45af fix: fix logic if sliding window key is not present in config (#1352) OlivierDehaene 2023-12-15 14:56:17 +0100
  • a95e6d603d feat: relax mistral requirements (#1351) OlivierDehaene 2023-12-15 12:52:24 +0100
  • 1b4c8b4b3e _custom_C.LLMM1 and HIP_FORCE_DEV_KERNARG=1 fxmarty 2024-04-19 11:50:01 +0000
  • f723e5ccb5 working fxmarty 2024-04-19 11:23:27 +0000
  • 3600fc9dbe v1.3.3 OlivierDehaene 2023-12-15 01:20:42 +0100
  • bb6200503c fix: max_past default value must be -1, not 0 (#1348) OlivierDehaene 2023-12-15 01:18:39 +0100
  • 214ec0eb49 fix: only keep stop sequence buffer if we have some OlivierDehaene 2023-12-14 17:04:58 +0100
  • 04dbf7a506 fix: slice stopping criteria buffer OlivierDehaene 2023-12-14 17:01:43 +0100
  • b3c2d7291e fix: fix quant linear autotune OlivierDehaene 2023-12-14 16:45:47 +0100
  • 28fcdcca6d fix: fix triton OutOfResources import OlivierDehaene 2023-12-14 16:04:26 +0100
  • 0ca83be883 WIP debug Triton FA2 fxmarty 2024-04-19 11:11:26 +0000
  • 5c9ef069ed feat: add more latency metrics in forward (#1346) OlivierDehaene 2023-12-14 15:59:38 +0100
  • 47e522a66a wip fa2 triton & fix cudagraph bug fxmarty 2024-04-19 10:11:39 +0000
  • 804068c207 now working fxmarty 2024-04-19 12:08:39 +0200
  • 24d43c487e fix typo fxmarty 2024-04-19 11:49:39 +0200
  • c974437ba7 fix: fix gpt-q params loading OlivierDehaene 2023-12-14 11:02:16 +0100
  • b503b3de60 tunableop in warmup fxmarty 2024-04-19 09:09:16 +0000
  • 2f88d8dfb3 fix: default max_new_tokens to 100 OlivierDehaene 2023-12-13 09:19:19 +0100
  • 3016e1595f at last working! fxmarty 2024-04-18 23:31:28 +0000
  • 2aa7e073bc Update guidance.md to reflect grammar support dr3s 2024-04-18 16:58:07 -0400
  • d769f56c0a Added reference to TPU support Brandon Royal 2024-04-18 15:29:50 -0400
  • e6259d9fc0 fix: reset grammar state when generation stops fix-grammar-cleanup-bug drbh 2024-04-18 17:05:52 +0000
  • 2d0a7173d4 v2.0.1 v2.0.1 OlivierDehaene 2024-04-18 17:20:36 +0200
  • f9ee2c41b9 Upgrading all versions. (#1759) Nicolas Patry 2024-04-18 17:17:40 +0200
  • 90977e9291 export requirements, fix rocm and update openapi OlivierDehaene 2024-04-18 16:53:05 +0200
  • 05f8c85a8b v1.3.2 OlivierDehaene 2023-12-12 18:10:22 +0100
  • f9b58ac7a1 feat: add quant to mixtral (#1337) OlivierDehaene 2023-12-12 17:55:03 +0100
  • 09c556dbd7 v1.3.1 OlivierDehaene 2023-12-11 16:46:44 +0100
  • db5053fc86 v1.3.0 OlivierDehaene 2023-12-11 14:55:03 +0100
  • 79f268f95a chore: formatting OlivierDehaene 2023-12-11 14:49:52 +0100
  • 9aef902982 feat: mixtral (#1328) OlivierDehaene 2023-12-11 14:43:40 +0100
  • a7f52f3812 Speculative (#1308) Nicolas Patry 2023-12-11 12:46:30 +0100
  • 6e4d0feb47 Upgrading all versions. Nicolas Patry 2024-04-18 10:42:27 +0200
  • a41c1a6bc7 Add a stale bot. (#1313) Nicolas Patry 2023-12-05 14:42:55 +0100