Commit Graph

  • 97d9ff3a71 Trying back to but EXl2 + TP>1 Nicolas Patry 2024-01-26 10:27:17 +0000
  • 16958fe312 fix: launcher doc typos (#1473) Nicolas Patry 2024-01-26 10:41:58 +0100
  • 13dd8e2361 fix: show warning with tokenizer config parsing error (#1488) drbh 2024-01-26 04:41:39 -0500
  • 45978034c9 Pre-emptive on sealion. update_docs2 Nicolas Patry 2024-01-26 10:15:31 +0100
  • 17b7b75e65 Update the docs Nicolas Patry 2024-01-26 10:13:23 +0100
  • a32df51e3a Introduce basic helm chart Wilfried Roset 2023-11-09 21:19:45 +0100
  • 1c32d53fc3 feat: prefer custom model and produce correct output drbh 2024-01-25 17:07:37 -0500
  • b2d7448a39 fix: show warning with tokenizer config parsing error drbh 2024-01-25 17:40:09 +0000
  • 9c320e260b fix: read stderr in download (#1486) OlivierDehaene 2024-01-25 18:16:03 +0100
  • 592cb0a10d read stderr without log lines OlivierDehaene 2024-01-25 16:05:11 +0100
  • 7e2a7433d3 feat: adds phi model (#1442) drbh 2024-01-25 09:37:53 -0500
  • 29cad22615 fix: read stderr in download OlivierDehaene 2024-01-25 15:32:43 +0100
  • 86c8335f1b Add a new /tokenize route to get the tokenized input (#1471) Nicolas Patry 2024-01-25 14:19:03 +0100
  • 46ca08a831 Fmt. Nicolas Patry 2024-01-25 10:33:51 +0000
  • 211b9681e6 Original truncate behavior. Nicolas Patry 2024-01-25 10:11:23 +0000
  • 168ec6b145 Don't actually modify the inputs. Nicolas Patry 2024-01-25 09:42:05 +0000
  • 35939a28c7 feat: mvp single inference and explore integration drbh 2024-01-24 20:55:12 -0500
  • dd39877ee3 fix: bump transformers version to support phi fallback drbh 2024-01-24 13:50:05 -0500
  • 6134f0108d fix: cleanup config, remove unused values and fix non flash init drbh 2024-01-24 17:38:17 +0000
  • 9bcd21a0b0 fix: adjust model config vars and other refactors drbh 2024-01-24 12:23:04 -0500
  • 7872b8c55b Add messages api compatibility docs (#1478) drbh 2024-01-24 11:41:28 -0500
  • 6b575730d8 fix: adjust file name and add to toc tree drbh 2024-01-24 11:05:46 -0500
  • 635a5b42ea feat: align docs with demo code and rename env var drbh 2024-01-24 10:52:40 -0500
  • bfe062ee99 Go back to the default of adding special tokens in validation. Nicolas Patry 2024-01-24 12:31:12 +0000
  • 7e542d4d05 Fixing non divisible embeddings. (#1476) Nicolas Patry 2024-01-24 13:08:41 +0100
  • 58ea6bf897 Update idefics. Nicolas Patry 2024-01-24 11:15:05 +0000
  • 3b560f4ea8 Fixing non divisible embeddings. Nicolas Patry 2024-01-24 10:22:05 +0000
  • 99392376e6 fix: add inline comments to highlight differences with llama drbh 2024-01-24 00:54:47 +0000
  • c7ad2b61a1 feat: add integration tests and snapshots for phi drbh 2024-01-24 00:36:24 +0000
  • 1f7042d165 fix error if top_n_tokens is 0 or null gduhamel 2024-01-23 21:05:13 +0100
  • e4fa84ba26 fix: prefer message api naming drbh 2024-01-23 17:20:28 +0000
  • 8f4dc804c5 feat: draft open ai compatibility docs drbh 2024-01-23 17:14:03 +0000
  • b819e24960 Fix typos in launcher help. Nicolas Patry 2024-01-23 15:31:13 +0100
  • 18f13a1b5f feat: avoid copy for partial rotary embeddings drbh 2024-01-23 15:42:10 +0000
  • 82f87ada6f Disable decoder_input_details on OpenAI-compatible chat streaming, pass temp and top-k from API (#1470) Jacob Keisling 2024-01-23 08:55:05 -0600
  • c0ade6bba2 Fix typos in launcher help. Nicolas Patry 2024-01-23 15:31:13 +0100
  • 2a7a967de3 Revert prefill optimization and fix accuracy issue in shift operation (#29) Karol Damaszke 2024-01-23 15:19:07 +0100
  • b1ffd253a9 Updating openapi docs. Nicolas Patry 2024-01-23 15:12:36 +0100
  • 048bc5b4b7 Remove special, it's not correct enough (and not necessarily useful). Nicolas Patry 2024-01-23 15:04:17 +0100
  • c12ff38974 Tokenization route. Nicolas Patry 2024-01-23 14:55:29 +0100
  • 4f7f617e91 Adding tokenizer route. Nicolas Patry 2024-01-23 14:49:04 +0100
  • d805612cc0 Transparently pass through temp and top_p EndlessReform 2024-01-22 22:35:37 -0600
  • 4347960180 Disable decoder_input_details for streaming requests EndlessReform 2024-01-22 22:14:31 -0600
  • 10fce5bffd Merge branch 'huggingface:main' into add_sealion_mpt_support David Ong Tat-Wee 2024-01-23 10:52:18 +0800
  • bcf145733c feat: initial weight load drbh 2024-01-23 01:37:09 +0000
  • c49332adb6 fix: remove unused imports and duplicate spaces drbh 2024-01-23 00:18:29 +0000
  • 2b43c5b0dd fix: prefer parallel attn load and small refactors drbh 2024-01-23 00:14:22 +0000
  • 8204f23650 fix: load attn weights to align with flash attn drbh 2024-01-22 21:53:01 +0000
  • 98e5faff9d feat: conditionally toggle chat on invocations route (#1454) drbh 2024-01-22 10:29:01 -0500
  • becd09978c chore: bump rust version and annotate/fix all clippy warnings (#1455) drbh 2024-01-22 09:22:54 -0500
  • df04b28bfc Add Sealion MPT Support Choon Meng Tan 2024-01-21 12:18:20 +0800
  • 6af0b94046 Enable padding before sharding for tp embedding for non-divisible embedding tables. Pragaash 2024-01-20 15:21:43 -0800
  • b91921f276 fix: lanucher doc typos Andres Restrepo 2024-01-20 13:18:44 -0500
  • fd8b42678d Fix top_n_tokens > 0 gduhamel 2024-01-19 20:50:17 +0100
  • ac3bc0e95e Removed kv_cache from HPU graph output (#19) jkaniecki 2024-01-19 15:34:13 +0100
  • da0f874d49 Prefer prefill instead of decode when max_waiting_tokens==0 (#18) mrs303 2024-01-19 15:25:40 +0100
  • 60f63262db Prefill optimization by allocating space only for the first token (#17) Karol Damaszke 2024-01-19 15:18:35 +0100
  • 0b96da89aa Make tokenizer optional (#12) Adam Stachowicz 2024-01-19 15:12:04 +0100
  • 5db645a19a fix: remove debug logs drbh 2024-01-19 00:12:58 +0000
  • 43441cad42 fix: improve model initalization drbh 2024-01-18 19:36:50 +0000
  • 215afc15f0 fix: prefer env value from clap for better defaults drbh 2024-01-18 11:03:05 -0500
  • fb29a913a9 chore: bump rust version and annotate/fix all clippy warnings drbh 2024-01-18 10:16:02 -0500
  • 90541fba07 feat: conditionally toggle chat drbh 2024-01-18 09:38:40 -0500
  • 3ccb3bb0b5 feat: support raise_exception, bos and eos tokens (#1450) drbh 2024-01-18 06:31:56 -0500
  • bc81795370 fix: use generic raise_exception function to improve tests drbh 2024-01-17 18:34:25 -0500
  • f378c60517 fix: make eos and bos optional and only init template once drbh 2024-01-17 18:24:17 -0500
  • 77ee1f18fa feat: load phi weights and produce nonsense tokens drbh 2024-01-17 22:30:57 +0000
  • db835509ed feat: support raise_exception, bos and eos tokens drbh 2024-01-17 09:01:00 -0500
  • 381ec38cad Batch bucketing improvements (#15) madamczykhabana 2024-01-17 10:09:27 +0100
  • 8523f7ef64 Deepspeed terminate (#11) mrs303 2024-01-17 09:57:03 +0100
  • 0eabc83541 feat: supports openai chat completions API (#1427) drbh 2024-01-16 05:07:41 -0500
  • c459c86f88 High-level server profiler (#13) Krzysztof Laskowski 2024-01-16 09:57:29 +0100
  • 3dd84f1fc0 #1447 upgrade rust to fix compilation error Lennart 2024-01-16 07:50:19 +0100
  • 4a47f66da1 fix: avoid program exit on repo fetch failures drbh 2024-01-15 18:09:31 -0500
  • 41c4f4fa41 Debugging utils (#14) madamczykhabana 2024-01-15 21:05:27 +0100
  • fb6c220dc8 feat: support local configs and prefer hf hub drbh 2024-01-15 08:58:11 -0500
  • cd8e0b221e feat: adds phi model drbh 2024-01-13 10:02:24 -0500
  • 3513bc73b2 fix: add removed index from rebase and clippy drbh 2024-01-11 13:52:34 -0500
  • 4555e8721c fix: remove duplicate input_length on Details drbh 2024-01-11 12:10:34 -0500
  • c63551fad7 fix: prefer only intput_length over full ValidRequest in GenerateStreamResponse drbh 2024-01-11 10:46:55 -0500
  • 62e6661616 fix: clippy tweaks drbh 2024-01-10 14:02:28 -0500
  • d009aa3ee3 fix: re-add changes removed during rebase drbh 2024-01-10 14:00:38 -0500
  • 55455a16c7 fix: initialize chat template single time, fix defaults and add seed param drbh 2024-01-10 13:29:20 -0500
  • 47ad7bfbe4 fix: remove trailing space for clippy drbh 2024-01-10 10:26:23 -0500
  • 9fdf47f766 feat: supports openai chat completions API drbh 2024-01-10 10:08:51 -0500
  • ac08b4ef9c Return prompt vs generated tokens. (#1436) Nicolas Patry 2024-01-11 19:01:43 +0100
  • ae4cafa7f0 Fix. Nicolas Patry 2024-01-11 18:03:34 +0100
  • 0d48a43df0 Fmt. Nicolas Patry 2024-01-11 17:58:18 +0100
  • a38d701911 Typos. Nicolas Patry 2024-01-11 17:57:31 +0100
  • ca04548080 Make it clear that this value is only partially correct. Nicolas Patry 2024-01-11 17:56:16 +0100
  • 8e0c538a18 Fix by using the actual real value as outputted by the validation workers. Nicolas Patry 2024-01-11 15:18:58 +0000
  • 5c8cc964fa Return prompt vs generated tokens. Nicolas Patry 2024-01-11 14:59:53 +0000
  • a8c5b69e2c Set default value of LIMIT_HPU_GRAPH to True (#7) Karol Damaszke 2024-01-11 14:51:49 +0100
  • 532e4b8d41 Readme updates with review comments (#8) Harish Subramony 2024-01-11 01:12:43 -0800
  • da27fbdfdb Fix local load for Medusa (#1420) PYNing 2024-01-11 01:36:20 +0800
  • fbeb1c4475 fix: follow base model for tokenizer in router (#1424) OlivierDehaene 2024-01-10 16:35:54 +0100
  • 31b23f98ff feat: boilerplate phi2 model integration support-phi-model drbh 2024-01-10 09:42:26 -0500
  • 19aa5308cb Kepp code style consistent with PR #1419 PYNing 2024-01-10 10:58:56 +0800
  • 6545383861 Merge branch 'main' into fix_local_load_for_medusa PYNing 2024-01-10 10:53:00 +0800
  • cb8b7610c0 Update README for proper usage of LIMIT_HPU_GRAPH (#3) Harish Subramony 2024-01-09 14:49:15 -0800