Commit Graph

  • daf1631e09 dockerfile(backend): initial working version of llama.cpp container Morgan Funtowicz 2024-11-13 00:08:49 +0100
  • 02cd6fe427 chore(backend): minor improvements Morgan Funtowicz 2024-11-13 00:08:26 +0100
  • 363d5e45de feat(backend): use std::ranges to map uint32_t to llama_token Morgan Funtowicz 2024-11-13 00:07:59 +0100
  • 488ba93898 feat(backend): fix invalid reference to context in release mode Morgan Funtowicz 2024-11-11 19:50:33 +0100
  • 7e2890fe2c feat(backend): remove unused function Morgan Funtowicz 2024-11-11 19:50:11 +0100
  • 6915fa3441 feat(backend): remove reinterpret_cast converting from uint32_t to llama_token(int32_t) Morgan Funtowicz 2024-11-09 22:19:38 +0100
  • 86d30aea43 feat(backend): simplify overall cpp structure Morgan Funtowicz 2024-11-09 22:10:33 +0100
  • 4f5397c414 misc(cmake): use URL base llama.cpp repo Morgan Funtowicz 2024-11-08 00:54:05 +0100
  • cf17928f83 misc(cmake): remove dependency on fmt Morgan Funtowicz 2024-11-08 00:53:53 +0100
  • 26d0266cec feat(backend): handle all the tokenization failure and send back to the client Morgan Funtowicz 2024-11-06 17:46:46 +0100
  • 20652824d9 feat(dockerfile): build process Morgan Funtowicz 2024-11-06 17:33:37 +0100
  • a7afde41a9 feat(backend): dockerfile Morgan Funtowicz 2024-11-05 23:48:22 +0100
  • 7eec0f704f chore(backend): minor fixes mostly format Morgan Funtowicz 2024-11-05 23:48:13 +0100
  • a1154b17ec feat(backend): avoid copy constructor Morgan Funtowicz 2024-11-05 23:47:38 +0100
  • 588421833c misc(backend): missing header <variant> Morgan Funtowicz 2024-11-05 23:47:22 +0100
  • 62dba1a878 misc(cmake): use url deps and not git repo Morgan Funtowicz 2024-11-05 23:46:52 +0100
  • 52208f5b78 misc(backend): decrease log verbosity in callback Morgan Funtowicz 2024-11-04 23:24:50 +0100
  • 1149186794 feat(backend): expose tokenizer to the GenerationContext to decode token Morgan Funtowicz 2024-11-04 23:01:57 +0100
  • 1473259f84 feat(backend): add early stopping criteria from TGI stream callback Morgan Funtowicz 2024-11-04 17:01:22 +0100
  • 958c72a44a misc(ffi): remove unused ffi mapping Morgan Funtowicz 2024-11-04 16:26:05 +0100
  • 5b7a951389 feat(backend): refactor the callback to handle intermediate and end inference message Morgan Funtowicz 2024-11-04 16:17:43 +0100
  • 11c593dc69 feat(backend): make eog clearer on c++ side Morgan Funtowicz 2024-11-04 00:11:55 +0100
  • 06424aa9ff feat(backend): correctly handle the max_new_tokens case for is_eos Morgan Funtowicz 2024-11-03 23:50:46 +0100
  • 05ff551950 feat(backend): add number of generated tokens in the callback Morgan Funtowicz 2024-11-03 23:07:22 +0100
  • 188442f67d misc(lint): make clippy happier Morgan Funtowicz 2024-11-03 14:26:57 +0100
  • 31d9254776 feat(backend): remove static from inner_fw visitor as it leads to invalid memory locations Morgan Funtowicz 2024-11-03 11:25:12 +0100
  • 7b0a56f40f feat(backend): fix memory leaking on llama_sampler when the decode ends Morgan Funtowicz 2024-11-03 11:17:02 +0100
  • 86a2ae6ba2 chore: unsued variables Morgan Funtowicz 2024-11-03 00:53:34 +0100
  • 2cdfed94d9 feat(backend): correctly link to shared fmt and spdlog instead of static Morgan Funtowicz 2024-11-03 00:53:17 +0100
  • bd8f0f15e1 feat(backend): fix invalid reference to ctx instead of context in release build Morgan Funtowicz 2024-11-03 00:52:58 +0100
  • 3e82f14f57 feat(backend): somewhat generates the final infer response Morgan Funtowicz 2024-11-03 00:46:04 +0100
  • b50dcddbb8 feat(backend): avoid dropping the boxed stream at the end of the callback Morgan Funtowicz 2024-11-03 00:36:32 +0100
  • 612f2f939f feat(backend): bind incoming request to the server Morgan Funtowicz 2024-11-01 00:50:42 +0100
  • d4aee42fd8 feat(backend): add logit parameter in the callback fn Morgan Funtowicz 2024-11-01 00:49:50 +0100
  • f39edc72ff feat(backend): add mapping for ignore_eos_token stopping criteria Morgan Funtowicz 2024-10-31 21:32:29 +0100
  • 3af2c6837c misc(offline): match rework Morgan Funtowicz 2024-10-31 17:52:18 +0100
  • d52b4c4978 feat(backend): full rework of the backend internal to safer c++ Morgan Funtowicz 2024-10-31 17:51:57 +0100
  • 6a5f6b0755 misc(offline): update offline tester Morgan Funtowicz 2024-10-30 22:40:49 +0100
  • b98c635781 feat(backend): entirely rewrite backend Morgan Funtowicz 2024-10-30 22:40:37 +0100
  • 611590440d misc(offline): expose more parameters for generate Morgan Funtowicz 2024-10-28 22:44:47 +0100
  • dbc5b7a0f7 misc(offline): link correctly Morgan Funtowicz 2024-10-26 22:24:05 +0200
  • 0c1dd0ed2b feat(llamacpp): wip explosion Morgan Funtowicz 2024-10-29 22:30:36 +0100
  • a316c53255 feat(llamacpp): expose number of threads for the backend when constructing the model Morgan Funtowicz 2024-10-25 08:11:42 +0200
  • 179309b364 misc(build): refactor build type detection in cmake Morgan Funtowicz 2024-10-25 08:02:45 +0200
  • f0859c247f misc(build): handle different lib destination folder lib/lib64 Morgan Funtowicz 2024-10-25 07:27:12 +0200
  • e4d803c94e feat(backend): build and link through build.rs Morgan Funtowicz 2024-10-24 16:42:50 +0200
  • 355d8a55b4 feat(backend): wip Rust binding Morgan Funtowicz 2024-10-24 09:56:40 +0200
  • f9c248657d chore(backend): minor formatting Morgan Funtowicz 2024-10-23 22:11:58 +0200
  • 37faeb34b2 feat(backend): expose frequency and repetition penalties Morgan Funtowicz 2024-10-23 14:12:52 +0200
  • d4b5be10f9 feat(backend): minor refactor Morgan Funtowicz 2024-10-23 14:12:32 +0200
  • 92bb113653 feat(backend): use llama_token as TokenId type Morgan Funtowicz 2024-10-23 00:10:41 +0200
  • 45d5a6a8c5 feat(backend): add some initial decoding steps Morgan Funtowicz 2024-10-23 00:09:10 +0200
  • 098c66920d feat(backend): tell cmake to build llama-common and link to it Morgan Funtowicz 2024-10-22 15:23:16 +0200
  • 0911076320 feat(backend): correctly load llama.cpp model from llama api and not gpt2 Morgan Funtowicz 2024-10-22 15:22:56 +0200
  • 05ad684676 feat(llamacpp): enable cuda Morgan Funtowicz 2024-10-21 09:14:51 +0200
  • fa89d1e613 misc(cmake): wut Morgan Funtowicz 2024-10-21 09:14:35 +0200
  • e4432d36b1 misc(cmake): add parameter to build specific cuda arch Morgan Funtowicz 2024-10-18 17:10:22 +0200
  • 52d57dca79 feat(llamacpp): initial end2end build Morgan Funtowicz 2024-10-04 10:42:31 +0200
  • 7d1f8a2bd6 feat(llamacpp): correctly handle CMAKE_BUILD_TYPE for spdlog macros Morgan Funtowicz 2024-10-03 15:25:15 +0200
  • aa1fcba59f feat(llamacpp): initial commit Morgan Funtowicz 2024-10-03 14:00:17 +0200
  • 9d0aaf6c9d fix response type of document for Text Generation Inference jitokim 2024-11-14 01:13:03 +0900
  • 3f79225472 Check if allowed tokens is None Alex Weston 2024-10-25 11:21:17 -0400
  • 803d697d3d Update for new API Nicolas Patry 2024-10-25 10:46:05 +0200
  • 1a22985fcd Upgrade outlines to 0.1.1 Alex Weston 2024-10-16 13:58:54 -0400
  • d6c8426b91 benchmark: fix prefill throughput Daniël de Kok 2024-11-12 09:17:37 +0000
  • 0078c40e66 Fix: Change model_type from ssm to mamba Ubuntu 2024-11-10 22:44:13 +0000
  • 9fc8adbc81 fix: change embeddings to embedding Ubuntu 2024-11-10 21:39:49 +0000
  • a785000842 Add initial support for compressed-tensors checkpoints (#2732) Daniël de Kok 2024-11-10 13:54:07 +0100
  • 575af7e672 Add initial support for compressed-tensors checkpoints Daniël de Kok 2024-11-08 12:23:45 +0000
  • 1639152ca4 fix oneapi basekit version Wang, Yi A 2024-11-07 22:09:42 -0800
  • 56c3eb4adb Remove the torch package in requirements.txt (#246) yuanwu2017 2024-11-08 01:22:24 +0800
  • 97f7a22f0b add trust_remote_code in tokenizer to fix baichuan issue (#2725) Wang, Yi 2024-11-07 21:43:38 +0800
  • 6297f1769f feat: add payload limit OlivierDehaene 2024-11-05 16:35:33 +0100
  • 32989e9b39 add trust_remote_code in tokenizer to fix baichuan issue Wang, Yi A 2024-11-04 20:49:35 -0800
  • 6becab5d3f torch has xpu support in 2.5 Wang, Yi A 2024-11-04 18:06:23 -0800
  • 3d4c50f028 Merge branch 'main' into moe Wang, Yi A 2024-11-04 17:53:13 -0800
  • 41dff3147d feat: support flash attention 2 in qwen2 vl vision blocks David Holtz 2024-11-04 16:12:56 +0000
  • b1f9044d6c fix incorrect output of Qwen2-7B-Instruct-GPTQ-Int4 and Qwen2-7B-Inst… (#2717) Wang, Yi 2024-11-04 23:07:51 +0800
  • 5eedb2ec7a nix: move to tgi-nix main (#2718) Daniël de Kok 2024-11-04 15:40:13 +0100
  • 9fde566602 Fixing linting on main. (#2719) Nicolas Patry 2024-11-04 22:21:41 +0800
  • b81231c790 Fixing linting on main. Nicolas Patry 2024-11-04 15:10:26 +0100
  • aadc9cb485 Fix prefix caching + speculative decoding (#2711) Travis Addair 2024-11-04 06:08:43 -0800
  • 8c2659ed23 nix: move to tgi-nix main Daniël de Kok 2024-11-04 12:52:18 +0000
  • 69569f0c2a fix incorrect output of Qwen2-7B-Instruct-GPTQ-Int4 and Qwen2-7B-Instruct-AWQ ipex kernel provide func like add_bias, so no need add it outside Wang, Yi A 2024-11-04 03:41:31 -0800
  • a5593ba83e Hotfixing auto length (warmup max_s was wrong). (#2716) Nicolas Patry 2024-11-04 16:55:54 +0800
  • 1c1b2486d9 Hotfixing auto length (warmup max_s was wrong). Nicolas Patry 2024-11-04 09:41:09 +0100
  • 1108051605 update to ipex xpu 2.5 Wang, Yi A 2024-11-03 21:49:53 -0800
  • 08c4184eb2 fix: add chat_tokenize endpoint to api docs (#2710) drbh 2024-11-04 00:44:59 -0500
  • d2f286875e Merge branch 'main' into moe Wang, Yi A 2024-11-03 21:42:48 -0800
  • 6e3220529d fix: create position ids for text only input (#2714) drbh 2024-11-01 20:40:05 -0400
  • 3c8d1f4b2f fix: prefer repeat over expand to avoid clone David Holtz 2024-11-01 21:11:15 +0000
  • a604bfe450 fix: run pre commit lints pr-2711-ci-branch drbh 2024-11-01 12:11:57 -0400
  • 11b3070ee7 fix: create position ids for text only input David Holtz 2024-11-01 15:54:52 +0000
  • c345c734a7 Merge branch 'habana-main' into 2.3.0 yuanwu2017 2024-11-01 11:24:40 +0800
  • fcf2e3a338 Fix the prefill warmup issue yuanwu 2024-11-01 05:08:18 +0200
  • 01dacf8e8f fix cuda graphs for qwen2-vl (#2708) drbh 2024-10-31 22:05:34 -0400
  • d375e1e259 fix: remove unused import and refactor test drbh 2024-10-31 19:25:35 -0400
  • e2b394e3a0 fix: return correct shape logits and add streaming test David Holtz 2024-10-31 23:14:03 +0000
  • 6ba3d1d6e5 updated release docker image version in readme to 2.0.6 (#242) Thanaji Rao Thakkalapelli 2024-10-31 15:44:16 -0700
  • 17de5998e5 fix qwen2 failure in intel cpu David Holtz 2024-10-31 22:39:07 +0000