text-generation-inference

mirror of https://github.com/huggingface/text-generation-inference.git synced 2025-10-11 16:05:24 +00:00

Author	SHA1	Message	Date
Morgan Funtowicz	958c72a44a	misc(ffi): remove unused ffi mapping	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	5b7a951389	feat(backend): refactor the callback to handle intermediate and end inference message	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	11c593dc69	feat(backend): make eog clearer on c++ side	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	06424aa9ff	feat(backend): correctly handle the max_new_tokens case for is_eos	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	05ff551950	feat(backend): add number of generated tokens in the callback	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	188442f67d	misc(lint): make clippy happier	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	31d9254776	feat(backend): remove static from inner_fw visitor as it leads to invalid memory locations	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	7b0a56f40f	feat(backend): fix memory leaking on llama_sampler when the decode ends	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	86a2ae6ba2	chore: unsued variables	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	2cdfed94d9	feat(backend): correctly link to shared fmt and spdlog instead of static	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	bd8f0f15e1	feat(backend): fix invalid reference to ctx instead of context in release build	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	3e82f14f57	feat(backend): somewhat generates the final infer response	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	b50dcddbb8	feat(backend): avoid dropping the boxed stream at the end of the callback	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	612f2f939f	feat(backend): bind incoming request to the server	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	d4aee42fd8	feat(backend): add logit parameter in the callback fn	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	f39edc72ff	feat(backend): add mapping for ignore_eos_token stopping criteria	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	3af2c6837c	misc(offline): match rework	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	d52b4c4978	feat(backend): full rework of the backend internal to safer c++	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	6a5f6b0755	misc(offline): update offline tester	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	b98c635781	feat(backend): entirely rewrite backend	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	611590440d	misc(offline): expose more parameters for generate	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	dbc5b7a0f7	misc(offline): link correctly	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	0c1dd0ed2b	feat(llamacpp): wip explosion	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	a316c53255	feat(llamacpp): expose number of threads for the backend when constructing the model	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	179309b364	misc(build): refactor build type detection in cmake	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	f0859c247f	misc(build): handle different lib destination folder lib/lib64	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	e4d803c94e	feat(backend): build and link through build.rs	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	355d8a55b4	feat(backend): wip Rust binding	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	f9c248657d	chore(backend): minor formatting	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	37faeb34b2	feat(backend): expose frequency and repetition penalties	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	d4b5be10f9	feat(backend): minor refactor	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	92bb113653	feat(backend): use llama_token as TokenId type	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	45d5a6a8c5	feat(backend): add some initial decoding steps	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	098c66920d	feat(backend): tell cmake to build llama-common and link to it	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	0911076320	feat(backend): correctly load llama.cpp model from llama api and not gpt2	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	05ad684676	feat(llamacpp): enable cuda	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	fa89d1e613	misc(cmake): wut	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	e4432d36b1	misc(cmake): add parameter to build specific cuda arch	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	52d57dca79	feat(llamacpp): initial end2end build	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	7d1f8a2bd6	feat(llamacpp): correctly handle CMAKE_BUILD_TYPE for spdlog macros	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	aa1fcba59f	feat(llamacpp): initial commit # Conflicts: # Cargo.lock	2024-11-14 08:42:01 +01:00
Daniël de Kok	a785000842	Add initial support for compressed-tensors checkpoints (#2732 ) compressed-tensors is a safetensors extension for sparse, quantized tensors. The format is more powerful than earlier AWQ/GPTQ/FP8 quantization, because - Different quantizer configurations can be used for different targets. - The format can specify input/output quantizers in addition to weight quantizers. - Configurable exclusions for quantization. This change adds a dependency on the `compressed-tensors` package for its configuration parsing and layer matching functionality. The following types of quantization are supported in this PR: - W8A16 and W4A16 INT using GPTQ-Marlin kernels. - W8A8 and W8A16 FP using FP8-Marlin and cutlass kernels. Support for other quantization types will be added in subsequent PRs.	2024-11-10 13:54:07 +01:00
Wang, Yi	97f7a22f0b	add trust_remote_code in tokenizer to fix baichuan issue (#2725 ) Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>	2024-11-07 14:43:38 +01:00
Wang, Yi	b1f9044d6c	fix incorrect output of Qwen2-7B-Instruct-GPTQ-Int4 and Qwen2-7B-Inst… (#2717 ) fix incorrect output of Qwen2-7B-Instruct-GPTQ-Int4 and Qwen2-7B-Instruct-AWQ ipex kernel provide func like add_bias, so no need add it outside Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>	2024-11-04 16:07:51 +01:00
Daniël de Kok	5eedb2ec7a	nix: move to tgi-nix `main` (#2718 )	2024-11-04 15:40:13 +01:00
Nicolas Patry	9fde566602	Fixing linting on main. (#2719 )	2024-11-04 15:21:41 +01:00
Travis Addair	aadc9cb485	Fix prefix caching + speculative decoding (#2711 )	2024-11-04 15:08:43 +01:00
Nicolas Patry	a5593ba83e	Hotfixing auto length (warmup max_s was wrong). (#2716 )	2024-11-04 09:55:54 +01:00
drbh	08c4184eb2	fix: add chat_tokenize endpoint to api docs (#2710 )	2024-11-04 06:44:59 +01:00
drbh	6e3220529d	fix: create position ids for text only input (#2714 ) * fix: create position ids for text only input * fix: prefer repeat over expand to avoid clone	2024-11-02 08:40:05 +08:00

1161 Commits