text-generation-inference

mirror of https://github.com/huggingface/text-generation-inference.git synced 2025-07-10 18:00:16 +00:00

Author	SHA1	Message	Date
Morgan Funtowicz	c9f6c3a8f7	feat(backend): better map exception throw on C++ side	2024-11-29 23:34:16 +01:00
Morgan Funtowicz	db41776a0e	feat(backend): add mimalloc memory allocator to the container	2024-11-29 16:26:00 +01:00
Morgan Funtowicz	f5c4cee364	feat(backend): correctly link to all libraries	2024-11-29 16:25:12 +01:00
Hugo Larcher	59b0ef3018	feat: Fix Cmakelist to allow building on Darwin platform (#2785) * feat: Fix Cmakelist to allow building on Darwin platform * fix: Fix tokenizer in llama.cpp Dockerfile	2024-11-29 00:31:36 +01:00
Morgan Funtowicz	b10eaab9f3	feat(backend): use new batch API to generate tokens	2024-11-28 23:57:24 +01:00
Morgan Funtowicz	dc6435e3a5	feat(backend): create llama_context_params with default factory	2024-11-28 23:57:13 +01:00
Morgan Funtowicz	b1ebc8f73b	feat(backend): update llama.cpp to 4215	2024-11-28 23:56:57 +01:00
Morgan Funtowicz	6c5a75b593	misc(offline): update model creation as std::shared_ptr	2024-11-28 17:45:22 +01:00
Morgan Funtowicz	9d659f1e23	feat(backend): add missing temperature parameter	2024-11-28 16:55:17 +01:00
Morgan Funtowicz	df72c56b5b	feat(backend): add guard in case top_k = 0	2024-11-28 16:30:20 +01:00
Morgan Funtowicz	929a2fc718	feat(backend): add some test to the backend for core allocation	2024-11-28 14:53:46 +01:00
Morgan Funtowicz	298367cdfd	feat(backend): fix when num_cores_per_instance is equals to zero with the size of the generated core allocation	2024-11-28 14:53:35 +01:00
Morgan Funtowicz	8e89793514	feat(backend): use the new batch api from llama	2024-11-28 14:52:48 +01:00
Morgan Funtowicz	274cfce435	feat(backend): remove core overriding in the Rust backend	2024-11-28 11:40:52 +01:00
Funtowicz Morgan	d918e6a159	Update Dockerfile.llamacpp as per review Co-authored-by: Hugo Larcher <hugo.larcher@huggingface.co>	2024-11-28 09:53:59 +01:00
Funtowicz Morgan	bbe95ca9e9	Update Dockerfile.llamacpp as per review Co-authored-by: Hugo Larcher <hugo.larcher@huggingface.co>	2024-11-28 09:53:15 +01:00
Morgan Funtowicz	9025a26cea	chore: remove unrelated change to trtllm	2024-11-22 15:42:09 +01:00
Morgan Funtowicz	862a519fdd	misc(doc): rust documentation	2024-11-22 15:35:55 +01:00
Morgan Funtowicz	b9c04b9c07	misc(doc): c++ documentation	2024-11-22 15:13:54 +01:00
Morgan Funtowicz	4ee2ee58c9	misc(license): update LICENSE	2024-11-22 14:48:39 +01:00
Morgan Funtowicz	2d9465d181	misc(backend): allow rebinding numa core affinity	2024-11-22 14:02:58 +01:00
Morgan Funtowicz	30ae99631c	misc(docker): add numa lib as dependency	2024-11-22 13:34:52 +01:00
Morgan Funtowicz	5a85661661	feat(backend): rely on multi consumer queue to scheduler workers	2024-11-22 13:32:56 +01:00
Morgan Funtowicz	84eead219a	feat(backend): correctly setup llama_context providing n_threads and n_ubatch	2024-11-21 21:43:50 +01:00
Morgan Funtowicz	50c376612c	feat(backend): bind thread and memory affinity for thread	2024-11-21 13:52:38 +01:00
Morgan Funtowicz	5335bf973b	feat(backend): multistream inference on CPU	2024-11-21 00:03:05 +01:00
Morgan Funtowicz	23d2bcf28d	misc(build): improve build process	2024-11-14 09:38:13 +01:00
Morgan Funtowicz	70c90ad933	feat(backend): update llamacpp to 4077	2024-11-14 09:04:06 +01:00
Morgan Funtowicz	6f059c4b5d	feat(backend): wrap Arc tokenizer to avoid duplicating	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	57b215467b	feat(backend): simplify Rust callback	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	daf1631e09	dockerfile(backend): initial working version of llama.cpp container	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	02cd6fe427	chore(backend): minor improvements	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	363d5e45de	feat(backend): use std::ranges to map uint32_t to llama_token	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	488ba93898	feat(backend): fix invalid reference to context in release mode	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	7e2890fe2c	feat(backend): remove unused function	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	6915fa3441	feat(backend): remove reinterpret_cast converting from uint32_t to llama_token(int32_t)	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	86d30aea43	feat(backend): simplify overall cpp structure	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	4f5397c414	misc(cmake): use URL base llama.cpp repo	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	cf17928f83	misc(cmake): remove dependency on fmt	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	26d0266cec	feat(backend): handle all the tokenization failure and send back to the client	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	20652824d9	feat(dockerfile): build process	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	a7afde41a9	feat(backend): dockerfile	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	7eec0f704f	chore(backend): minor fixes mostly format	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	a1154b17ec	feat(backend): avoid copy constructor	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	588421833c	misc(backend): missing header <variant>	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	62dba1a878	misc(cmake): use url deps and not git repo	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	52208f5b78	misc(backend): decrease log verbosity in callback	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	1149186794	feat(backend): expose tokenizer to the GenerationContext to decode token	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	1473259f84	feat(backend): add early stopping criteria from TGI stream callback	2024-11-14 08:42:01 +01:00
Morgan Funtowicz	958c72a44a	misc(ffi): remove unused ffi mapping	2024-11-14 08:42:01 +01:00

1210 Commits