Commit Graph

142 Commits

Author SHA1 Message Date
Nicolas Patry
7a5855ff01
NCCL ? 2024-09-17 09:37:05 +02:00
Nicolas Patry
fb7e8c8970
Add the cache. 2024-09-17 09:20:12 +02:00
Guillaume LEGENDRE
2aa2851e01
use runners with cache 2024-09-17 08:12:19 +02:00
Nicolas Patry
87c85fdc38
Standard setup. 2024-09-16 17:04:11 +02:00
Nicolas Patry
69c20a9d3f
Tmate: let's find it with ldconfig? 2024-09-16 17:03:28 +02:00
Nicolas Patry
c784cb401d
Let's try a compat driver? 2024-09-16 17:03:28 +02:00
Nicolas Patry
fe533dc57b
Back to failing version 2024-09-16 17:03:28 +02:00
Nicolas Patry
2f1f082abe
Tmate. 2024-09-16 17:03:28 +02:00
Nicolas Patry
1a6b9926f6
missing lib. 2024-09-16 17:03:27 +02:00
Nicolas Patry
332e42f59a
Attempt. 2024-09-16 17:03:27 +02:00
Nicolas Patry
ec6fe324c6
Link to nix owned lib 2024-09-16 17:03:27 +02:00
Nicolas Patry
83ee55a617
Try something. 2024-09-16 17:03:27 +02:00
Nicolas Patry
047530216c
No idea where the shared disk is. 2024-09-16 17:03:27 +02:00
Nicolas Patry
9f548fa82a
Change the home location ? 2024-09-16 17:03:27 +02:00
Nicolas Patry
3ff12084b7
Revert "No tmate."
This reverts commit 6b9b6d951897127ae1ce09c8f61f86a64b301fec.
2024-09-16 17:03:26 +02:00
Nicolas Patry
26634f9697
No tmate. 2024-09-16 17:03:26 +02:00
Nicolas Patry
a533d086f0
Tmate to find cache. 2024-09-16 17:03:26 +02:00
Nicolas Patry
a5b81ab457
Home. 2024-09-16 17:03:26 +02:00
Nicolas Patry
98f2241a88
Put back libnvidia-ml 2024-09-16 17:03:26 +02:00
Nicolas Patry
72a805d50d
Remove tmate. 2024-09-16 17:03:26 +02:00
Nicolas Patry
45c0129976
Attempting something. 2024-09-16 17:03:25 +02:00
Nicolas Patry
2b18537f85
More tmate. 2024-09-16 17:03:25 +02:00
Nicolas Patry
12b88204b0
Putting the cuda package in the flake. 2024-09-16 17:03:25 +02:00
Nicolas Patry
d7333830b5
Tmate. 2024-09-16 17:03:25 +02:00
Nicolas Patry
c4bbe06bf1
Simpler command 2024-09-16 17:02:45 +02:00
Nicolas Patry
d0ae24a167
Release tests. 2024-09-16 17:02:25 +02:00
Nicolas Patry
5c4b2eaa30
Seeing the damage on the release tests. 2024-09-16 17:01:51 +02:00
Nicolas Patry
70f910bba6
Remove tmate. 2024-09-16 17:01:51 +02:00
Nicolas Patry
5adece6313
This doesn't seem needed. 2024-09-16 17:01:51 +02:00
Nicolas Patry
b7cb8d5145
Let's figure out the issue... 2024-09-16 17:01:30 +02:00
Nicolas Patry
3d7b81535a
Only link CUDA driver libraries. 2024-09-16 17:01:30 +02:00
Nicolas Patry
ce3efc83ed
Remove tmate. 2024-09-16 17:01:30 +02:00
Nicolas Patry
7f58f7dc61
Symlink all the things. 2024-09-16 17:01:29 +02:00
Nicolas Patry
42107de71f
Let's try to find libnvidia-ml 2024-09-16 17:01:29 +02:00
Nicolas Patry
edaa7f847d
Does this work ? 2024-09-16 17:01:29 +02:00
Nicolas Patry
d1e79ddae0
Fix override. 2024-09-16 17:01:29 +02:00
Nicolas Patry
db054b95df
Check the paths. 2024-09-16 17:01:29 +02:00
Nicolas Patry
afcd047a58
Yaml yaml. 2024-09-16 17:01:29 +02:00
Nicolas Patry
60db294f9a
Link cuda to nix ? 2024-09-16 17:01:28 +02:00
Nicolas Patry
8e7c7c61f1
Let's see what the issue is ? 2024-09-16 17:01:28 +02:00
Nicolas Patry
c227345878
Run on actual GPUs. 2024-09-16 17:01:28 +02:00
Nicolas Patry
f47cdc1fe1
Attempting rapidly the integration tests. 2024-09-16 17:01:26 +02:00
Nicolas Patry
d95c670ada
Add nix test. (#2513)
* Add nix test.

* Modifying yourself means you need to rerun.

* Fixing the test + adding click (needed for pre-commit hooks).

* Try this.

* Our runner + pure test (not written)

* Remove server.

* Root user.

* Different user ?

* Add the actual test target.

* Forgot this modification.

* Add a formatter.

* Add the secrets.

* Fixed the auth token ?

* Adding the other tests.

* Missing pre-commit.

* Test requires cargo for cargo fmt.

* Update it a bit.

* Up.

* Attempting to use a cache location for the models.

* Ignore the cache for now.
2024-09-12 14:54:56 +02:00
Nicolas Patry
dae3bf1d87
Fix tokenization yi (#2507)
* Fixing odd tokenization self modifications on the Rust side (load and
resave in Python).

* Fixing the builds ?

* Fix the gh action?

* Fixing the location ?

* Validation is odd.

* Try a faster runner

* Upgrade python version.

* Remove sccache

* No sccache.

* Getting libpython maybe ?

* List stuff.

* Monkey it up.

* Have no idea at this point.

* Tmp.

* Shot in the dark.

* Tmate the hell out of this.

* Desperation.

* WTF.

* -y.

* Apparently 3.10 is not available anymore.

* Updating the dockerfile to make libpython discoverable at runtime too.

* Put back rust tests.

* Why do we want mkl on AMD ?

* Forcing 3.11 ?
2024-09-11 22:41:56 +02:00
Nicolas Patry
e415b690a6
Lots of improvements (Still 2 allocators) (#2449)
* Making prefix/flashinfer the default and testing the full release tests.

* Include flashinfer in the docker.

* Using prebuilt.

* Allowing window_left_size (dummy version).

* Disabling flashinfer/prefix caching on odd head_dim

* Disable prefix caching for lora.

* More specific codes.

* Update lock

* Updating integration tests with new values with FI/FD.

Remove paged as a default too, and use FD everywhere.

* Update cargo lock ?

* Upgrade to 1.80 because of bitstream...

* Everywhere 1.80

* Forgot last default place.

* Apply suggestions from code review

Co-authored-by: drbh <david.richard.holtz@gmail.com>

* Updated flake lock

* Tmp

* Upgrade the resolution system for fewer resolution errors.

* Remove lambda for cleaner function.

* Handling debugger.

* Override the env in server tests.

* Is this enough to make it work ?

* This seems to be working.

* Downgrade some logs.

* Fixing the default for vlm.

* Don't enable prefix caching on VLM just yet.

* Change `add_special_tokens` in order to have the correct tokens for chat
input and not (since it's super important with the prefixing now)

* Fixing prefix caching for flashdecoding.

* Update all models.

* Fixed flashinfer version.

* add_special_tokens is internal only

* Fixing seqlen with the new vlms.

* Fixing the issue with `add_special_tokens` not being passed around.

* Fixing the test.

* Removing encoder_decoder (seq2seq).

* Update the chat test.

* Fixing the batching tokenization in flash causal lm.

* Truncating left for radix purposes.

* Oops this doesn't belong here.

* Put back default pure shell.

* Update server tests

- Default to throughput test in k6
- Use TGI_WIGGLE_ROOM to adjust wiggle room

* Only n_heads / process_group.size() are necessary.

* Revert the integration tests change (seems linked to head_size
modification).

* Adding error message when assert is violated.

* Fixing the free algorithm to handle times where the common prefix is
smaller.

* Apply suggestions from code review

Co-authored-by: OlivierDehaene <olivier@huggingface.co>

* Update server/text_generation_server/layers/attention/common.py

Co-authored-by: OlivierDehaene <olivier@huggingface.co>

* Fix disabling prefix caching - Fix windowing checks.

* Revert the Cohere tokenizer change (for now using a revision instead).

* Fmt.

---------

Co-authored-by: drbh <david.richard.holtz@gmail.com>
Co-authored-by: OlivierDehaene <olivier@huggingface.co>
2024-08-29 16:29:01 +02:00
Nicolas Patry
2788d41a76
Fixing CI. (#2462) 2024-08-27 15:33:02 +02:00
Nicolas Patry
e4201f44cf
All integration tests back everywhere (too many failed CI). (#2428)
* All integration tests back everywhere (too many failed CI).

* Upgrade integration tests after 12.4

* Attempt to remove the specified compute cap.

* Common arch list.

* Punica uses raw ASM which is not valid on 9.0 apparently.
2024-08-16 21:19:46 +02:00
Hugo Larcher
53729b74ac
doc: Add metrics documentation and add a 'Reference' section (#2230)
* doc: Add metrics documentation and add a 'Reference' section

* doc: Add API reference

* doc: Refactor API reference

* fix: Message API link

* Bad rebase

* Moving the docs.

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
2024-08-16 19:43:30 +02:00
Wang, Yi
b6bb1d5160
CPU docker image (#2367)
add intel-cpu docker image

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
2024-08-12 14:10:30 +02:00
Daniël de Kok
22fb1be588
Fix cache block size for flash decoding (#2351)
* Fix cache block size for flash decoding

This seems to have been accidentally dropped during the TRT-LLM
PR rebase.

* Also run CI on changes to `backends`
2024-08-01 15:38:57 +02:00