text-generation-inference/server/text_generation_server/models
Latest commit 5b6b74e21d by Daniël de Kok:
Improve support for GPUs with capability < 8 (#2575)
* Improve support for GPUs with capability < 8

- For models that cannot use flashinfer, fall back to flash-attn v1
  plus paged attention on GPUs with a compute capability older
  than 8 (see the sketch below).
- Disable prefix caching when using paged attention.
- When using flash-attn v1, pass the key/value tensors rather than
  the KV cache, since v1 cannot use block tables.
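
A minimal sketch of this fallback, assuming PyTorch for the capability query; the `ATTENTION` and `PREFIX_CACHING` names are illustrative placeholders, not the actual identifiers in text_generation_server:

```python
import torch

# Compute capability as (major, minor): e.g. (7, 5) on a T4, (8, 0) on an A100.
major, minor = torch.cuda.get_device_capability()

if major >= 8:
    # Ampere and newer: flashinfer is usable and prefix caching stays on.
    ATTENTION = "flashinfer"
    PREFIX_CACHING = True
else:
    # Pre-Ampere: fall back to flash-attn v1 plus paged attention. This
    # paged-attention path cannot serve cached prefixes, so prefix caching
    # is disabled along with it.
    ATTENTION = "paged"
    PREFIX_CACHING = False
```

The last description bullet shows up at the attention call site: flash-attn v1 has no block-table support, so prefill hands it the freshly computed key/value tensors directly rather than pointers into the paged KV cache.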

* nix: add flash-attn-v1 to the server environment

* Move disabling prefix caching into the block of exceptions

* Represent the compute capability as `usize`s (a Python analogue is sketched below)
2024-09-27 16:19:42 +02:00
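
Carrying the capability around as plain integers (`usize`, presumably on the Rust launcher side) makes the version check an ordinary tuple comparison. A hedged Python analogue, using only the real torch.cuda.get_device_capability() API:

```python
import torch

# Already a pair of ints, e.g. (7, 5) for Turing or (8, 0) for Ampere.
capability: tuple[int, int] = torch.cuda.get_device_capability()

# Tuple comparison is lexicographic, so "capability older than 8" is simply:
if capability < (8, 0):
    print("pre-Ampere GPU: using the flash-attn v1 + paged attention fallback")
```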
custom_modeling Improve support for GPUs with capability < 8 (#2575) 2024-09-27 16:19:42 +02:00
__init__.py Add support for scalar FP8 weight scales (#2550) 2024-09-24 13:57:40 +02:00
bloom.py Refactor dead code - Removing all flash_xxx.py files. (#2166) 2024-07-05 10:29:56 +02:00
causal_lm.py Fixing exl2 and other quantize tests again. (#2419) 2024-08-15 11:12:51 +02:00
flash_causal_lm.py hotfix: enable Intel IPEX CPU and XPU in Python 3.11 (#2517) 2024-09-12 17:23:49 +02:00
galactica.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
globals.py Lots of improvements (Still 2 allocators) (#2449) 2024-08-29 16:29:01 +02:00
idefics_causal_lm.py Upgrading exl2. (#2415) 2024-08-14 11:58:08 +02:00
idefics.py Upgrading exl2. (#2415) 2024-08-14 11:58:08 +02:00
mamba.py Fixing exl2 and other quantize tests again. (#2419) 2024-08-15 11:12:51 +02:00
model.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
pali_gemma.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
seq2seq_lm.py Fixing exl2 and other quantize tests again. (#2419) 2024-08-15 11:12:51 +02:00
types.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
vlm_causal_lm.py Prefix test - Different kind of load test to trigger prefix test bugs. (#2490) 2024-09-11 18:10:40 +02:00