text-generation-inference/server/text_generation_server/models/custom_modeling
Latest commit 47447ef017 by Daniël de Kok: Unify attention output handling (#2343)
- Always return the hidden states.
- Create the output tensor inside the `attention` and `paged_attention`
  functions.

This removes the difference in how the output is handled between
attention (output parameter) and paged attention (return value). It
also removes the assumption that the attention implementation can
write to an output tensor (in preparation for FlashInfer).
2024-08-01 17:03:28 +02:00
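The calling-convention change described above can be sketched in miniature. This is an illustration only, not the actual TGI code: the real implementations operate on `torch` tensors and real attention math, and the names `attention_old` and `attention_new` are hypothetical stand-ins for the two interface styles.

```python
# Before (hypothetical sketch): the caller pre-allocates the output
# buffer and the attention implementation writes into it.
def attention_old(query, key, value, out):
    for i, (q, k, v) in enumerate(zip(query, key, value)):
        out[i] = q * k * v  # stand-in for the real attention computation
    # nothing is returned; the result lives in the caller's buffer


# After (hypothetical sketch): the implementation allocates the output
# itself and returns it. Backends that cannot write into a
# caller-provided buffer (e.g. FlashInfer) then fit the same interface.
def attention_new(query, key, value):
    return [q * k * v for q, k, v in zip(query, key, value)]


q, k, v = [1.0, 2.0], [3.0, 4.0], [0.5, 0.5]
buf = [0.0, 0.0]
attention_old(q, k, v, buf)
# Both conventions produce the same values; only ownership of the
# output allocation moves from the caller into the attention function.
assert buf == attention_new(q, k, v)
```

With the return-value convention, every model's attention layer can call the function the same way regardless of whether the backend writes in place internally or produces a fresh tensor.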
| File | Last commit | Date |
| --- | --- | --- |
| `__init__.py` | feat(server): flash santacoder (#153) | 2023-04-03 19:06:42 +02:00 |
| `bloom_modeling.py` | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00 |
| `clip.py` | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00 |
| `flash_cohere_modeling.py` | Unify attention output handling (#2343) | 2024-08-01 17:03:28 +02:00 |
| `flash_dbrx_modeling.py` | Unify attention output handling (#2343) | 2024-08-01 17:03:28 +02:00 |
| `flash_deepseek_v2_modeling.py` | Unify attention output handling (#2343) | 2024-08-01 17:03:28 +02:00 |
| `flash_gemma2_modeling.py` | Unify attention output handling (#2343) | 2024-08-01 17:03:28 +02:00 |
| `flash_gemma_modeling.py` | Unify attention output handling (#2343) | 2024-08-01 17:03:28 +02:00 |
| `flash_gpt2_modeling.py` | Unify attention output handling (#2343) | 2024-08-01 17:03:28 +02:00 |
| `flash_llama_modeling.py` | Unify attention output handling (#2343) | 2024-08-01 17:03:28 +02:00 |
| `flash_mistral_modeling.py` | Unify attention output handling (#2343) | 2024-08-01 17:03:28 +02:00 |
| `flash_mixtral_modeling.py` | Unify attention output handling (#2343) | 2024-08-01 17:03:28 +02:00 |
| `flash_neox_modeling.py` | Unify attention output handling (#2343) | 2024-08-01 17:03:28 +02:00 |
| `flash_pali_gemma_modeling.py` | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00 |
| `flash_phi_modeling.py` | Unify attention output handling (#2343) | 2024-08-01 17:03:28 +02:00 |
| `flash_qwen2_modeling.py` | Unify attention output handling (#2343) | 2024-08-01 17:03:28 +02:00 |
| `flash_rw_modeling.py` | Unify attention output handling (#2343) | 2024-08-01 17:03:28 +02:00 |
| `flash_santacoder_modeling.py` | Unify attention output handling (#2343) | 2024-08-01 17:03:28 +02:00 |
| `flash_starcoder2_modeling.py` | Unify attention output handling (#2343) | 2024-08-01 17:03:28 +02:00 |
| `idefics2.py` | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00 |
| `idefics_config.py` | chore: add pre-commit (#1569) | 2024-02-16 11:58:58 +01:00 |
| `idefics_image_processing.py` | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00 |
| `idefics_modeling.py` | enable HuggingFaceM4/idefics-9b in intel gpu (#2338) | 2024-08-01 11:08:36 +02:00 |
| `idefics_perceiver.py` | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00 |
| `idefics_processing.py` | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00 |
| `idefics_vision.py` | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00 |
| `llava_next.py` | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00 |
| `mamba_modeling.py` | Refactor layers. (#1866) | 2024-05-13 12:44:30 +02:00 |
| `mpt_modeling.py` | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00 |
| `neox_modeling.py` | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00 |
| `opt_modeling.py` | fix dbrx & opt model prefix bug (#2201) | 2024-07-08 09:01:14 +02:00 |
| `phi_modeling.py` | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00 |
| `siglip.py` | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00 |
| `t5_modeling.py` | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00 |
| `vlm.py` | feat: add ruff and resolve issue (#2262) | 2024-07-26 10:29:09 -04:00 |