text-generation-inference/server/text_generation_server/models
Daniël de Kok 47447ef017
Unify attention output handling (#2343)
- Always return the hidden states.
- Create the output tensor inside the `attention` and `paged_attention`
  functions.

This removes the difference in how the output is handled between
attention (output parameter) and paged attention (return value). It
also removes the assumption that the attention implementation can
write to an output tensor (in preparation for FlashInfer). A
simplified sketch of the unified pattern follows below.
2024-08-01 17:03:28 +02:00
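The following is a minimal sketch, not the actual TGI code, of the pattern the commit describes: both `attention` and `paged_attention` allocate their own output tensor and return it, so the caller no longer passes an output buffer to one path and reads a return value from the other. Function signatures, the extra arguments, and the `torch.empty_like` allocation are illustrative assumptions.

```python
# Minimal sketch (assumed names and shapes, not the actual TGI implementation)
# of the unified pattern: both attention paths create and return their own
# output tensor instead of one path writing into a caller-provided buffer.
import torch


def attention(query: torch.Tensor, key: torch.Tensor, value: torch.Tensor) -> torch.Tensor:
    # Output is allocated inside the function; the kernel is no longer
    # assumed to be able to write into an external output tensor.
    out = torch.empty_like(query)
    # ... run the prefill attention kernel, filling `out` ...
    return out


def paged_attention(query: torch.Tensor, kv_cache, block_tables) -> torch.Tensor:
    # Same contract as `attention`: allocate internally, return the result.
    out = torch.empty_like(query)
    # ... run the paged (decode) attention kernel, filling `out` ...
    return out
```

With both functions returning their result, the calling model code can treat prefill and decode identically, e.g. `attn_output = attention(q, k, v)` during prefill and `attn_output = paged_attention(q, kv_cache, block_tables)` during decode.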
custom_modeling Unify attention output handling (#2343) 2024-08-01 17:03:28 +02:00
__init__.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
bloom.py Refactor dead code - Removing all flash_xxx.py files. (#2166) 2024-07-05 10:29:56 +02:00
causal_lm.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
flash_causal_lm.py Pr 2290 ci run (#2329) 2024-07-31 10:27:15 -04:00
galactica.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
globals.py Pr 2290 ci run (#2329) 2024-07-31 10:27:15 -04:00
idefics_causal_lm.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
idefics.py enable HuggingFaceM4/idefics-9b in intel gpu (#2338) 2024-08-01 11:08:36 +02:00
mamba.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
model.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
pali_gemma.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
seq2seq_lm.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
types.py feat: add ruff and resolve issue (#2262) 2024-07-26 10:29:09 -04:00
vlm_causal_lm.py fix crash in multi-modal (#2245) 2024-07-24 10:39:08 +02:00