Mirror of https://github.com/huggingface/text-generation-inference.git, synced 2025-04-23 07:52:06 +00:00
* Simplify the `attention` function
  - Use one definition rather than multiple.
  - Add `key`/`value` arguments, so that we don't need the `PREFILL_IN_KVCACHE` constant.
  - Make it kwargs-only (to avoid mixing up the various `Tensor` args).
* Fixup flashinfer support

| Name |
|---|
| custom_modeling |
| __init__.py |
| bloom.py |
| causal_lm.py |
| flash_causal_lm.py |
| galactica.py |
| globals.py |
| idefics_causal_lm.py |
| mamba.py |
| mllama_causal_lm.py |
| model.py |
| pali_gemma.py |
| seq2seq_lm.py |
| types.py |
| vlm_causal_lm.py |
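
The commit message mentions making `attention` kwargs-only so that the various `Tensor` arguments cannot be mixed up. The sketch below only illustrates that calling convention in plain Python. It is not the repository's implementation: the signature, the `softmax_scale` parameter, and the use of nested lists in place of tensors are all assumptions made for this example.

```python
import math


def attention(*, query, key, value, softmax_scale=None):
    """Hypothetical single-query scaled dot-product attention.

    The bare `*` makes every parameter keyword-only, so a caller cannot
    accidentally swap `key` and `value` by position.
    Plain Python lists stand in for tensors (an assumption for this sketch).
    """
    if softmax_scale is None:
        softmax_scale = 1.0 / math.sqrt(len(query))

    # One score per key row: scaled dot product with the query.
    scores = [softmax_scale * sum(q * k for q, k in zip(query, row)) for row in key]

    # Numerically stable softmax over the scores.
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)

    # Weighted average of the value rows.
    dim = len(value[0])
    return [
        sum(w * row[i] for w, row in zip(weights, value)) / total
        for i in range(dim)
    ]


# Every argument must be named at the call site.
out = attention(
    query=[1.0, 0.0],
    key=[[1.0, 0.0], [1.0, 0.0]],
    value=[[2.0, 0.0], [4.0, 0.0]],
)
```

Calling `attention(q, k, v)` positionally raises `TypeError`, which is the point of the design: an argument-order mistake fails loudly at the call site instead of silently producing wrong outputs.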