Mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-04-26 04:22:08 +00:00)
* Simplify the `attention` function
  - Use one definition rather than multiple.
  - Add `key`/`value` arguments, so that we don't need the `PREFILL_IN_KVCACHE` constant.
  - Make it kwargs-only (to avoid mixing up the various `Tensor` args).
* Fixup flashinfer support
Files in this directory:

- __init__.py
- common.py
- cuda.py
- flash_attn_triton.py
- flashinfer.py
- ipex.py
- kv_cache.py
- rocm.py
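The kwargs-only change in the commit above can be illustrated with a small sketch. This is a hypothetical pure-Python stand-in, not the library's actual CUDA/flashinfer kernels: the leading `*` in the signature forces callers to pass `query`, `key`, and `value` by name, so the tensors cannot be mixed up positionally.

```python
import math

def attention(*, query, key, value, softmax_scale=None):
    """Naive scaled dot-product attention over lists of row vectors.

    Keyword-only arguments (note the bare `*`) mirror the refactor in the
    commit above; the math here is an illustrative sketch only.
    """
    d = len(query[0])
    scale = softmax_scale if softmax_scale is not None else 1.0 / math.sqrt(d)
    out = []
    for q in query:
        # Scaled dot products between this query row and every key row.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in key]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted combination of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, value))
                    for j in range(len(value[0]))])
    return out
```

Calling it positionally, e.g. `attention(q, k, v)`, raises a `TypeError`, which is exactly the mix-up the kwargs-only design guards against.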