Mirror of https://github.com/huggingface/text-generation-inference.git
(synced 2025-04-22 15:32:08 +00:00)
* Prefix caching WIP
* Fixing prefix attention.
* Fixing flashinfer import.
* Fixing black.
* Fixing medusa (still wrong outputs, but functional).
* Just medusa values now.
* Fixing medusa without prefix caching.
* Fixing prefix caching.
* Medusa requires reshaping.
* Removing the logs.
* Remove router.nix
* Fixup:
  - Remove logs
  - Disable VLMs (they do not work)
  - Disable prefix caching when user wants prefill logprobs.
* Update flake.lock

Co-authored-by: Daniël de Kok <me@danieldk.eu>
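The last fixup item above ("Disable prefix caching when user wants prefill logprobs") reflects a real constraint: prompt tokens served out of a cached prefix never run through the model, so no logits (and hence no logprobs) are produced for them. Below is a minimal, hypothetical Python sketch of that gating; `PrefixCache`, `GenerateRequest`, and `plan_prefill` are illustrative names and data structures, not the actual text-generation-inference API.

```python
# Hypothetical sketch of prefix-cache gating; names and structures are
# illustrative, not the real text-generation-inference implementation.
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class GenerateRequest:
    input_ids: List[int]
    prefill_logprobs: bool = False  # client asked for logprobs over the prompt


class PrefixCache:
    """Maps a prompt-token prefix to previously computed KV-cache block ids."""

    def __init__(self) -> None:
        self._blocks: Dict[Tuple[int, ...], List[int]] = {}

    def longest_match(self, input_ids: List[int]) -> Tuple[int, List[int]]:
        """Return (number of cached prompt tokens, their KV block ids)."""
        for end in range(len(input_ids), 0, -1):
            key = tuple(input_ids[:end])
            if key in self._blocks:
                return end, self._blocks[key]
        return 0, []

    def insert(self, input_ids: List[int], block_ids: List[int]) -> None:
        self._blocks[tuple(input_ids)] = block_ids


def plan_prefill(
    request: GenerateRequest, cache: PrefixCache
) -> Tuple[List[int], List[int]]:
    """Decide which prompt tokens must actually be prefilled.

    When the client wants prefill logprobs, every prompt token has to pass
    through the model so its logits exist; the cache lookup is skipped and
    the full prompt is recomputed.
    """
    if request.prefill_logprobs:
        return request.input_ids, []  # disable prefix caching for this request
    cached_len, cached_blocks = cache.longest_match(request.input_ids)
    return request.input_ids[cached_len:], cached_blocks
```

In this sketch a request that asks for prefill logprobs simply bypasses the cache lookup, which is the simplest way to honor the flag without special-casing partially cached prompts.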
custom_modeling/
__init__.py
bloom.py
causal_lm.py
flash_causal_lm.py
galactica.py
globals.py
idefics_causal_lm.py
idefics.py
mamba.py
model.py
pali_gemma.py
seq2seq_lm.py
types.py
vlm_causal_lm.py