Mirror of https://github.com/huggingface/text-generation-inference.git, synced 2025-04-23 16:02:10 +00:00.
The compressed-tensors configuration can also specify the KV-cache configuration. Use an FP8 KV cache when the configuration says to do so (all other options and types are ignored for now).
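As a rough illustration of that selection logic, the sketch below reads a compressed-tensors quantization config and picks the KV-cache dtype. This is a minimal sketch, not the repository's actual implementation: the key names `kv_cache_scheme`, `type`, and `num_bits` are assumed from the compressed-tensors config format, and the fallback dtype is illustrative.

```python
import torch


def kv_cache_dtype_from_config(quantization_config: dict) -> torch.dtype:
    """Pick a KV-cache dtype from a compressed-tensors config (sketch).

    Assumes the config carries an optional "kv_cache_scheme" entry with
    "type" and "num_bits" fields, as in the compressed-tensors format.
    """
    scheme = quantization_config.get("kv_cache_scheme")
    # No KV-cache scheme specified: keep the default (unquantized) cache.
    if scheme is None:
        return torch.float16
    # Use an FP8 KV cache only for an 8-bit float scheme; all other
    # options and types are ignored for now.
    if scheme.get("type") == "float" and scheme.get("num_bits") == 8:
        return torch.float8_e4m3fn
    return torch.float16
```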
custom_modeling/
__init__.py
bloom.py
causal_lm.py
flash_causal_lm.py
galactica.py
globals.py
idefics_causal_lm.py
mamba.py
metadata_kernels.py
mllama_causal_lm.py
model.py
pali_gemma.py
seq2seq_lm.py
types.py
vlm_causal_lm.py