Mirror of https://github.com/huggingface/text-generation-inference.git (synced 2025-04-22 15:32:08 +00:00)
* feat: support multidimensional position ids on batch to enable cuda graphs on qwen2-vl
* fix: only check model type if config exists
* fix: adjust sharding and lm head logic
* fix: qwen2 failure in intel cpu
* fix: return correct shape logits and add streaming test
* fix: remove unused import and refactor test

Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
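The first bullet refers to giving position ids an extra leading dimension so that every sequence in a batch carries several position rows per token (Qwen2-VL's multimodal rotary embedding uses separate temporal/height/width positions), which keeps tensor shapes static enough for CUDA graph capture. As a rough, hypothetical illustration only (the helper name and shapes are assumptions, not this repository's API), a flattened ragged batch with an extra position dimension might be built like this:

```python
from typing import List


def build_position_ids(seq_lens: List[int], dims: int = 3) -> List[List[int]]:
    """Sketch: build position ids of shape (dims, total_tokens) for a
    ragged batch, instead of the usual flat (total_tokens,) vector.

    Each of the `dims` rows here just repeats the per-sequence
    0..len-1 positions; a real multimodal rotary scheme would fill the
    rows differently (e.g. temporal/height/width) for image tokens.
    """
    # Flatten per-sequence positions: [0..n0-1, 0..n1-1, ...]
    flat = [pos for n in seq_lens for pos in range(n)]
    # Replicate across the extra leading dimension.
    return [list(flat) for _ in range(dims)]
```

For a batch of two sequences of lengths 2 and 3, each of the three rows is `[0, 1, 0, 1, 2]`, so downstream kernels can index one fixed-shape 2-D tensor regardless of how the rows are later specialized.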
custom_modeling/
__init__.py
bloom.py
causal_lm.py
flash_causal_lm.py
galactica.py
globals.py
idefics_causal_lm.py
mamba.py
metadata_kernels.py
mllama_causal_lm.py
model.py
pali_gemma.py
seq2seq_lm.py
types.py
vlm_causal_lm.py