text-generation-inference/integration-tests
Daniël de Kok 84ab88d843
Support flashinfer for Gemma3 prefill (#3167)
* launcher: ensure correct detection of Gemma 3 head size

* Support flashinfer for Gemma3 prefill

Gemma3 uses bidirectional attention for images. Flashinfer
supports custom masks. Hook up the mask with flashinfer, so that we do
not have to use the slower SDPA implementation for prefills with images.

* Update Gemma3 test outputs

* Fixed unused import
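
To illustrate the custom-mask idea in the commit message above, here is a minimal sketch in plain Python. It is not the code from this commit: `build_prefill_mask` and its `is_image` argument are hypothetical names, and the sketch assumes that each contiguous run of image tokens forms one image (so two images with no text token between them would be treated as one span). Text tokens attend causally, while tokens of the same image attend to each other in both directions. Handing such a mask to flashinfer is not shown.

```python
from typing import List


def build_prefill_mask(is_image: List[bool]) -> List[List[bool]]:
    """Return mask[q][k], True where query position q may attend to key position k.

    Hypothetical helper: causal attention everywhere, plus bidirectional
    attention inside each contiguous run of image tokens.
    """
    n = len(is_image)

    # Give every contiguous run of image tokens its own span id; text tokens get -1.
    span_ids: List[int] = []
    current = -1
    prev_image = False
    for flag in is_image:
        if flag and not prev_image:
            current += 1
        span_ids.append(current if flag else -1)
        prev_image = flag

    # Start from a causal (lower-triangular) mask.
    mask = [[k <= q for k in range(n)] for q in range(n)]

    # Open up attention in both directions within the same image span.
    for q in range(n):
        for k in range(n):
            if span_ids[q] != -1 and span_ids[q] == span_ids[k]:
                mask[q][k] = True
    return mask


# Example layout: text, image, image, text
example_mask = build_prefill_mask([False, True, True, False])
```

In `example_mask`, position 1 may attend to position 2 because both belong to the same image span, while position 0 still cannot see any later position.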
2025-04-17 18:07:41 +02:00
fixtures/neuron Avoid running neuron integration tests twice (#3054) 2025-02-26 12:15:01 +01:00
images Pali gemma modeling (#1895) 2024-05-16 06:58:47 +02:00
models Support flashinfer for Gemma3 prefill (#3167) 2025-04-17 18:07:41 +02:00
neuron Update neuron backend (#3098) 2025-03-12 09:53:15 +01:00
conftest.py Pr 3003 ci branch (#3007) 2025-03-10 17:56:19 +01:00
pyproject.toml Bug Fix: Sliding Window Attention (#3112) 2025-03-18 10:37:33 +01:00
pytest.ini chore: add pre-commit (#1569) 2024-02-16 11:58:58 +01:00
requirements.txt Bug Fix: Sliding Window Attention (#3112) 2025-03-18 10:37:33 +01:00
uv.lock Bug Fix: Sliding Window Attention (#3112) 2025-03-18 10:37:33 +01:00