Nicolas Patry
|
635dde8af9
|
Prefix caching (#2402)
* Prefix caching WIP
* Fixing prefix attention.
* Fixing flashinfer import.
* Fixing black.
* Fixing medusa (still wrong outputs, but functional).
* Just medusa values now.
* Fixing medusa without prefix caching.
* Fixing prefix caching.
* Medusa requires reshaping.
* Removing the logs.
* Remove router.nix
* Fixup:
  - Remove logs
  - Disable VLMs (they do not work)
  - Disable prefix caching when user wants prefill logprobs.
* Update flake.lock
---------
Co-authored-by: Daniël de Kok <me@danieldk.eu>
|
2024-09-25 06:10:59 +00:00 |
|