text-generation-inference/integration-tests/models/__snapshots__/test_flash_llama_fp8
Daniël de Kok 0f346a3296
Switch from fbgemm-gpu w8a8 scaled matmul to vLLM/marlin-kernels (#2688)
* Switch from fbgemm-gpu w8a8 scaled matmul to vLLM/marlin-kernels

Performance and accuracy of these kernels are on par with the fbgemm-gpu
kernels (tested with Llama 70B and 405B). This removes a dependency and
resolves some stability issues we have been seeing.

* Update test snapshots
2024-10-25 16:40:47 +02:00
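The commit above concerns a "w8a8 scaled matmul": both weights (w8) and activations (a8) are stored in a low-precision format together with a scale, the matmul runs on the quantized values, and the result is rescaled back to floating point. The sketch below is purely illustrative and is not TGI, fbgemm-gpu, or marlin-kernels code; it uses int8-style symmetric per-tensor quantization as a stand-in for FP8, and all function names are hypothetical.

```python
# Illustrative sketch of a scaled low-precision matmul (not real TGI/marlin
# code). int8 stands in for FP8 here; names are hypothetical.

def quantize(matrix, qmax=127):
    """Symmetric per-tensor quantization: returns integer values and a scale."""
    absmax = max(abs(v) for row in matrix for v in row) or 1.0
    scale = absmax / qmax
    q = [[round(v / scale) for v in row] for row in matrix]
    return q, scale

def scaled_matmul(a_q, a_scale, w_q, w_scale):
    """Integer accumulation followed by one rescale: (a_q @ w_q) * a_s * w_s."""
    rows, inner, cols = len(a_q), len(w_q), len(w_q[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = sum(a_q[i][k] * w_q[k][j] for k in range(inner))
            out[i][j] = acc * a_scale * w_scale
    return out

a = [[0.5, -1.0], [2.0, 0.25]]   # activations
w = [[1.0, 0.0], [0.0, 1.0]]     # weights (identity, for easy checking)
a_q, a_s = quantize(a)
w_q, w_s = quantize(w)
y = scaled_matmul(a_q, a_s, w_q, w_s)
# y approximates a @ w, up to quantization error
```

With an identity weight matrix the output should reproduce the activations to within quantization error, which is the property the snapshot tests in this directory check (token-level outputs staying stable across kernel backends).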
test_flash_llama_fp8_all_params.json Switch from fbgemm-gpu w8a8 scaled matmul to vLLM/marlin-kernels (#2688) 2024-10-25 16:40:47 +02:00
test_flash_llama_fp8_load.json Further fixes. (#2426) 2024-08-16 13:21:44 +02:00
test_flash_llama_fp8.json Add FP8 release test (#2261) 2024-07-20 10:26:06 +00:00