text-generation-inference/server/text_generation_server/layers/marlin

Latest commit: 1dd346666a by Daniël de Kok, "Clarify FP8-Marlin use on capability 8.9" (#2940), 2025-01-22 18:18:11 +01:00. The log message stated that the GPU does not support FP8 on capability 8.9. However, we use FP8-Marlin on that capability because it is faster.
File         Last commit message                                                    Date
__init__.py  Handle GPTQ-Marlin loading in GPTQMarlinWeightLoader (#2300)           2024-07-31 13:08:41 +02:00
fp8.py       Clarify FP8-Marlin use on capability 8.9 (#2940)                       2025-01-22 18:18:11 +01:00
gptq.py      Add initial support for compressed-tensors checkpoints (#2732)         2024-11-10 13:54:07 +01:00
marlin.py    Add support for wNa16 int 2:4 compressed-tensors checkpoints (#2758)   2024-11-20 18:25:23 +01:00
util.py      Split up layers.marlin into several files (#2292)                      2024-07-24 16:33:26 +02:00
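The FP8-Marlin note above can be sketched as a capability-based kernel dispatch. This is a hypothetical helper, not the actual code in fp8.py: the function name `select_fp8_backend`, the backend labels, and the assumption that native FP8 GEMMs are preferred only from capability 9.0 (Hopper) onward are illustrative, not taken from the repository.

```python
# Hypothetical sketch of capability-based FP8 kernel selection.
# Capability (8, 9) (Ada Lovelace) does support FP8, but per the commit
# message above, FP8-Marlin is used there because it is faster.

def select_fp8_backend(capability: tuple) -> str:
    """Return an FP8 backend name for a (major, minor) compute capability.

    The (9, 0) cutoff for native FP8 GEMMs is an assumption for
    illustration, not confirmed by the source listing.
    """
    if capability >= (9, 0):
        return "fp8-native"   # Hopper and newer: assume native FP8 GEMM
    if capability >= (8, 9):
        return "fp8-marlin"   # Ada (8.9): FP8-Marlin is the faster path
    raise ValueError(f"FP8 is not supported on capability {capability}")
```

In real code the capability tuple would come from `torch.cuda.get_device_capability()`; keeping the decision in a pure function makes the dispatch logic testable without a GPU.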