text-generation-inference/server/text_generation_server/layers

Latest commit: Daniël de Kok, "Clarify FP8-Marlin use on capability 8.9 ()", 2025-01-22 18:18:11 +01:00

The log message stated that the GPU does not support FP8 on capability 8.9. However, we use FP8-Marlin on that capability because it is faster.
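The commit description above can be illustrated with a small sketch of the kernel-selection logic: on compute capability 8.9 (Ada), FP8-Marlin is preferred because it is faster, while native FP8 begins at capability 9.0. The function name and return values here are hypothetical, not TGI's actual API.

```python
def select_fp8_kernel(capability: tuple[int, int]) -> str:
    """Pick an FP8 kernel for a given CUDA compute capability.

    Illustrative only; mirrors the behaviour described in the commit
    message, not the real text-generation-inference code.
    """
    major, minor = capability
    if (major, minor) >= (9, 0):
        # Hopper and newer: native FP8 tensor-core support.
        return "fp8-native"
    if (major, minor) == (8, 9):
        # Ada (8.9): FP8-Marlin is used because it is faster here.
        return "fp8-marlin"
    raise ValueError(f"FP8 is not supported on capability {major}.{minor}")
```

In practice the capability tuple would come from something like `torch.cuda.get_device_capability()`.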
| Name | Last commit | Date |
| --- | --- | --- |
| attention/ | flashinfer: switch to plan API () | 2025-01-17 18:18:02 +01:00 |
| awq/ | fix incorrect output of Qwen2-7B-Instruct-GPTQ-Int4 and Qwen2-7B-Inst… () | 2024-11-04 16:07:51 +01:00 |
| compressed_tensors/ | Do not convert weight scale to e4m3fnuz on CUDA () | 2025-01-16 13:44:32 +01:00 |
| gptq/ | Flash Transformers modeling backend support () | 2025-01-21 10:01:51 +01:00 |
| marlin/ | Clarify FP8-Marlin use on capability 8.9 () | 2025-01-22 18:18:11 +01:00 |
| moe/ | fix moe in quantization path () | 2025-01-22 14:36:15 +01:00 |
| __init__.py | feat: add ruff and resolve issue () | 2024-07-26 10:29:09 -04:00 |
| bnb.py | feat: add ruff and resolve issue () | 2024-07-26 10:29:09 -04:00 |
| conv.py | Refactor layers. () | 2024-05-13 12:44:30 +02:00 |
| eetq.py | feat(fp8): use fbgemm kernels and load fp8 weights directly () | 2024-07-20 19:02:04 +02:00 |
| exl2.py | Add support for Deepseek V2 () | 2024-07-19 17:23:20 +02:00 |
| fp8.py | Clarify FP8-Marlin use on capability 8.9 () | 2025-01-22 18:18:11 +01:00 |
| layernorm.py | Update vllm kernels for ROCM () | 2024-12-18 12:44:42 +01:00 |
| linear.py | Update vllm kernels for ROCM () | 2024-12-18 12:44:42 +01:00 |
| lora.py | feat: add ruff and resolve issue () | 2024-07-26 10:29:09 -04:00 |
| medusa.py | Prefix caching () | 2024-08-20 11:15:30 +02:00 |
| mlp.py | Tied embeddings in MLP speculator. () | 2024-08-29 17:44:54 +02:00 |
| rotary.py | Update vllm kernels for ROCM () | 2024-12-18 12:44:42 +01:00 |
| speculative.py | feat: add ruff and resolve issue () | 2024-07-26 10:29:09 -04:00 |
| tensor_parallel.py | feat: add ruff and resolve issue () | 2024-07-26 10:29:09 -04:00 |