vllm.model_executor.layers.quantization.kernels.scaled_mm.pytorch ¶
ChannelWiseTorchFP8ScaledMMLinearKernel ¶
Bases: TorchFP8ScaledMMLinearKernel
Source code in vllm/model_executor/layers/quantization/kernels/scaled_mm/pytorch.py
apply_scaled_mm ¶
apply_scaled_mm(
    *,
    A: Tensor,
    B: Tensor,
    out_dtype: dtype,
    As: Tensor,
    Bs: Tensor,
    bias: Tensor | None,
    output_shape: list,
) -> Tensor
Source code in vllm/model_executor/layers/quantization/kernels/scaled_mm/pytorch.py
can_implement classmethod ¶
can_implement(
    c: FP8ScaledMMLinearLayerConfig,
) -> tuple[bool, str | None]
Source code in vllm/model_executor/layers/quantization/kernels/scaled_mm/pytorch.py
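The scaling semantics behind this variant can be sketched in plain Python. This is a minimal, dependency-free illustration, not the real kernel: the actual implementation operates on FP8 tensors through PyTorch's scaled matrix-multiply ops, and the function name and scale layout below are assumptions for illustration (here `As` is treated as a single activation scale and `Bs` as one weight scale per output channel).

```python
def scaled_mm_channelwise(A, B, As, Bs, bias=None):
    """Hypothetical sketch of channel-wise scaled-mm semantics.

    A: M x K activations, B: K x N weights, As: scalar activation scale,
    Bs: length-N per-output-channel weight scales (assumed layout).
    """
    M, K, N = len(A), len(B), len(B[0])
    out = [[0.0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            # accumulate the quantized product, then rescale back to
            # real values with the activation and per-channel weight scale
            acc = sum(A[m][k] * B[k][n] for k in range(K))
            out[m][n] = acc * As * Bs[n]
            if bias is not None:
                out[m][n] += bias[n]
    return out
```

With an identity weight matrix, each output column is simply the input scaled by `As * Bs[n]`.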
PerTensorTorchFP8ScaledMMLinearKernel ¶
Bases: TorchFP8ScaledMMLinearKernel
Source code in vllm/model_executor/layers/quantization/kernels/scaled_mm/pytorch.py
apply_scaled_mm ¶
apply_scaled_mm(
    *,
    A: Tensor,
    B: Tensor,
    out_dtype: dtype,
    As: Tensor,
    Bs: Tensor,
    bias: Tensor | None,
    output_shape: list,
) -> Tensor
Source code in vllm/model_executor/layers/quantization/kernels/scaled_mm/pytorch.py
can_implement classmethod ¶
can_implement(
    c: FP8ScaledMMLinearLayerConfig,
) -> tuple[bool, str | None]
Source code in vllm/model_executor/layers/quantization/kernels/scaled_mm/pytorch.py
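Per-tensor scaling is the simplest variant: one scalar scale for the activations and one for the weights. A minimal pure-Python sketch of the semantics (names assumed; the real kernel works on FP8 tensors via PyTorch):

```python
def scaled_mm_per_tensor(A, B, As, Bs):
    """Hypothetical sketch of per-tensor scaled-mm semantics.

    A: M x K activations, B: K x N weights; As and Bs are single
    scalar scales applied uniformly to the whole product.
    """
    M, K, N = len(A), len(B), len(B[0])
    # every output element is rescaled by the same factor As * Bs
    return [
        [sum(A[m][k] * B[k][n] for k in range(K)) * As * Bs for n in range(N)]
        for m in range(M)
    ]
```

Because the scales are scalars, the rescale factors out of the matmul entirely, which is why this layout maps most directly onto a plain GEMM.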
RowWiseTorchFP8ScaledMMLinearKernel ¶
Bases: TorchFP8ScaledMMLinearKernel
Source code in vllm/model_executor/layers/quantization/kernels/scaled_mm/pytorch.py
apply_scaled_mm ¶
apply_scaled_mm(
    *,
    A: Tensor,
    B: Tensor,
    out_dtype: dtype,
    As: Tensor,
    Bs: Tensor,
    bias: Tensor | None,
    output_shape: list,
) -> Tensor
Source code in vllm/model_executor/layers/quantization/kernels/scaled_mm/pytorch.py
can_implement classmethod ¶
can_implement(
    c: FP8ScaledMMLinearLayerConfig,
) -> tuple[bool, str | None]
Source code in vllm/model_executor/layers/quantization/kernels/scaled_mm/pytorch.py
is_supported classmethod ¶
Source code in vllm/model_executor/layers/quantization/kernels/scaled_mm/pytorch.py
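Row-wise scaling generalizes the above: one scale per row of `A` (per token) and one per column of `B`. The sketch below is an assumed, dependency-free illustration of that layout, not the kernel itself:

```python
def scaled_mm_rowwise(A, B, As, Bs):
    """Hypothetical sketch of row-wise scaled-mm semantics.

    A: M x K activations with As giving one scale per row (per token);
    B: K x N weights with Bs giving one scale per output column.
    """
    M, K, N = len(A), len(B), len(B[0])
    out = [[0.0] * N for _ in range(M)]
    for m in range(M):
        for n in range(N):
            acc = sum(A[m][k] * B[k][n] for k in range(K))
            # each output element gets its row's and column's scale
            out[m][n] = acc * As[m] * Bs[n]
    return out
```

Setting every entry of `As` and `Bs` to the same scalar recovers the per-tensor case, which is why these variants share one base-class interface.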
TorchFP8ScaledMMLinearKernel ¶
Bases: FP8ScaledMMLinearKernel
Base class for FP8 linear kernels implemented with PyTorch. Each subclass represents a kernel variant targeting specific device capabilities and PyTorch versions.
Source code in vllm/model_executor/layers/quantization/kernels/scaled_mm/pytorch.py
get_output_padding ¶
get_output_padding() -> int | None
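The `can_implement` classmethods all share the contract `(c) -> tuple[bool, str | None]`, where the second element carries a failure reason when the first is `False`. A plausible sketch of how a caller could select among the kernel variants using that contract (the dispatcher function and dummy classes here are illustrative, not vLLM's actual selection logic):

```python
def choose_kernel(config, kernels):
    """Pick the first kernel class whose can_implement(config) succeeds.

    Relies only on the documented contract:
    can_implement(c) -> tuple[bool, str | None].
    """
    reasons = []
    for k in kernels:
        ok, reason = k.can_implement(config)
        if ok:
            return k
        # collect each variant's stated reason for a useful error message
        reasons.append(f"{k.__name__}: {reason}")
    raise ValueError("no kernel supports config: " + "; ".join(reasons))
```

A caller would pass the candidate classes in preference order and fall back through them until one reports support.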