vllm.model_executor.layers.fused_moe.router.fused_topk_router ¶
FusedTopKRouter ¶
Bases: BaseRouter
Default router using standard fused top-k routing.
Source code in vllm/model_executor/layers/fused_moe/router/fused_topk_router.py
__init__ ¶
__init__(
top_k: int,
global_num_experts: int,
eplb_state: EplbLayerState,
scoring_func: str = "softmax",
renormalize: bool = True,
enable_eplb: bool = False,
indices_type_getter: Callable[[], dtype | None] | None = None,
)
_compute_routing ¶
_compute_routing(
hidden_states: Tensor,
router_logits: Tensor,
indices_type: dtype | None,
) -> tuple[Tensor, Tensor]
Compute routing using standard fused top-k.
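As a rough illustration of what "standard top-k routing" means, the following pure-Python sketch scores each token's router logits with a softmax, keeps the `top_k` highest-scoring experts, and optionally renormalizes the kept weights. The names `softmax` and `route_topk` are hypothetical helpers for this sketch, not part of vLLM, and the real implementation operates on tensors with a fused kernel rather than Python lists.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_topk(router_logits, top_k, renormalize=True):
    """Hypothetical reference: return (topk_weights, topk_ids), one row per token."""
    topk_weights, topk_ids = [], []
    for logits in router_logits:
        probs = softmax(logits)
        # Expert indices ordered by probability, highest first, cut to top_k.
        ids = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
        weights = [probs[i] for i in ids]
        if renormalize:
            # Rescale so the selected weights sum to 1 for this token.
            s = sum(weights)
            weights = [w / s for w in weights]
        topk_weights.append(weights)
        topk_ids.append(ids)
    return topk_weights, topk_ids


# One token, four experts, two experts selected.
weights, ids = route_topk([[2.0, 0.5, 1.0, -1.0]], top_k=2)
```

With `renormalize=True`, the two selected weights sum to 1; with `renormalize=False`, they remain the raw softmax probabilities and sum to less than 1 whenever some experts are left out.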
dispatch_topk_sigmoid_func ¶
dispatch_topk_softmax_func ¶
fused_topk ¶
fused_topk(
hidden_states: Tensor,
gating_output: Tensor,
topk: int,
renormalize: bool,
indices_type: dtype | None = None,
scoring_func: str = "softmax",
) -> tuple[Tensor, Tensor, Tensor]
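The `scoring_func` parameter selects how gating outputs are turned into per-expert scores. The sketch below shows the difference between the two options named on this page, `"softmax"` and `"sigmoid"`, for a single token. The helper `score_experts` is hypothetical, and the assumption that unsupported values raise an error is an illustration, not confirmed behavior of `fused_topk`.

```python
import math


def score_experts(gating_output, scoring_func="softmax"):
    """Hypothetical helper: map one token's gating outputs to expert scores."""
    if scoring_func == "softmax":
        # Scores compete across experts and sum to 1.
        m = max(gating_output)
        exps = [math.exp(x - m) for x in gating_output]
        total = sum(exps)
        return [e / total for e in exps]
    if scoring_func == "sigmoid":
        # Each expert is scored independently in (0, 1); no sum constraint.
        return [1.0 / (1.0 + math.exp(-x)) for x in gating_output]
    # Assumption for this sketch: unknown scoring functions are rejected.
    raise ValueError(f"Unsupported scoring function: {scoring_func}")


softmax_scores = score_experts([1.0, 0.0, -1.0], "softmax")
sigmoid_scores = score_experts([1.0, 0.0, -1.0], "sigmoid")
```

Because sigmoid scores are not normalized across experts, the `renormalize` flag matters more in that mode if the downstream combination expects weights that sum to 1.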
vllm_topk_sigmoid ¶
vllm_topk_sigmoid(
topk_weights: Tensor,
topk_indices: Tensor,
token_expert_indices: Tensor,
gating_output: Tensor,
renormalize: bool = False,
) -> tuple[Tensor, ...]
vllm_topk_softmax ¶
vllm_topk_softmax(
topk_weights: Tensor,
topk_indices: Tensor,
token_expert_indices: Tensor,
gating_output: Tensor,
renormalize: bool = False,
) -> tuple[Tensor, ...]
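The signatures of `vllm_topk_softmax` and `vllm_topk_sigmoid` take output tensors (`topk_weights`, `topk_indices`, `token_expert_indices`) as arguments, which suggests a calling convention where the caller preallocates buffers and the function fills them. The following sketch mimics that convention with plain Python lists. `topk_softmax_into` is a hypothetical stand-in, and it omits `token_expert_indices` because this page does not describe that buffer's layout.

```python
import math


def topk_softmax_into(topk_weights, topk_indices, gating_output, renormalize=False):
    """Hypothetical stand-in: fill preallocated buffers in place and return them."""
    top_k = len(topk_weights[0])
    for row, logits in enumerate(gating_output):
        # Numerically stable softmax over this token's experts.
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Pick the top_k experts, highest probability first.
        best = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
        denom = sum(probs[i] for i in best) if renormalize else 1.0
        for k, expert in enumerate(best):
            topk_weights[row][k] = probs[expert] / denom
            topk_indices[row][k] = expert
    return topk_weights, topk_indices


# The caller allocates the output buffers up front, sized (num_tokens, top_k).
num_tokens, top_k = 2, 2
w = [[0.0] * top_k for _ in range(num_tokens)]
idx = [[-1] * top_k for _ in range(num_tokens)]
out_w, out_idx = topk_softmax_into(
    w, idx, [[0.0, 0.0, 0.0, 0.0], [3.0, 1.0, 2.0, 0.0]]
)
```

Note that `renormalize` defaults to `False` in these signatures, so in this sketch the returned weights are raw softmax probabilities unless the caller opts in.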