pymllm.layers.rms_norm_gated
============================

.. py:module:: pymllm.layers.rms_norm_gated

.. autoapi-nested-parse::

   Gated RMSNorm layer for Qwen3.5 GDN attention.

   Computes ``rmsnorm(x, weight, eps) * silu(z)`` using a fused CUDA kernel
   from mllm-kernel. Falls back to a pure-PyTorch implementation when the
   kernel is unavailable.


Attributes
----------

.. autoapisummary::

   pymllm.layers.rms_norm_gated.logger


Classes
-------

.. autoapisummary::

   pymllm.layers.rms_norm_gated.RMSNormGated


Functions
---------

.. autoapisummary::

   pymllm.layers.rms_norm_gated.rms_norm_gated


Module Contents
---------------

.. py:data:: logger

.. py:function:: rms_norm_gated(x, weight, z = None, eps = 1e-06, norm_before_gate = True)

   Compute (optionally gated) RMS normalization.

   Uses the fused mllm-kernel CUDA implementation when available; otherwise
   falls back to a pure-PyTorch implementation.

.. py:class:: RMSNormGated(hidden_size, eps = 1e-06, group_size = None, norm_before_gate = True, device = None, dtype = None)

   Bases: :py:obj:`pymllm.layers.base.MllmBaseLayer`

   Gated RMS Normalization layer for Qwen3.5 GDN attention.

   Computes::

      output = rmsnorm(x, weight) * silu(z)   # z is not None
      output = rmsnorm(x, weight)             # z is None

   Uses a fused CUDA kernel from mllm-kernel for maximum throughput.

   :param hidden_size: Dimensionality of the input (and of the weight vector).
   :type hidden_size: int
   :param eps: Small constant for numerical stability.
   :type eps: float
   :param norm_before_gate: If ``True`` (default): ``rmsnorm(x) * silu(z)``.
       If ``False``: ``rmsnorm(x * silu(z))``.
   :type norm_before_gate: bool

   .. py:attribute:: hidden_size

   .. py:attribute:: eps
      :value: 1e-06

   .. py:attribute:: norm_before_gate
      :value: True

   .. py:attribute:: weight

   .. py:method:: forward(x, z = None)

   .. py:method:: extra_repr()
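To make the semantics above concrete, the following is a minimal pure-PyTorch sketch of the fallback computation — ``rmsnorm(x, weight) * silu(z)``, with ``norm_before_gate`` controlling whether the gate is applied after or before normalization. The function name ``rms_norm_gated_ref`` is hypothetical; this is a reference illustration, not the actual mllm-kernel fallback, whose dtype handling and ``group_size`` support may differ.

```python
import torch
import torch.nn.functional as F


def rms_norm_gated_ref(x, weight, z=None, eps=1e-6, norm_before_gate=True):
    """Reference (pure-PyTorch) gated RMSNorm.

    Hypothetical sketch of the fallback path described above; the real
    fused CUDA kernel may differ in precision and memory layout.
    """
    def _rmsnorm(t):
        # Normalize by the root-mean-square over the last dimension,
        # computed in float32 for numerical stability.
        var = t.float().pow(2).mean(dim=-1, keepdim=True)
        return (t.float() * torch.rsqrt(var + eps)).to(t.dtype) * weight

    if z is None:
        return _rmsnorm(x)
    if norm_before_gate:
        return _rmsnorm(x) * F.silu(z)   # rmsnorm(x) * silu(z)
    return _rmsnorm(x * F.silu(z))       # rmsnorm(x * silu(z))
```

With a unit ``weight`` and no gate, each row of the output has RMS approximately 1; passing a zero gate ``z`` zeroes the output in the default ``norm_before_gate=True`` mode, since ``silu(0) == 0``.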