pymllm.layers.rope
==================

.. py:module:: pymllm.layers.rope


Functions
---------

.. autoapisummary::

   pymllm.layers.rope.apply_rope
   pymllm.layers.rope.apply_llama31_rope
   pymllm.layers.rope.apply_rope_pos_ids
   pymllm.layers.rope.apply_llama31_rope_pos_ids
   pymllm.layers.rope.apply_rope_with_cos_sin_cache
   pymllm.layers.rope.apply_mrope


Module Contents
---------------

.. py:function:: apply_rope(q, k, indptr, offsets, inplace=False, rotary_dim=None, interleave=False, rope_scale=1.0, rope_theta=10000.0)

   Apply rotary embedding to a batch of queries/keys (stored as a RaggedTensor).

   Cos/sin values are computed on the fly inside the kernel. Position offsets
   are provided per segment via ``indptr`` and ``offsets``.

   :param q: Query ragged tensor, shape ``(nnz, num_q_heads, head_dim)``.
   :param k: Key ragged tensor, shape ``(nnz, num_k_heads, head_dim)``.
   :param indptr: Indptr tensor, shape ``(batch_size + 1,)``. The i-th segment
       spans ``q[indptr[i]:indptr[i+1]]``.
   :param offsets: Relative position offsets per segment, shape ``(batch_size,)``.
   :param inplace: If ``True``, apply RoPE in place and return ``None``.
       If ``False``, return new ``(q_rope, k_rope)`` tensors.
   :param rotary_dim: Number of dimensions to apply RoPE to. ``None`` means
       the entire ``head_dim``.
   :param interleave: If ``True``, rotate even/odd dims (``[..., ::2]`` /
       ``[..., 1::2]``). If ``False``, rotate first/second-half dims.
   :param rope_scale: Scaling factor for position indices.
   :param rope_theta: Base frequency theta.
   :returns: ``None`` when *inplace* is ``True``, otherwise a tuple
       ``(q_rope, k_rope)`` of rotated tensors with the same shapes as the
       inputs.


.. py:function:: apply_llama31_rope(q, k, indptr, offsets, inplace=False, rotary_dim=None, interleave=False, rope_scale=8.0, rope_theta=500000.0, low_freq_factor=1.0, high_freq_factor=4.0, old_context_len=8192)

   Apply Llama 3.1 style rotary embedding to a batch of queries/keys.
   This variant adjusts frequencies with ``low_freq_factor``,
   ``high_freq_factor``, and ``old_context_len`` following the Llama 3.1
   RoPE recipe. Cos/sin values are computed on the fly.

   :param q: Query ragged tensor, shape ``(nnz, num_q_heads, head_dim)``.
   :param k: Key ragged tensor, shape ``(nnz, num_k_heads, head_dim)``.
   :param indptr: Indptr tensor, shape ``(batch_size + 1,)``.
   :param offsets: Relative position offsets per segment, shape ``(batch_size,)``.
   :param inplace: If ``True``, apply in place and return ``None``.
   :param rotary_dim: Number of dimensions to apply RoPE to. ``None`` means
       the entire ``head_dim``.
   :param interleave: If ``True``, rotate even/odd dims; otherwise
       first/second-half dims.
   :param rope_scale: Scaling factor for position indices (default ``8``).
   :param rope_theta: Base frequency theta (default ``5e5``).
   :param low_freq_factor: Low-frequency factor for Llama 3.1 RoPE.
   :param high_freq_factor: High-frequency factor for Llama 3.1 RoPE.
   :param old_context_len: Original context length for Llama 3.1 RoPE.
   :returns: ``None`` when *inplace* is ``True``, otherwise ``(q_rope, k_rope)``.


.. py:function:: apply_rope_pos_ids(q, k, pos_ids, inplace=False, rotary_dim=None, interleave=False, rope_scale=1.0, rope_theta=10000.0)

   Apply rotary embedding using explicit per-token position IDs.

   Unlike :func:`apply_rope`, which derives positions from ``indptr`` /
   ``offsets``, this function takes a flat ``pos_ids`` tensor that supplies
   an explicit position for every token.

   :param q: Query tensor, shape ``(nnz, num_q_heads, head_dim)``.
   :param k: Key tensor, shape ``(nnz, num_k_heads, head_dim)``.
   :param pos_ids: Position indices, shape ``(nnz,)``.
   :param inplace: If ``True``, apply in place and return ``None``.
   :param rotary_dim: Number of dimensions to apply RoPE to.
   :param interleave: Interleaved-layout flag.
   :param rope_scale: Scaling factor for position indices.
   :param rope_theta: Base frequency theta.
   :returns: ``None`` when *inplace* is ``True``, otherwise ``(q_rope, k_rope)``.


.. py:function:: apply_llama31_rope_pos_ids(q, k, pos_ids, inplace=False, rotary_dim=None, interleave=False, rope_scale=8.0, rope_theta=500000.0, low_freq_factor=1.0, high_freq_factor=4.0, old_context_len=8192)

   Apply Llama 3.1 style RoPE using explicit per-token position IDs.

   Combines the Llama 3.1 frequency adjustments with explicit ``pos_ids``.

   :param q: Query tensor, shape ``(nnz, num_q_heads, head_dim)``.
   :param k: Key tensor, shape ``(nnz, num_k_heads, head_dim)``.
   :param pos_ids: Position indices, shape ``(nnz,)``.
   :param inplace: If ``True``, apply in place and return ``None``.
   :param rotary_dim: Number of dimensions to apply RoPE to.
   :param interleave: Interleaved-layout flag.
   :param rope_scale: Scaling factor (default ``8``).
   :param rope_theta: Base frequency theta (default ``5e5``).
   :param low_freq_factor: Low-frequency factor for Llama 3.1 RoPE.
   :param high_freq_factor: High-frequency factor for Llama 3.1 RoPE.
   :param old_context_len: Original context length for Llama 3.1 RoPE.
   :returns: ``None`` when *inplace* is ``True``, otherwise ``(q_rope, k_rope)``.


.. py:function:: apply_rope_with_cos_sin_cache(positions, query, key, head_size, cos_sin_cache, inplace=False, is_neox=True)

   Apply rotary embedding with a precomputed cos/sin cache.

   Compatible with SGL/vLLM implementations. Note that ``query`` and ``key``
   use a **flattened** head layout ``(nnz, num_heads * head_size)`` instead
   of the 3-D layout used by the other ``apply_rope*`` functions.

   :param positions: Position indices, shape ``(nnz,)``.
   :param query: Query tensor, shape ``(nnz, num_q_heads * head_size)``.
   :param key: Key tensor, shape ``(nnz, num_k_heads * head_size)``.
   :param head_size: Size of each attention head.
   :param cos_sin_cache: Precomputed cos/sin tensor, shape
       ``(max_seq_len, rotary_dim)``. The first half of ``rotary_dim`` stores
       cosine values; the second half stores sine values.
   :param inplace: If ``True``, apply in place and return ``None``.
   :param is_neox: If ``True`` (default), use the GPT-NeoX style (rotate
       first/second-half dims). If ``False``, use the interleaved style
       (rotate even/odd dims).
   :returns: ``None`` when *inplace* is ``True``, otherwise
       ``(query_out, key_out)`` with the same shapes as the inputs.


.. py:function:: apply_mrope(q, k, positions, cos_sin_cache, mrope_section, mrope_interleaved=True)

   Apply multi-dimensional rotary position embedding (M-RoPE).

   Used by Qwen3-VL, which assigns independent ``(t, h, w)`` position indices
   to each token. For text tokens all three indices share the same sequential
   value; for image tokens they follow the spatial grid layout.

   :param q: Query tensor, shape ``(T, num_q_heads, head_dim)``.
   :param k: Key tensor, shape ``(T, num_kv_heads, head_dim)``.
   :param positions: 3-D position IDs, shape ``(3, T)``; rows are the
       ``(temporal, height, width)`` position indices.
   :param cos_sin_cache: Precomputed cache, shape ``(max_pos, head_dim)``.
       The first ``head_dim // 2`` columns are cosine values and the
       remaining columns are sine values, each for frequencies
       ``0, 1, ..., head_dim // 2 - 1``.
   :param mrope_section: Three integers ``[s_t, s_h, s_w]`` that partition
       the ``head_dim // 2`` rotary frequency dimensions among the temporal,
       height, and width components. ``sum(mrope_section)`` must equal
       ``head_dim // 2``.
   :param mrope_interleaved: When ``True`` (the Qwen3-VL default), use the
       interleaved layout, where frequency dimensions cycle
       ``(t, h, w, t, h, w, ...)`` rather than being grouped consecutively.
   :returns: ``(q_rope, k_rope)`` with the same shapes as the inputs.
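
The rotation that all of the ``apply_rope*`` kernels perform, including the ``interleave`` layout distinction, can be sketched in pure NumPy. This is an illustrative reference, not the library kernel: ``rope_rotate`` is a hypothetical helper that applies the standard RoPE frequency schedule ``theta ** (-2i / head_dim)`` to one tensor at one position.

```python
import numpy as np

def rope_rotate(x, pos, rope_theta=10000.0, interleave=False):
    """Rotate a (..., head_dim) tensor by position `pos` (RoPE sketch).

    interleave=False pairs dim i with dim i + head_dim // 2 (half layout);
    interleave=True pairs even dims with odd dims.
    """
    head_dim = x.shape[-1]
    # One frequency per rotated pair: theta ** (-2i / head_dim).
    freqs = rope_theta ** (-np.arange(head_dim // 2) / (head_dim // 2))
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    if interleave:
        x1, x2 = x[..., 0::2], x[..., 1::2]
    else:
        x1, x2 = x[..., : head_dim // 2], x[..., head_dim // 2 :]
    r1 = x1 * cos - x2 * sin
    r2 = x1 * sin + x2 * cos
    out = np.empty_like(x)
    if interleave:
        out[..., 0::2], out[..., 1::2] = r1, r2
    else:
        out[..., : head_dim // 2], out[..., head_dim // 2 :] = r1, r2
    return out
```

Because each pair is a plane rotation, the transform is norm-preserving and ``q · k`` after rotation depends only on the relative distance between the two positions, which is the property RoPE is built on.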
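
The Llama 3.1 variants differ from plain RoPE only in how they rescale the inverse frequencies using ``low_freq_factor``, ``high_freq_factor``, and ``old_context_len``. A minimal NumPy sketch of that adjustment, following the published Llama 3.1 recipe (``llama31_adjust_freqs`` is an illustrative name, not a pymllm function):

```python
import numpy as np

def llama31_adjust_freqs(inv_freq, rope_scale=8.0, low_freq_factor=1.0,
                         high_freq_factor=4.0, old_context_len=8192):
    """Llama 3.1 frequency adjustment (NumPy sketch, not the kernel).

    High-frequency components (short wavelengths) are kept as-is,
    low-frequency components are divided by rope_scale, and wavelengths
    between the two thresholds are smoothly interpolated.
    """
    wavelen = 2 * np.pi / inv_freq
    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    # Interpolation weight between the scaled and unscaled frequency.
    smooth = (old_context_len / wavelen - low_freq_factor) / (
        high_freq_factor - low_freq_factor
    )
    smoothed = (1 - smooth) * inv_freq / rope_scale + smooth * inv_freq
    adjusted = np.where(wavelen < high_freq_wavelen, inv_freq, smoothed)
    adjusted = np.where(wavelen > low_freq_wavelen, inv_freq / rope_scale, adjusted)
    return adjusted
```

With the defaults above (``rope_scale=8``, ``old_context_len=8192``), wavelengths shorter than ``8192 / 4`` tokens pass through unchanged while wavelengths longer than ``8192`` tokens are slowed down by the full factor of 8.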
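
For ``apply_rope_with_cos_sin_cache``, the expected cache layout (cosines in the first half of the last dimension, sines in the second half) can be built ahead of time. A hedged sketch assuming the standard frequency schedule; ``build_cos_sin_cache`` is an illustrative helper, not part of the pymllm API:

```python
import numpy as np

def build_cos_sin_cache(max_seq_len, rotary_dim, rope_theta=10000.0):
    """Build a (max_seq_len, rotary_dim) cache in the documented layout:
    columns [0, rotary_dim // 2) hold cos, the rest hold sin."""
    inv_freq = rope_theta ** (-np.arange(rotary_dim // 2) / (rotary_dim // 2))
    # angles[p, i] = p * inv_freq[i] for every position p.
    angles = np.outer(np.arange(max_seq_len), inv_freq)
    return np.concatenate([np.cos(angles), np.sin(angles)], axis=-1)
```

A real cache would typically be built once per model (e.g. with ``torch``, matching the model's ``rotary_dim`` and dtype) and reused across all layers and calls.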
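
The ``mrope_section`` / ``mrope_interleaved`` parameters of ``apply_mrope`` decide which of the three position rows (temporal, height, width) drives each rotary frequency. A small sketch of the two layouts; the function name and the example section values are hypothetical, chosen only so the sections sum to ``head_dim // 2``:

```python
import numpy as np

def mrope_freq_assignment(head_dim, mrope_section, interleaved=True):
    """For each of the head_dim // 2 rotary frequencies, return which
    position row (0 = temporal, 1 = height, 2 = width) drives it."""
    half = head_dim // 2
    assert sum(mrope_section) == half
    assign = np.empty(half, dtype=np.int64)
    if interleaved:
        # Cycle (t, h, w, t, h, w, ...), skipping axes whose budget ran out.
        remaining = list(mrope_section)
        i = 0
        while i < half:
            for axis in range(3):
                if remaining[axis] > 0 and i < half:
                    assign[i] = axis
                    remaining[axis] -= 1
                    i += 1
    else:
        # Consecutive blocks: s_t temporal freqs, then s_h, then s_w.
        start = 0
        for axis, size in enumerate(mrope_section):
            assign[start : start + size] = axis
            start += size
    return assign
```

For text tokens, where all three rows of ``positions`` carry the same value, both layouts collapse to ordinary 1-D RoPE; the assignment only matters for image tokens with distinct ``(t, h, w)`` indices.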