pymllm.layers.rope
==================

.. py:module:: pymllm.layers.rope


Functions
---------

.. autoapisummary::

   pymllm.layers.rope.apply_rope
   pymllm.layers.rope.apply_llama31_rope
   pymllm.layers.rope.apply_rope_pos_ids
   pymllm.layers.rope.apply_llama31_rope_pos_ids
   pymllm.layers.rope.apply_rope_with_cos_sin_cache
   pymllm.layers.rope.apply_mrope


Module Contents
---------------

.. py:function:: apply_rope(q, k, indptr, offsets, inplace=False, rotary_dim=None, interleave=False, rope_scale=1.0, rope_theta=10000.0)

   Apply rotary embedding to a batch of queries/keys (stored as a RaggedTensor).

   Cos/sin values are computed on the fly inside the kernel. Position offsets
   are provided per segment via ``indptr`` and ``offsets``.

   :param q: Query ragged tensor, shape ``(nnz, num_q_heads, head_dim)``.
   :param k: Key ragged tensor, shape ``(nnz, num_k_heads, head_dim)``.
   :param indptr: Indptr tensor, shape ``(batch_size + 1,)``. The i-th segment
       spans ``q[indptr[i]:indptr[i+1]]``.
   :param offsets: Relative position offsets per segment, shape ``(batch_size,)``.
   :param inplace: If ``True``, apply RoPE in place and return ``None``.
       If ``False``, return new ``(q_rope, k_rope)`` tensors.
   :param rotary_dim: Number of dimensions to apply RoPE to. ``None`` means
       the entire ``head_dim``.
   :param interleave: If ``True``, rotate even/odd dims (``[..., ::2]`` /
       ``[..., 1::2]``). If ``False``, rotate first/second-half dims.
   :param rope_scale: Scaling factor for position indices.
   :param rope_theta: Base frequency theta.
   :returns: ``None`` when *inplace* is ``True``, otherwise a tuple
       ``(q_rope, k_rope)`` of rotated tensors with the same shapes as the
       inputs.


.. py:function:: apply_llama31_rope(q, k, indptr, offsets, inplace=False, rotary_dim=None, interleave=False, rope_scale=8.0, rope_theta=500000.0, low_freq_factor=1.0, high_freq_factor=4.0, old_context_len=8192)

   Apply Llama 3.1 style rotary embedding to a batch of queries/keys.
   This variant adjusts frequencies with ``low_freq_factor``,
   ``high_freq_factor``, and ``old_context_len`` following the Llama 3.1
   RoPE recipe. Cos/sin values are computed on the fly.

   :param q: Query ragged tensor, shape ``(nnz, num_q_heads, head_dim)``.
   :param k: Key ragged tensor, shape ``(nnz, num_k_heads, head_dim)``.
   :param indptr: Indptr tensor, shape ``(batch_size + 1,)``.
   :param offsets: Relative position offsets per segment, shape ``(batch_size,)``.
   :param inplace: If ``True``, apply in place and return ``None``.
   :param rotary_dim: Number of dimensions to apply RoPE to. ``None`` means
       the entire ``head_dim``.
   :param interleave: If ``True``, rotate even/odd dims; otherwise
       first/second-half dims.
   :param rope_scale: Scaling factor for position indices (default ``8``).
   :param rope_theta: Base frequency theta (default ``5e5``).
   :param low_freq_factor: Low-frequency factor for Llama 3.1 RoPE.
   :param high_freq_factor: High-frequency factor for Llama 3.1 RoPE.
   :param old_context_len: Original context length for Llama 3.1 RoPE.
   :returns: ``None`` when *inplace* is ``True``, otherwise ``(q_rope, k_rope)``.


.. py:function:: apply_rope_pos_ids(q, k, pos_ids, inplace=False, rotary_dim=None, interleave=False, rope_scale=1.0, rope_theta=10000.0)

   Apply rotary embedding using explicit per-token position IDs.

   Unlike :func:`apply_rope`, which derives positions from ``indptr`` /
   ``offsets``, this function takes a flat ``pos_ids`` tensor that supplies
   an explicit position for every token.

   :param q: Query tensor, shape ``(nnz, num_q_heads, head_dim)``.
   :param k: Key tensor, shape ``(nnz, num_k_heads, head_dim)``.
   :param pos_ids: Position indices, shape ``(nnz,)``.
   :param inplace: If ``True``, apply in place and return ``None``.
   :param rotary_dim: Number of dimensions to apply RoPE to.
   :param interleave: Interleaved-layout flag.
   :param rope_scale: Scaling factor for position indices.
   :param rope_theta: Base frequency theta.
   :returns: ``None`` when *inplace* is ``True``, otherwise ``(q_rope, k_rope)``.


.. py:function:: apply_llama31_rope_pos_ids(q, k, pos_ids, inplace=False, rotary_dim=None, interleave=False, rope_scale=8.0, rope_theta=500000.0, low_freq_factor=1.0, high_freq_factor=4.0, old_context_len=8192)

   Apply Llama 3.1 style RoPE using explicit per-token position IDs.

   Combines the Llama 3.1 frequency adjustments with explicit ``pos_ids``.

   :param q: Query tensor, shape ``(nnz, num_q_heads, head_dim)``.
   :param k: Key tensor, shape ``(nnz, num_k_heads, head_dim)``.
   :param pos_ids: Position indices, shape ``(nnz,)``.
   :param inplace: If ``True``, apply in place and return ``None``.
   :param rotary_dim: Number of dimensions to apply RoPE to.
   :param interleave: Interleaved-layout flag.
   :param rope_scale: Scaling factor (default ``8``).
   :param rope_theta: Base frequency theta (default ``5e5``).
   :param low_freq_factor: Low-frequency factor for Llama 3.1 RoPE.
   :param high_freq_factor: High-frequency factor for Llama 3.1 RoPE.
   :param old_context_len: Original context length for Llama 3.1 RoPE.
   :returns: ``None`` when *inplace* is ``True``, otherwise ``(q_rope, k_rope)``.


.. py:function:: apply_rope_with_cos_sin_cache(positions, query, key, head_size, cos_sin_cache, inplace=False, is_neox=True)

   Apply rotary embedding with a precomputed cos/sin cache.

   Compatible with SGL/vLLM implementations. Note that ``query`` and ``key``
   use a **flattened** head layout ``(nnz, num_heads * head_size)`` instead
   of the 3-D layout used by the other ``apply_rope*`` functions.

   :param positions: Position indices, shape ``(nnz,)``.
   :param query: Query tensor, shape ``(nnz, num_q_heads * head_size)``.
   :param key: Key tensor, shape ``(nnz, num_k_heads * head_size)``.
   :param head_size: Size of each attention head.
   :param cos_sin_cache: Precomputed cos/sin tensor, shape
       ``(max_seq_len, rotary_dim)``. The first half of ``rotary_dim`` stores
       cosine values; the second half stores sine values.
   :param inplace: If ``True``, apply in place and return ``None``.
   :param is_neox: If ``True`` (default), use the GPT-NeoX style (rotate
       first/second-half dims). If ``False``, use the interleaved style
       (rotate even/odd dims).
   :returns: ``None`` when *inplace* is ``True``, otherwise
       ``(query_out, key_out)`` with the same shapes as the inputs.


.. py:function:: apply_mrope(q, k, positions, cos_sin_cache, mrope_section, mrope_interleaved=True)

   Apply multi-dimensional rotary position embedding (M-RoPE).

   Used by Qwen3-VL, which assigns independent ``(t, h, w)`` position indices
   to each token. For text tokens all three indices share the same sequential
   value; for image tokens they follow the spatial grid layout.

   :param q: Query tensor, shape ``(T, num_q_heads, head_dim)``.
   :param k: Key tensor, shape ``(T, num_kv_heads, head_dim)``.
   :param positions: 3-D position IDs, shape ``(3, T)``; rows are the
       ``(temporal, height, width)`` position indices.
   :param cos_sin_cache: Precomputed cache, shape ``(max_pos, head_dim)``.
       The first ``head_dim // 2`` columns are cosine values and the
       remaining columns are sine values, each for frequencies
       ``0, 1, ..., head_dim // 2 - 1``.
   :param mrope_section: Three integers ``[s_t, s_h, s_w]`` that partition
       the ``head_dim // 2`` rotary frequency dimensions among the temporal,
       height, and width components. ``sum(mrope_section)`` must equal
       ``head_dim // 2``.
   :param mrope_interleaved: When ``True`` (the Qwen3-VL default), use the
       interleaved layout, where frequency dimensions cycle
       ``(t, h, w, t, h, w, ...)`` rather than being grouped consecutively.
   :returns: ``(q_rope, k_rope)`` with the same shapes as the inputs.
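
The rotation that all of the ``apply_rope*`` kernels perform, including the ``interleave`` layout distinction, can be sketched in pure NumPy. This is an illustrative reference, not the library kernel: ``rope_rotate`` is a hypothetical helper that applies the standard RoPE frequency schedule ``theta ** (-2i / head_dim)`` to one tensor at one position.

```python
import numpy as np

def rope_rotate(x, pos, rope_theta=10000.0, interleave=False):
    """Rotate a (..., head_dim) tensor by position `pos` (RoPE sketch).

    interleave=False pairs dim i with dim i + head_dim // 2 (half layout);
    interleave=True pairs even dims with odd dims.
    """
    head_dim = x.shape[-1]
    # One frequency per rotated pair: theta ** (-2i / head_dim).
    freqs = rope_theta ** (-np.arange(head_dim // 2) / (head_dim // 2))
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    if interleave:
        x1, x2 = x[..., 0::2], x[..., 1::2]
    else:
        x1, x2 = x[..., : head_dim // 2], x[..., head_dim // 2 :]
    r1 = x1 * cos - x2 * sin
    r2 = x1 * sin + x2 * cos
    out = np.empty_like(x)
    if interleave:
        out[..., 0::2], out[..., 1::2] = r1, r2
    else:
        out[..., : head_dim // 2], out[..., head_dim // 2 :] = r1, r2
    return out
```

Because each pair is a plane rotation, the transform is norm-preserving and ``q · k`` after rotation depends only on the relative distance between the two positions, which is the property RoPE is built on.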
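
The Llama 3.1 variants differ from plain RoPE only in how they rescale the inverse frequencies using ``low_freq_factor``, ``high_freq_factor``, and ``old_context_len``. A minimal NumPy sketch of that adjustment, following the published Llama 3.1 recipe (``llama31_adjust_freqs`` is an illustrative name, not a pymllm function):

```python
import numpy as np

def llama31_adjust_freqs(inv_freq, rope_scale=8.0, low_freq_factor=1.0,
                         high_freq_factor=4.0, old_context_len=8192):
    """Llama 3.1 frequency adjustment (NumPy sketch, not the kernel).

    High-frequency components (short wavelengths) are kept as-is,
    low-frequency components are divided by rope_scale, and wavelengths
    between the two thresholds are smoothly interpolated.
    """
    wavelen = 2 * np.pi / inv_freq
    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor
    # Interpolation weight between the scaled and unscaled frequency.
    smooth = (old_context_len / wavelen - low_freq_factor) / (
        high_freq_factor - low_freq_factor
    )
    smoothed = (1 - smooth) * inv_freq / rope_scale + smooth * inv_freq
    adjusted = np.where(wavelen < high_freq_wavelen, inv_freq, smoothed)
    adjusted = np.where(wavelen > low_freq_wavelen, inv_freq / rope_scale, adjusted)
    return adjusted
```

With the defaults above (``rope_scale=8``, ``old_context_len=8192``), wavelengths shorter than ``8192 / 4`` tokens pass through unchanged while wavelengths longer than ``8192`` tokens are slowed down by the full factor of 8.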
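
For ``apply_rope_with_cos_sin_cache``, the expected cache layout (cosines in the first half of the last dimension, sines in the second half) can be built ahead of time. A hedged sketch assuming the standard frequency schedule; ``build_cos_sin_cache`` is an illustrative helper, not part of the pymllm API:

```python
import numpy as np

def build_cos_sin_cache(max_seq_len, rotary_dim, rope_theta=10000.0):
    """Build a (max_seq_len, rotary_dim) cache in the documented layout:
    columns [0, rotary_dim // 2) hold cos, the rest hold sin."""
    inv_freq = rope_theta ** (-np.arange(rotary_dim // 2) / (rotary_dim // 2))
    # angles[p, i] = p * inv_freq[i] for every position p.
    angles = np.outer(np.arange(max_seq_len), inv_freq)
    return np.concatenate([np.cos(angles), np.sin(angles)], axis=-1)
```

A real cache would typically be built once per model (e.g. with ``torch``, matching the model's ``rotary_dim`` and dtype) and reused across all layers and calls.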
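
The ``mrope_section`` / ``mrope_interleaved`` parameters of ``apply_mrope`` decide which of the three position rows (temporal, height, width) drives each rotary frequency. A small sketch of the two layouts; the function name and the example section values are hypothetical, chosen only so the sections sum to ``head_dim // 2``:

```python
import numpy as np

def mrope_freq_assignment(head_dim, mrope_section, interleaved=True):
    """For each of the head_dim // 2 rotary frequencies, return which
    position row (0 = temporal, 1 = height, 2 = width) drives it."""
    half = head_dim // 2
    assert sum(mrope_section) == half
    assign = np.empty(half, dtype=np.int64)
    if interleaved:
        # Cycle (t, h, w, t, h, w, ...), skipping axes whose budget ran out.
        remaining = list(mrope_section)
        i = 0
        while i < half:
            for axis in range(3):
                if remaining[axis] > 0 and i < half:
                    assign[i] = axis
                    remaining[axis] -= 1
                    i += 1
    else:
        # Consecutive blocks: s_t temporal freqs, then s_h, then s_w.
        start = 0
        for axis, size in enumerate(mrope_section):
            assign[start : start + size] = axis
            start += size
    return assign
```

For text tokens, where all three rows of ``positions`` carry the same value, both layouts collapse to ordinary 1-D RoPE; the assignment only matters for image tokens with distinct ``(t, h, w)`` indices.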