pymllm.models.qwen3_5
=====================

.. py:module:: pymllm.models.qwen3_5

.. autoapi-nested-parse::

   Inference-only Qwen3.5 model for pymllm.

   Implements the hybrid attention architecture:

   - **Full attention layers** (standard transformer with RoPE + output gate)
   - **GDN linear attention layers** (Gated Delta Network, O(n) complexity)

   Layers alternate: linear, attention, linear, attention, ... based on
   ``full_attention_interval`` in the config.

   Supports:

   - Dense (non-MoE) variant
   - Vision-Language (multimodal) via inheritance from Qwen3VL

   Adapted from sglang's ``qwen3_5.py``.

Attributes
----------

.. autoapisummary::

   pymllm.models.qwen3_5.logger

Classes
-------

.. autoapisummary::

   pymllm.models.qwen3_5.Qwen3_5FullAttention
   pymllm.models.qwen3_5.Qwen3_5AttentionDecoderLayer
   pymllm.models.qwen3_5.Qwen3_5LinearDecoderLayer
   pymllm.models.qwen3_5.Qwen3_5ForCausalLM
   pymllm.models.qwen3_5.Qwen3_5ForConditionalGeneration

Module Contents
---------------

.. py:data:: logger

.. py:class:: Qwen3_5FullAttention(config, layer_id, quant_config=None, prefix='')

   Bases: :py:obj:`torch.nn.Module`

   Standard multi-head attention with RoPE, QK-norm, and optional output gate.

   .. py:attribute:: hidden_size

   .. py:attribute:: num_heads

   .. py:attribute:: num_kv_heads

   .. py:attribute:: head_dim

   .. py:attribute:: q_size

   .. py:attribute:: kv_size

   .. py:attribute:: scaling

   .. py:attribute:: layer_id

   .. py:attribute:: attn_output_gate

   .. py:attribute:: q_proj

   .. py:attribute:: k_proj

   .. py:attribute:: v_proj

   .. py:attribute:: o_proj

   .. py:attribute:: q_norm

   .. py:attribute:: k_norm

   .. py:attribute:: partial_rotary_factor

   .. py:attribute:: rope_theta

   .. py:attribute:: rotary_dim

   .. py:attribute:: attn

   .. py:method:: forward(positions, hidden_states, forward_batch)

.. py:class:: Qwen3_5AttentionDecoderLayer(config, layer_id, quant_config=None, prefix='')

   Bases: :py:obj:`torch.nn.Module`

   Decoder layer with full attention + MLP.

   .. py:attribute:: self_attn

   .. py:attribute:: mlp

   .. py:attribute:: input_layernorm

   .. py:attribute:: post_attention_layernorm

   .. py:method:: forward(positions, hidden_states, residual, forward_batch)

.. py:class:: Qwen3_5LinearDecoderLayer(config, layer_id, gdn_layer_idx=0, quant_config=None, prefix='')

   Bases: :py:obj:`torch.nn.Module`

   Decoder layer with GDN linear attention + MLP.

   .. py:attribute:: linear_attn

   .. py:attribute:: mlp

   .. py:attribute:: input_layernorm

   .. py:attribute:: post_attention_layernorm

   .. py:method:: forward(positions, hidden_states, residual, forward_batch)

.. py:class:: Qwen3_5ForCausalLM(config, quant_config=None)

   Bases: :py:obj:`torch.nn.Module`

   Qwen3.5 causal language model with hybrid attention.

   Alternates between full attention and GDN linear attention layers.
   Dense (non-MoE) variant.

   .. py:attribute:: config

   .. py:attribute:: quant_config
      :value: None

   .. py:attribute:: hidden_size

   .. py:attribute:: vocab_size

   .. py:attribute:: embed_tokens

   .. py:attribute:: layer_types

   .. py:attribute:: layers

   .. py:attribute:: full_attn_layer_ids

   .. py:attribute:: num_gdn_layers
      :value: 0

   .. py:attribute:: norm

   .. py:method:: forward(input_ids, positions, forward_batch, input_embeds=None)

   .. py:method:: load_weights(weights)

      Load HuggingFace checkpoint weights with name remapping.

.. py:class:: Qwen3_5ForConditionalGeneration(config, quant_config=None)

   Bases: :py:obj:`torch.nn.Module`

   Qwen3.5 multimodal model (text + vision).

   Inherits vision encoder from Qwen3VL and uses Qwen3.5's hybrid
   language model.

   .. py:attribute:: config

   .. py:attribute:: quant_config
      :value: None

   .. py:attribute:: model

   .. py:attribute:: num_gdn_layers
      :value: 0

   .. py:attribute:: full_attn_layer_ids

   .. py:attribute:: lm_head

   .. py:attribute:: image_token_id

   .. py:attribute:: video_token_id

   .. py:method:: forward(input_ids, positions, forward_batch, input_embeds=None, pixel_values=None, image_grid_thw=None)

   .. py:method:: load_weights(weights)

      Load weights, dispatching visual vs language params.
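The alternating layer schedule described above (linear, attention, linear, attention, ... driven by ``full_attention_interval``) can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the helper name ``build_layer_types`` and the type strings are hypothetical, and only the ``full_attention_interval`` config key comes from this documentation.

```python
def build_layer_types(num_layers: int, full_attention_interval: int) -> list[str]:
    """Hypothetical sketch: every `full_attention_interval`-th layer
    (1-indexed) uses full attention; all others use GDN linear attention."""
    return [
        "full_attention" if (i + 1) % full_attention_interval == 0
        else "linear_attention"
        for i in range(num_layers)
    ]

# With an interval of 2 the schedule alternates starting from linear:
# linear, attention, linear, attention, ...
print(build_layer_types(6, 2))
```

Under this sketch, ``full_attn_layer_ids`` would be the indices where the type is ``"full_attention"`` and ``num_gdn_layers`` the count of the remaining linear-attention layers.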