pymllm.models.qwen3_5
=====================

.. py:module:: pymllm.models.qwen3_5

.. autoapi-nested-parse::

   Inference-only Qwen3.5 model for pymllm.

   Implements the hybrid attention architecture:

   - **Full attention layers** (standard transformer with RoPE + output gate)
   - **GDN linear attention layers** (Gated Delta Network, O(n) complexity)

   Layers alternate: linear, attention, linear, attention, ... based on
   ``full_attention_interval`` in the config.

   Supports:

   - Dense (non-MoE) variant
   - Vision-Language (multimodal) via inheritance from Qwen3VL

   Adapted from sglang's ``qwen3_5.py``.

Attributes
----------

.. autoapisummary::

   pymllm.models.qwen3_5.logger

Classes
-------

.. autoapisummary::

   pymllm.models.qwen3_5.Qwen3_5FullAttention
   pymllm.models.qwen3_5.Qwen3_5AttentionDecoderLayer
   pymllm.models.qwen3_5.Qwen3_5LinearDecoderLayer
   pymllm.models.qwen3_5.Qwen3_5ForCausalLM
   pymllm.models.qwen3_5.Qwen3_5ForConditionalGeneration

Module Contents
---------------

.. py:data:: logger

.. py:class:: Qwen3_5FullAttention(config, layer_id, quant_config=None, prefix='')

   Bases: :py:obj:`torch.nn.Module`

   Standard multi-head attention with RoPE, QK-norm, and optional output gate.

   .. py:attribute:: hidden_size

   .. py:attribute:: num_heads

   .. py:attribute:: num_kv_heads

   .. py:attribute:: head_dim

   .. py:attribute:: q_size

   .. py:attribute:: kv_size

   .. py:attribute:: scaling

   .. py:attribute:: layer_id

   .. py:attribute:: attn_output_gate

   .. py:attribute:: q_proj

   .. py:attribute:: k_proj

   .. py:attribute:: v_proj

   .. py:attribute:: o_proj

   .. py:attribute:: q_norm

   .. py:attribute:: k_norm

   .. py:attribute:: partial_rotary_factor

   .. py:attribute:: rope_theta

   .. py:attribute:: rotary_dim

   .. py:attribute:: attn

   .. py:method:: forward(positions, hidden_states, forward_batch)

.. py:class:: Qwen3_5AttentionDecoderLayer(config, layer_id, quant_config=None, prefix='')

   Bases: :py:obj:`torch.nn.Module`

   Decoder layer with full attention + MLP.

   .. py:attribute:: self_attn

   .. py:attribute:: mlp

   .. py:attribute:: input_layernorm

   .. py:attribute:: post_attention_layernorm

   .. py:method:: forward(positions, hidden_states, residual, forward_batch)

.. py:class:: Qwen3_5LinearDecoderLayer(config, layer_id, gdn_layer_idx=0, quant_config=None, prefix='')

   Bases: :py:obj:`torch.nn.Module`

   Decoder layer with GDN linear attention + MLP.

   .. py:attribute:: linear_attn

   .. py:attribute:: mlp

   .. py:attribute:: input_layernorm

   .. py:attribute:: post_attention_layernorm

   .. py:method:: forward(positions, hidden_states, residual, forward_batch)

.. py:class:: Qwen3_5ForCausalLM(config, quant_config=None)

   Bases: :py:obj:`torch.nn.Module`

   Qwen3.5 causal language model with hybrid attention.

   Alternates between full attention and GDN linear attention layers.
   Dense (non-MoE) variant.

   .. py:attribute:: config

   .. py:attribute:: quant_config
      :value: None

   .. py:attribute:: hidden_size

   .. py:attribute:: vocab_size

   .. py:attribute:: embed_tokens

   .. py:attribute:: layer_types

   .. py:attribute:: layers

   .. py:attribute:: full_attn_layer_ids

   .. py:attribute:: num_gdn_layers
      :value: 0

   .. py:attribute:: norm

   .. py:method:: forward(input_ids, positions, forward_batch, input_embeds=None)

   .. py:method:: load_weights(weights)

      Load HuggingFace checkpoint weights with name remapping.

.. py:class:: Qwen3_5ForConditionalGeneration(config, quant_config=None)

   Bases: :py:obj:`torch.nn.Module`

   Qwen3.5 multimodal model (text + vision).

   Inherits vision encoder from Qwen3VL and uses Qwen3.5's hybrid
   language model.

   .. py:attribute:: config

   .. py:attribute:: quant_config
      :value: None

   .. py:attribute:: model

   .. py:attribute:: num_gdn_layers
      :value: 0

   .. py:attribute:: full_attn_layer_ids

   .. py:attribute:: lm_head

   .. py:attribute:: image_token_id

   .. py:attribute:: video_token_id

   .. py:method:: forward(input_ids, positions, forward_batch, input_embeds=None, pixel_values=None, image_grid_thw=None)

   .. py:method:: load_weights(weights)

      Load weights, dispatching visual vs language params.
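The alternating layer schedule described above (linear, attention, linear, attention, ... driven by ``full_attention_interval``) can be sketched as follows. This is a minimal illustration, not the module's actual implementation: the helper name ``build_layer_types`` and the type strings are hypothetical, and only the ``full_attention_interval`` config key comes from this documentation.

```python
def build_layer_types(num_layers: int, full_attention_interval: int) -> list[str]:
    """Hypothetical sketch: every `full_attention_interval`-th layer
    (1-indexed) uses full attention; all others use GDN linear attention."""
    return [
        "full_attention" if (i + 1) % full_attention_interval == 0
        else "linear_attention"
        for i in range(num_layers)
    ]

# With an interval of 2 the schedule alternates starting from linear:
# linear, attention, linear, attention, ...
print(build_layer_types(6, 2))
```

Under this sketch, ``full_attn_layer_ids`` would be the indices where the type is ``"full_attention"`` and ``num_gdn_layers`` the count of the remaining linear-attention layers.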