pymllm.models.qwen3
===================

.. py:module:: pymllm.models.qwen3

.. autoapi-nested-parse::

   Inference-only Qwen3 text model for pymllm.

   Implements Qwen3ForCausalLM with:

   - QK-norm attention + 1D RoPE
   - RadixAttention KV-cache backend
   - Optional quantized Linear methods via quant_config

   Adapted from pymllm's Qwen3-VL text backbone and SGLang's qwen3.py.

Attributes
----------

.. autoapisummary::

   pymllm.models.qwen3.logger

Classes
-------

.. autoapisummary::

   pymllm.models.qwen3.Qwen3Attention
   pymllm.models.qwen3.Qwen3DecoderLayer
   pymllm.models.qwen3.Qwen3Model
   pymllm.models.qwen3.Qwen3ForCausalLM

Module Contents
---------------

.. py:data:: logger

.. py:class:: Qwen3Attention(hidden_size, num_heads, num_kv_heads, head_dim, layer_id, rope_theta = 1000000.0, rms_norm_eps = 1e-06, max_position_embeddings = 32768, attention_bias = False, quant_config=None, prefix = '')

   Bases: :py:obj:`torch.nn.Module`

   Qwen3 attention with QK norm + 1D RoPE.

   .. py:attribute:: num_heads

   .. py:attribute:: num_kv_heads

   .. py:attribute:: head_dim

   .. py:attribute:: q_size

   .. py:attribute:: kv_size

   .. py:attribute:: scaling

   .. py:attribute:: rope_theta
      :value: 1000000.0

   .. py:attribute:: use_fused_qkv
      :value: True

   .. py:attribute:: o_proj

   .. py:attribute:: q_norm

   .. py:attribute:: k_norm

   .. py:attribute:: attn

   .. py:method:: forward(positions, hidden_states, forward_batch)

.. py:class:: Qwen3DecoderLayer(hidden_size, num_heads, num_kv_heads, head_dim, intermediate_size, hidden_act, attention_bias, layer_id, rope_theta = 1000000.0, rms_norm_eps = 1e-06, max_position_embeddings = 32768, quant_config=None, prefix = '')

   Bases: :py:obj:`torch.nn.Module`

   Single Qwen3 decoder layer.

   .. py:attribute:: self_attn

   .. py:attribute:: mlp

   .. py:attribute:: input_layernorm

   .. py:attribute:: post_attention_layernorm

   .. py:method:: forward(positions, hidden_states, forward_batch, residual = None)

.. py:class:: Qwen3Model(config, quant_config=None)

   Bases: :py:obj:`torch.nn.Module`

   Qwen3 text backbone (embedding + decoder layers + final norm).

   .. py:attribute:: hidden_size

   .. py:attribute:: num_hidden_layers

   .. py:attribute:: embed_tokens

   .. py:attribute:: layers

   .. py:attribute:: norm

   .. py:method:: forward(input_ids, positions, forward_batch, input_embeds = None)

.. py:class:: Qwen3ForCausalLM(config, quant_config=None)

   Bases: :py:obj:`torch.nn.Module`

   Inference-only Qwen3ForCausalLM.

   .. py:attribute:: config

   .. py:attribute:: quant_config
      :value: None

   .. py:attribute:: model

   .. py:method:: get_input_embeddings()

   .. py:method:: forward(input_ids, positions, forward_batch)

   .. py:method:: load_weights(weights)
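
Examples
--------

The "QK-norm attention + 1D RoPE" combination above means each attention
head's query and key are RMS-normalized *before* the rotary embedding is
applied. The following is a minimal plain-PyTorch sketch of that math only;
the tensor layout, the helper name, and the use of
``scaled_dot_product_attention`` are assumptions for illustration, not the
module's RadixAttention-backed implementation.

.. code-block:: python

   import torch

   def qk_norm_rope_attention_sketch(
       q: torch.Tensor,          # (seq, num_heads, head_dim) -- assumed layout
       k: torch.Tensor,          # (seq, num_kv_heads, head_dim)
       v: torch.Tensor,          # (seq, num_kv_heads, head_dim)
       positions: torch.Tensor,  # (seq,)
       rope_theta: float = 1_000_000.0,
       eps: float = 1e-6,
   ) -> torch.Tensor:
       head_dim = q.shape[-1]

       def rms_norm(x: torch.Tensor) -> torch.Tensor:
           # Per-head RMSNorm; the real q_norm/k_norm also carry a learned weight.
           return x * torch.rsqrt(x.float().pow(2).mean(-1, keepdim=True) + eps)

       # QK norm comes first, then 1D RoPE.
       q, k = rms_norm(q), rms_norm(k)

       # Standard 1D RoPE: rotate channel pairs by position-dependent angles.
       inv_freq = 1.0 / rope_theta ** (
           torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim
       )
       angles = positions.float()[:, None] * inv_freq  # (seq, head_dim / 2)
       cos, sin = angles.cos()[:, None, :], angles.sin()[:, None, :]

       def rotate(x: torch.Tensor) -> torch.Tensor:
           x1, x2 = x.chunk(2, dim=-1)
           return torch.cat((x1 * cos - x2 * sin, x2 * cos + x1 * sin), dim=-1)

       q, k = rotate(q).to(v.dtype), rotate(k).to(v.dtype)

       # Repeat KV heads for grouped-query attention, then causal SDPA.
       groups = q.shape[1] // k.shape[1]
       k = k.repeat_interleave(groups, dim=1)
       v = v.repeat_interleave(groups, dim=1)
       out = torch.nn.functional.scaled_dot_product_attention(
           q.transpose(0, 1), k.transpose(0, 1), v.transpose(0, 1), is_causal=True
       )
       return out.transpose(0, 1)  # (seq, num_heads, head_dim)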
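
``Qwen3DecoderLayer.forward`` threads a ``residual`` tensor alongside
``hidden_states``, which suggests the fused add-and-norm dataflow used in
SGLang-style pre-norm decoder stacks. A plain-PyTorch sketch of the assumed
pattern follows, with the fused kernels unrolled into separate add and norm
steps; it is not the layer's actual code.

.. code-block:: python

   from typing import Callable, Optional, Tuple

   import torch

   def decoder_layer_dataflow(
       hidden_states: torch.Tensor,
       residual: Optional[torch.Tensor],
       input_layernorm: Callable[[torch.Tensor], torch.Tensor],
       post_attention_layernorm: Callable[[torch.Tensor], torch.Tensor],
       self_attn: Callable[[torch.Tensor], torch.Tensor],
       mlp: Callable[[torch.Tensor], torch.Tensor],
   ) -> Tuple[torch.Tensor, torch.Tensor]:
       if residual is None:
           # First layer: seed the residual stream from the raw embeddings.
           residual = hidden_states
       else:
           # Later layers: fold the previous sublayer output into the
           # residual stream before normalizing (one fused op in SGLang).
           hidden_states = hidden_states + residual
           residual = hidden_states
       hidden_states = self_attn(input_layernorm(hidden_states))

       # Same add-then-norm step around the MLP sublayer.
       hidden_states = hidden_states + residual
       residual = hidden_states
       hidden_states = mlp(post_attention_layernorm(hidden_states))
       return hidden_states, residual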
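
Finally, a hypothetical end-to-end construction sketch. Only the signatures
documented on this page (``Qwen3ForCausalLM(config, quant_config=None)``,
``load_weights(weights)``, ``forward(input_ids, positions, forward_batch)``)
come from the source; the Hugging Face config loader and safetensors reader
are assumptions, and ``load_weights`` consuming ``(name, tensor)`` pairs is
inferred from the stated SGLang lineage rather than confirmed here.

.. code-block:: python

   from safetensors.torch import load_file  # assumed checkpoint format
   from transformers import AutoConfig      # assumed config source

   from pymllm.models.qwen3 import Qwen3ForCausalLM

   # Any object exposing the usual Qwen3 config fields (hidden_size,
   # num_attention_heads, num_key_value_heads, ...) should work here.
   config = AutoConfig.from_pretrained("Qwen/Qwen3-0.6B")
   model = Qwen3ForCausalLM(config, quant_config=None)  # quantization off

   # Assumption by analogy with SGLang's qwen3.py: load_weights takes an
   # iterable of (parameter_name, tensor) pairs.
   state_dict = load_file("model.safetensors")
   model.load_weights(state_dict.items())

   # forward(input_ids, positions, forward_batch) additionally needs a
   # scheduler-built forward_batch carrying RadixAttention KV-cache metadata;
   # constructing one by hand is outside the scope of this sketch.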