pymllm.models.qwen3¶
Inference-only Qwen3 text model for pymllm.
Implements Qwen3ForCausalLM with:
- QK-norm attention + 1D RoPE
- RadixAttention KV-cache backend
- Optional quantized Linear methods via quant_config
Adapted from pymllm’s Qwen3-VL text backbone and SGLang’s qwen3.py.
Attributes¶
Classes¶
- Qwen3Attention: Qwen3 attention with QK norm + 1D RoPE.
- Qwen3DecoderLayer: Single Qwen3 decoder layer.
- Qwen3Model: Qwen3 text backbone (embedding + decoder + final norm).
- Qwen3ForCausalLM: Inference-only Qwen3ForCausalLM.
Module Contents¶
- pymllm.models.qwen3.logger¶
- class pymllm.models.qwen3.Qwen3Attention(hidden_size, num_heads, num_kv_heads, head_dim, layer_id, rope_theta=1000000.0, rms_norm_eps=1e-06, max_position_embeddings=32768, attention_bias=False, quant_config=None, prefix='')¶
Bases: torch.nn.Module

Qwen3 attention with QK norm + 1D RoPE.
- Parameters:
hidden_size (int)
num_heads (int)
num_kv_heads (int)
head_dim (int)
layer_id (int)
rope_theta (float)
rms_norm_eps (float)
max_position_embeddings (int)
attention_bias (bool)
prefix (str)
- num_heads¶
- num_kv_heads¶
- head_dim¶
- q_size¶
- kv_size¶
- scaling¶
- rope_theta = 1000000.0¶
- use_fused_qkv = True¶
- o_proj¶
- q_norm¶
- k_norm¶
- attn¶
- forward(positions, hidden_states, forward_batch)¶
- Parameters:
positions (torch.Tensor)
hidden_states (torch.Tensor)
- Return type:
torch.Tensor
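The two attention tweaks this class documents, QK norm and 1D RoPE, compose in a fixed order: queries and keys are RMS-normalized per head, then rotated by position-dependent angles. The toy sketch below illustrates that order on one head vector; function names, shapes, and the consecutive-pair rotation layout are illustrative assumptions, not pymllm's actual implementation.

```python
import math

def rms_norm(x, eps=1e-6):
    """RMSNorm over one head vector: x / sqrt(mean(x^2) + eps)."""
    scale = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / scale for v in x]

def rope_1d(x, pos, theta=1_000_000.0):
    """1D RoPE: rotate consecutive pairs (x[2i], x[2i+1]) by angles that
    decay with the pair index; theta matches rope_theta above."""
    d = len(x)
    out = []
    for i in range(0, d, 2):
        ang = pos / theta ** (i / d)
        c, s = math.cos(ang), math.sin(ang)
        out += [x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c]
    return out

# QK norm applies first, then RoPE (norm of the vector is preserved by RoPE).
q = rope_1d(rms_norm([0.5, -1.0, 2.0, 0.25]), pos=3)
```

Because the rotation is orthogonal, applying RoPE after the norm keeps the per-head query at unit RMS, which is the point of normalizing before rotating.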
- class pymllm.models.qwen3.Qwen3DecoderLayer(hidden_size, num_heads, num_kv_heads, head_dim, intermediate_size, hidden_act, attention_bias, layer_id, rope_theta=1000000.0, rms_norm_eps=1e-06, max_position_embeddings=32768, quant_config=None, prefix='')¶
Bases: torch.nn.Module

Single Qwen3 decoder layer.
- Parameters:
hidden_size (int)
num_heads (int)
num_kv_heads (int)
head_dim (int)
intermediate_size (int)
hidden_act (str)
attention_bias (bool)
layer_id (int)
rope_theta (float)
rms_norm_eps (float)
max_position_embeddings (int)
prefix (str)
- self_attn¶
- mlp¶
- input_layernorm¶
- post_attention_layernorm¶
- forward(positions, hidden_states, forward_batch, residual=None)¶
- Parameters:
positions (torch.Tensor)
hidden_states (torch.Tensor)
residual (torch.Tensor | None)
- Return type:
tuple[torch.Tensor, torch.Tensor]
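The layer's components wire up as a standard pre-norm residual block; the tuple return type suggests the residual is threaded separately so the add can fuse into the next layer's norm, as in SGLang-style implementations. Below is a hedged sketch of the dataflow only, with scalars standing in for hidden-state tensors and hypothetical callables standing in for the submodules.

```python
# Pre-norm residual dataflow of one decoder layer (sketch, not pymllm's code).
def decoder_layer(x, attn, mlp, norm1, norm2):
    x = x + attn(norm1(x))   # input_layernorm -> self_attn -> residual add
    x = x + mlp(norm2(x))    # post_attention_layernorm -> mlp -> residual add
    return x

# Toy check with identity norms and simple linear maps.
y = decoder_layer(1.0,
                  attn=lambda h: 0.5 * h,
                  mlp=lambda h: -0.25 * h,
                  norm1=lambda h: h,
                  norm2=lambda h: h)
```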
- class pymllm.models.qwen3.Qwen3Model(config, quant_config=None)¶
Bases: torch.nn.Module

Qwen3 text backbone (embedding + decoder + final norm).
- embed_tokens¶
- layers¶
- norm¶
- forward(input_ids, positions, forward_batch, input_embeds=None)¶
- Parameters:
input_ids (torch.Tensor)
positions (torch.Tensor)
input_embeds (torch.Tensor | None)
- Return type:
torch.Tensor
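The backbone's forward is a straight pipeline over its three attributes: embed_tokens (skipped when input_embeds is supplied), the decoder stack in layers, then the final norm. A minimal sketch of that pipeline, with toy stand-ins for every callable:

```python
# Sketch of Qwen3Model.forward's shape: embed -> decoder stack -> final norm.
def backbone(input_ids, embed, layers, final_norm):
    h = [embed(t) for t in input_ids]     # embed_tokens (or input_embeds)
    for layer in layers:
        h = [layer(x) for x in h]         # each Qwen3DecoderLayer
    return [final_norm(x) for x in h]     # norm

hidden = backbone([1, 2],
                  embed=lambda t: float(t),
                  layers=[lambda x: x + 1.0, lambda x: x + 1.0],
                  final_norm=lambda x: 2.0 * x)
```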
- class pymllm.models.qwen3.Qwen3ForCausalLM(config, quant_config=None)¶
Bases: torch.nn.Module

Inference-only Qwen3ForCausalLM.
- config¶
- quant_config = None¶
- model¶
- get_input_embeddings()¶
- Return type:
torch.nn.Module
- forward(input_ids, positions, forward_batch)¶
- Parameters:
input_ids (torch.Tensor)
positions (torch.Tensor)
- load_weights(weights)¶
- Parameters:
weights (Iterable[Tuple[str, torch.Tensor]])
- Return type:
None
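Since the attention module defaults to a fused QKV projection (use_fused_qkv above), load_weights must remap checkpoint names: a Hugging Face checkpoint stores separate q_proj / k_proj / v_proj tensors, which all land in one fused qkv_proj parameter tagged with the shard they fill. The sketch below shows that remapping pattern in the style SGLang uses; the mapping table and helper are illustrative assumptions, not pymllm's actual code.

```python
# Hypothetical stacked-parameter mapping for fused-QKV weight loading.
STACKED_PARAMS = [
    ("qkv_proj", "q_proj", "q"),  # (fused param name, checkpoint name, shard id)
    ("qkv_proj", "k_proj", "k"),
    ("qkv_proj", "v_proj", "v"),
]

def remap(name):
    """Map a checkpoint weight name to (model param name, shard id)."""
    for fused, split, shard in STACKED_PARAMS:
        if split in name:
            return name.replace(split, fused), shard
    return name, None  # non-attention weights load under their own name
```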