pymllm.models.qwen3
===================

.. py:module:: pymllm.models.qwen3

.. autoapi-nested-parse::

   Inference-only Qwen3 text model for pymllm.

   Implements Qwen3ForCausalLM with:

   - QK-norm attention + 1D RoPE
   - RadixAttention KV-cache backend
   - Optional quantized Linear methods via quant_config

   Adapted from pymllm's Qwen3-VL text backbone and SGLang's qwen3.py.

Attributes
----------

.. autoapisummary::

   pymllm.models.qwen3.logger

Classes
-------

.. autoapisummary::

   pymllm.models.qwen3.Qwen3Attention
   pymllm.models.qwen3.Qwen3DecoderLayer
   pymllm.models.qwen3.Qwen3Model
   pymllm.models.qwen3.Qwen3ForCausalLM

Module Contents
---------------

.. py:data:: logger

.. py:class:: Qwen3Attention(hidden_size, num_heads, num_kv_heads, head_dim, layer_id, rope_theta = 1000000.0, rms_norm_eps = 1e-06, max_position_embeddings = 32768, attention_bias = False, quant_config=None, prefix = '')

   Bases: :py:obj:`torch.nn.Module`

   Qwen3 attention with QK norm + 1D RoPE.

   .. py:attribute:: num_heads

   .. py:attribute:: num_kv_heads

   .. py:attribute:: head_dim

   .. py:attribute:: q_size

   .. py:attribute:: kv_size

   .. py:attribute:: scaling

   .. py:attribute:: rope_theta
      :value: 1000000.0

   .. py:attribute:: use_fused_qkv
      :value: True

   .. py:attribute:: o_proj

   .. py:attribute:: q_norm

   .. py:attribute:: k_norm

   .. py:attribute:: attn

   .. py:method:: forward(positions, hidden_states, forward_batch)

.. py:class:: Qwen3DecoderLayer(hidden_size, num_heads, num_kv_heads, head_dim, intermediate_size, hidden_act, attention_bias, layer_id, rope_theta = 1000000.0, rms_norm_eps = 1e-06, max_position_embeddings = 32768, quant_config=None, prefix = '')

   Bases: :py:obj:`torch.nn.Module`

   Single Qwen3 decoder layer.

   .. py:attribute:: self_attn

   .. py:attribute:: mlp

   .. py:attribute:: input_layernorm

   .. py:attribute:: post_attention_layernorm

   .. py:method:: forward(positions, hidden_states, forward_batch, residual = None)

.. py:class:: Qwen3Model(config, quant_config=None)

   Bases: :py:obj:`torch.nn.Module`

   Qwen3 text backbone (embedding + decoder layers + final norm).

   .. py:attribute:: hidden_size

   .. py:attribute:: num_hidden_layers

   .. py:attribute:: embed_tokens

   .. py:attribute:: layers

   .. py:attribute:: norm

   .. py:method:: forward(input_ids, positions, forward_batch, input_embeds = None)

.. py:class:: Qwen3ForCausalLM(config, quant_config=None)

   Bases: :py:obj:`torch.nn.Module`

   Inference-only Qwen3ForCausalLM.

   .. py:attribute:: config

   .. py:attribute:: quant_config
      :value: None

   .. py:attribute:: model

   .. py:method:: get_input_embeddings()

   .. py:method:: forward(input_ids, positions, forward_batch)

   .. py:method:: load_weights(weights)
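
Examples
--------

The "QK-norm attention + 1D RoPE" combination above means each attention
head's query and key are RMS-normalized *before* the rotary embedding is
applied. The following is a minimal plain-PyTorch sketch of that math only;
the tensor layout, the helper name, and the use of
``scaled_dot_product_attention`` are assumptions for illustration, not the
module's RadixAttention-backed implementation.

.. code-block:: python

   import torch

   def qk_norm_rope_attention_sketch(
       q: torch.Tensor,          # (seq, num_heads, head_dim) -- assumed layout
       k: torch.Tensor,          # (seq, num_kv_heads, head_dim)
       v: torch.Tensor,          # (seq, num_kv_heads, head_dim)
       positions: torch.Tensor,  # (seq,)
       rope_theta: float = 1_000_000.0,
       eps: float = 1e-6,
   ) -> torch.Tensor:
       head_dim = q.shape[-1]

       def rms_norm(x: torch.Tensor) -> torch.Tensor:
           # Per-head RMSNorm; the real q_norm/k_norm also carry a learned weight.
           return x * torch.rsqrt(x.float().pow(2).mean(-1, keepdim=True) + eps)

       # QK norm comes first, then 1D RoPE.
       q, k = rms_norm(q), rms_norm(k)

       # Standard 1D RoPE: rotate channel pairs by position-dependent angles.
       inv_freq = 1.0 / rope_theta ** (
           torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim
       )
       angles = positions.float()[:, None] * inv_freq  # (seq, head_dim / 2)
       cos, sin = angles.cos()[:, None, :], angles.sin()[:, None, :]

       def rotate(x: torch.Tensor) -> torch.Tensor:
           x1, x2 = x.chunk(2, dim=-1)
           return torch.cat((x1 * cos - x2 * sin, x2 * cos + x1 * sin), dim=-1)

       q, k = rotate(q).to(v.dtype), rotate(k).to(v.dtype)

       # Repeat KV heads for grouped-query attention, then causal SDPA.
       groups = q.shape[1] // k.shape[1]
       k = k.repeat_interleave(groups, dim=1)
       v = v.repeat_interleave(groups, dim=1)
       out = torch.nn.functional.scaled_dot_product_attention(
           q.transpose(0, 1), k.transpose(0, 1), v.transpose(0, 1), is_causal=True
       )
       return out.transpose(0, 1)  # (seq, num_heads, head_dim)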
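
``Qwen3DecoderLayer.forward`` threads a ``residual`` tensor alongside
``hidden_states``, which suggests the fused add-and-norm dataflow used in
SGLang-style pre-norm decoder stacks. A plain-PyTorch sketch of the assumed
pattern follows, with the fused kernels unrolled into separate add and norm
steps; it is not the layer's actual code.

.. code-block:: python

   from typing import Callable, Optional, Tuple

   import torch

   def decoder_layer_dataflow(
       hidden_states: torch.Tensor,
       residual: Optional[torch.Tensor],
       input_layernorm: Callable[[torch.Tensor], torch.Tensor],
       post_attention_layernorm: Callable[[torch.Tensor], torch.Tensor],
       self_attn: Callable[[torch.Tensor], torch.Tensor],
       mlp: Callable[[torch.Tensor], torch.Tensor],
   ) -> Tuple[torch.Tensor, torch.Tensor]:
       if residual is None:
           # First layer: seed the residual stream from the raw embeddings.
           residual = hidden_states
       else:
           # Later layers: fold the previous sublayer output into the
           # residual stream before normalizing (one fused op in SGLang).
           hidden_states = hidden_states + residual
           residual = hidden_states
       hidden_states = self_attn(input_layernorm(hidden_states))

       # Same add-then-norm step around the MLP sublayer.
       hidden_states = hidden_states + residual
       residual = hidden_states
       hidden_states = mlp(post_attention_layernorm(hidden_states))
       return hidden_states, residual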
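
Finally, a hypothetical end-to-end construction sketch. Only the signatures
documented on this page (``Qwen3ForCausalLM(config, quant_config=None)``,
``load_weights(weights)``, ``forward(input_ids, positions, forward_batch)``)
come from the source; the Hugging Face config loader and safetensors reader
are assumptions, and ``load_weights`` consuming ``(name, tensor)`` pairs is
inferred from the stated SGLang lineage rather than confirmed here.

.. code-block:: python

   from safetensors.torch import load_file  # assumed checkpoint format
   from transformers import AutoConfig      # assumed config source

   from pymllm.models.qwen3 import Qwen3ForCausalLM

   # Any object exposing the usual Qwen3 config fields (hidden_size,
   # num_attention_heads, num_key_value_heads, ...) should work here.
   config = AutoConfig.from_pretrained("Qwen/Qwen3-0.6B")
   model = Qwen3ForCausalLM(config, quant_config=None)  # quantization off

   # Assumption by analogy with SGLang's qwen3.py: load_weights takes an
   # iterable of (parameter_name, tensor) pairs.
   state_dict = load_file("model.safetensors")
   model.load_weights(state_dict.items())

   # forward(input_ids, positions, forward_batch) additionally needs a
   # scheduler-built forward_batch carrying RadixAttention KV-cache metadata;
   # constructing one by hand is outside the scope of this sketch.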