Neural Network Layers API
The layers directory implements the neural network layers that serve as building blocks for models in MLLM. Include the following header to use them:
#include "mllm/nn/Nn.hpp"
Linear Layer
-
class Linear
A fully connected linear layer that applies a linear transformation to the input data.
-
Linear::Linear(int32_t in_channels, int32_t out_channels, bool bias = true, aops::LinearImplTypes impl_type = aops::LinearImplTypes::kDefault)
Constructor with layer parameters.
- Parameters:
in_channels – Number of input features
out_channels – Number of output features
bias – Whether to include a bias term (default: true)
impl_type – Implementation type (default: kDefault)
-
Linear::Linear(const aops::LinearOpOptions &options)
Constructor with options.
- Parameters:
options – Linear operation options
-
Tensor Linear::weight() const
Get the weight tensor of the layer.
- Returns:
Weight tensor
-
Tensor Linear::bias() const
Get the bias tensor of the layer.
- Returns:
Bias tensor
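The transformation a fully connected layer applies can be sketched in plain C++ as follows. This is not MLLM's implementation: the function name `linear_forward` is hypothetical, and the row-major `[out_channels, in_channels]` weight layout is an assumption borrowed from common frameworks rather than something this reference states.

```cpp
#include <cassert>
#include <vector>

// Illustrative sketch of y = W * x + b for one input vector.
// Assumption: weight is row-major with shape [out_channels, in_channels].
// An empty bias vector stands in for bias = false.
std::vector<float> linear_forward(const std::vector<float>& x,
                                  const std::vector<float>& weight,
                                  const std::vector<float>& bias,
                                  int in_channels, int out_channels) {
  assert(static_cast<int>(x.size()) == in_channels);
  assert(static_cast<int>(weight.size()) == in_channels * out_channels);
  assert(bias.empty() || static_cast<int>(bias.size()) == out_channels);

  std::vector<float> y(out_channels, 0.0f);
  for (int o = 0; o < out_channels; ++o) {
    // Start from the bias (or zero) and accumulate the dot product.
    float acc = bias.empty() ? 0.0f : bias[o];
    for (int i = 0; i < in_channels; ++i) {
      acc += weight[o * in_channels + i] * x[i];
    }
    y[o] = acc;
  }
  return y;
}
```

With `in_channels = 2` and `out_channels = 3`, the weight buffer holds six values and the result has three elements, one per output feature.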
RMSNorm Layer
-
class RMSNorm
Root Mean Square Layer Normalization.
-
RMSNorm::RMSNorm(float epsilon, bool add_unit_offset = false)
Constructor with normalization parameters.
- Parameters:
epsilon – Small value added to the denominator for numerical stability (default: 1e-5)
add_unit_offset – Whether to add a unit offset (default: false)
-
RMSNorm::RMSNorm(const aops::RMSNormOpOptions &options)
Constructor with options.
- Parameters:
options – RMSNorm operation options
-
Tensor RMSNorm::weight() const
Get the weight tensor of the layer.
- Returns:
Weight tensor
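To make the roles of `epsilon` and `add_unit_offset` concrete, here is a minimal plain C++ sketch of root mean square normalization over one vector. It is not MLLM's kernel. The function name `rms_norm` is hypothetical, and the reading of `add_unit_offset` as "scale by `1 + weight` instead of `weight`" is an assumption based on how some model families define the operation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative sketch: y[i] = x[i] / sqrt(mean(x^2) + epsilon) * scale[i].
// Assumption: add_unit_offset means scale[i] = 1 + weight[i];
// otherwise scale[i] = weight[i].
std::vector<float> rms_norm(const std::vector<float>& x,
                            const std::vector<float>& weight,
                            float epsilon, bool add_unit_offset) {
  assert(!x.empty());
  assert(x.size() == weight.size());

  float sum_sq = 0.0f;
  for (float v : x) sum_sq += v * v;
  const float mean_sq = sum_sq / static_cast<float>(x.size());
  const float inv_rms = 1.0f / std::sqrt(mean_sq + epsilon);

  std::vector<float> y(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    const float scale = add_unit_offset ? 1.0f + weight[i] : weight[i];
    y[i] = x[i] * inv_rms * scale;
  }
  return y;
}
```

Under this assumption, a weight of all zeros with `add_unit_offset = true` gives the same output as a weight of all ones with `add_unit_offset = false`.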
SiLU Layer
Embedding Layer
-
class Embedding
Embedding layer that maps indices to dense vectors.
-
Embedding::Embedding(const aops::EmbeddingOpOptions &options)
Constructor with options.
- Parameters:
options – Embedding operation options
-
Embedding::Embedding(int32_t vocab_size, int32_t hidden_size)
Constructor with vocabulary and hidden size.
- Parameters:
vocab_size – Size of the vocabulary
hidden_size – Dimension of each embedding vector
-
Tensor Embedding::weight() const
Get the embedding weight matrix.
- Returns:
Weight tensor of shape [vocab_size, hidden_size]
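Mapping indices to dense vectors amounts to copying rows out of the `[vocab_size, hidden_size]` weight matrix. The sketch below shows that lookup in plain C++. The function name `embedding_lookup` and the flat row-major storage are illustrative assumptions, not MLLM's API.

```cpp
#include <cassert>
#include <vector>

// Illustrative sketch: gather one row of the table per token id.
// Assumption: table is row-major with shape [vocab_size, hidden_size].
// Returns a flat buffer of shape [token_ids.size(), hidden_size].
std::vector<float> embedding_lookup(const std::vector<float>& table,
                                    int vocab_size, int hidden_size,
                                    const std::vector<int>& token_ids) {
  assert(static_cast<int>(table.size()) == vocab_size * hidden_size);

  std::vector<float> out;
  out.reserve(token_ids.size() * hidden_size);
  for (int id : token_ids) {
    assert(id >= 0 && id < vocab_size);  // ids must index a valid row
    const int row_start = id * hidden_size;
    for (int d = 0; d < hidden_size; ++d) {
      out.push_back(table[row_start + d]);
    }
  }
  return out;
}
```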
GELU Layer
QuickGELU Layer
ReLU Layer
LayerNorm Layer
-
class LayerNorm
Layer Normalization.
-
LayerNorm::LayerNorm(const aops::LayerNormOpOptions &options)
Constructor with options.
- Parameters:
options – LayerNorm operation options
-
LayerNorm::LayerNorm(const std::vector<int32_t> &normalized_shape, bool elementwise_affine = true, bool bias = true, float eps = 1e-6)
Constructor with normalization parameters.
- Parameters:
normalized_shape – Shape of the normalized dimensions
elementwise_affine – Whether to use learnable affine parameters (default: true)
bias – Whether to include bias term (default: true)
eps – Small value added to the denominator for numerical stability (default: 1e-6)
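The parameters above correspond to the standard layer normalization formula, sketched here in plain C++ for a single vector whose length equals the product of `normalized_shape`. This is an illustration rather than MLLM's implementation: the function name `layer_norm` is hypothetical, and an empty `gamma` or `beta` stands in for `elementwise_affine = false` or `bias = false`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative sketch: y[i] = (x[i] - mean) / sqrt(var + eps) * gamma[i] + beta[i].
// Empty gamma means no learnable scale; empty beta means no bias term.
std::vector<float> layer_norm(const std::vector<float>& x,
                              const std::vector<float>& gamma,
                              const std::vector<float>& beta,
                              float eps) {
  assert(!x.empty());
  assert(gamma.empty() || gamma.size() == x.size());
  assert(beta.empty() || beta.size() == x.size());

  const float n = static_cast<float>(x.size());
  float mean = 0.0f;
  for (float v : x) mean += v;
  mean /= n;

  // Biased (population) variance, as is usual for layer normalization.
  float var = 0.0f;
  for (float v : x) var += (v - mean) * (v - mean);
  var /= n;

  const float inv_std = 1.0f / std::sqrt(var + eps);
  std::vector<float> y(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    float v = (x[i] - mean) * inv_std;
    if (!gamma.empty()) v *= gamma[i];
    if (!beta.empty()) v += beta[i];
    y[i] = v;
  }
  return y;
}
```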
Softmax Layer
VisionRoPE Layer
-
class VisionRoPE
Rotary Positional Encoding for vision tasks.
-
VisionRoPE::VisionRoPE()
Default constructor.
-
VisionRoPE::VisionRoPE(const aops::VisionRoPEOpOptions &Options)
Constructor with options.
- Parameters:
Options – VisionRoPE operation options
-
VisionRoPE::VisionRoPE(const aops::VisionRoPEOpOptionsType type, const aops::Qwen2VLRoPEOpOptions &Options)
Constructor with type and Qwen2VL options.
- Parameters:
type – Type of VisionRoPE operation
Options – Qwen2VL RoPE operation options
Conv3D Layer
-
class Conv3D
3D Convolutional layer.
-
Conv3D::Conv3D(int32_t in_channels, int32_t out_channels, const std::vector<int32_t> &kernel_size, const std::vector<int32_t> &stride_size, bool bias = true, aops::Conv3DOpImplType impl_type = aops::Conv3DOpImplType::kDefault)
Constructor with convolution parameters.
- Parameters:
in_channels – Number of input channels
out_channels – Number of output channels
kernel_size – Size of the convolution kernel
stride_size – Stride of the convolution
bias – Whether to include a bias term (default: true)
impl_type – Implementation type (default: kDefault)
-
Conv3D::Conv3D(const aops::Conv3DOpOptions &options)
Constructor with options.
- Parameters:
options – Conv3D operation options
-
Tensor Conv3D::weight() const
Get the weight tensor of the layer.
- Returns:
Weight tensor
-
Tensor Conv3D::bias() const
Get the bias tensor of the layer.
- Returns:
Bias tensor
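Since the constructor takes only `kernel_size` and `stride_size`, a useful sanity check is the spatial output size those two values imply. The plain C++ sketch below assumes no padding and no dilation, which this reference does not confirm. The function name `conv3d_output_size` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch: per-dimension output size of a 3D convolution.
// Assumptions: no padding, no dilation; all vectors are ordered
// (depth, height, width).
std::vector<int> conv3d_output_size(const std::vector<int>& input_size,
                                    const std::vector<int>& kernel_size,
                                    const std::vector<int>& stride_size) {
  assert(input_size.size() == 3);
  assert(kernel_size.size() == 3);
  assert(stride_size.size() == 3);

  std::vector<int> out(3);
  for (std::size_t d = 0; d < 3; ++d) {
    assert(stride_size[d] > 0);
    assert(input_size[d] >= kernel_size[d]);
    // Number of kernel placements that fit along this dimension.
    out[d] = (input_size[d] - kernel_size[d]) / stride_size[d] + 1;
  }
  return out;
}
```

When the stride equals the kernel size, as in non-overlapping patch embedding, each output position covers exactly one patch of the input.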
CausalMask Layer
-
class CausalMask
Causal (autoregressive) attention mask.
-
CausalMask::CausalMask()
Default constructor.
-
CausalMask::CausalMask(const aops::CausalMaskOpOptions &options)
Constructor with options.
- Parameters:
options – CausalMask operation options
-
CausalMask::CausalMask(bool sliding_window, int32_t window_size)
Constructor with sliding window parameters.
- Parameters:
sliding_window – Whether to use sliding window attention
window_size – Size of the sliding window
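The effect of `sliding_window` and `window_size` is easiest to see on a small boolean mask. The plain C++ sketch below builds one under the assumption that a query at position `i` may attend to key `j` when `j <= i` and, with the sliding window enabled, when `i - j < window_size`. MLLM's exact window convention may differ, and the function name `causal_mask` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Illustrative sketch: mask[i][j] is true when query i may attend to key j.
// Assumption: the window includes the current position, so window_size = 2
// allows positions i and i - 1.
std::vector<std::vector<bool>> causal_mask(int seq_len, bool sliding_window,
                                           int window_size) {
  assert(seq_len > 0);
  assert(!sliding_window || window_size > 0);

  std::vector<std::vector<bool>> mask(seq_len,
                                      std::vector<bool>(seq_len, false));
  for (int i = 0; i < seq_len; ++i) {
    for (int j = 0; j <= i; ++j) {  // never look at future positions
      const bool in_window = !sliding_window || (i - j) < window_size;
      mask[i][j] = in_window;
    }
  }
  return mask;
}
```

An attention implementation would typically turn the `false` entries into a large negative value added to the scores before the softmax.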
MultimodalRoPE Layer
-
class MultimodalRoPE
Rotary Positional Encoding for multimodal tasks.
-
MultimodalRoPE::MultimodalRoPE()
Default constructor.
-
MultimodalRoPE::MultimodalRoPE(const aops::MultimodalRoPEOpOptions &options)
Constructor with options.
- Parameters:
options – MultimodalRoPE operation options
-
MultimodalRoPE::MultimodalRoPE(const aops::Qwen2VLMultimodalRoPEOpOptions &options)
Constructor with Qwen2VL multimodal options.
- Parameters:
options – Qwen2VL MultimodalRoPE operation options
Param Layer
-
class Param
Parameter layer that holds trainable parameters.
-
Param::Param(const aops::ParamOpOptions &options)
Constructor with options.
- Parameters:
options – Param operation options
-
Param::Param(const std::string &name, const Tensor::shape_t &shape = {})
Constructor with name and shape.
- Parameters:
name – Name of the parameter
shape – Shape of the parameter tensor (default: empty)
-
Tensor Param::weight() const
Get the parameter tensor.
- Returns:
Weight tensor
KVCache Layer
-
class KVCache
Key-Value cache for autoregressive generation.
-
KVCache::KVCache(const aops::KVCacheOpOptions &options)
Constructor with options.
- Parameters:
options – KVCache operation options
-
KVCache::KVCache(int32_t layer_idx, int32_t q_head, int32_t kv_head, int32_t head_dim, bool use_fa2 = true)
Constructor with cache parameters.
- Parameters:
layer_idx – Layer index
q_head – Number of query heads
kv_head – Number of key/value heads
head_dim – Dimension of each head
use_fa2 – Whether to use FlashAttention-2 (default: true)
-
void KVCache::setLayerIndex(int32_t layer_idx)
Set the layer index.
- Parameters:
layer_idx – Layer index
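Because the constructor takes `q_head` and `kv_head` separately, the two counts may differ, as in grouped-query attention. The plain C++ sketch below shows one common way query heads are mapped onto cached key/value heads in that case. It assumes consecutive query heads share a key/value head and that `q_head` is a multiple of `kv_head`. Neither point is stated in this reference, and the function name `kv_head_for_query_head` is hypothetical.

```cpp
#include <cassert>

// Illustrative sketch: which cached key/value head a query head reads from.
// Assumptions: q_head is a multiple of kv_head, and consecutive query heads
// form one group that shares a single key/value head.
int kv_head_for_query_head(int q_idx, int q_head, int kv_head) {
  assert(kv_head > 0 && q_head >= kv_head);
  assert(q_head % kv_head == 0);
  assert(q_idx >= 0 && q_idx < q_head);

  const int group_size = q_head / kv_head;  // query heads per kv head
  return q_idx / group_size;
}
```

When `q_head == kv_head` the mapping is the identity, which corresponds to ordinary multi-head attention.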
STFT Layer
-
class STFT
Short-Time Fourier Transform layer for signal processing.
-
STFT::STFT(const aops::STFTOpOptions &options)
Constructor with options.
- Parameters:
options – STFT operation options
-
STFT::STFT(int n_fft, int hop_length, int win_length, bool onesided = true, bool center = false, const std::string &pad_mode = "constant", bool return_complex = false)
Constructor with STFT parameters.
- Parameters:
n_fft – Size of Fourier transform
hop_length – Distance between neighboring sliding window frames
win_length – Size of window frame
onesided – Whether to return only non-negative frequency bins (default: true)
center – Whether to pad input on both sides (default: false)
pad_mode – Padding mode (default: "constant")
return_complex – Whether to return complex tensor (default: false)
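The parameters `n_fft`, `hop_length`, `center`, and `onesided` together determine the shape of the transform's output. The plain C++ sketch below computes the number of frames and frequency bins under the conventions commonly used by other STFT implementations: with `center` enabled the signal is padded by `n_fft / 2` samples on each side. Whether MLLM follows exactly these conventions is an assumption, and both function names are hypothetical.

```cpp
#include <cassert>

// Illustrative sketch: number of sliding-window frames produced.
// Assumption: center = true pads n_fft / 2 samples on both sides.
int stft_num_frames(int signal_length, int n_fft, int hop_length, bool center) {
  assert(n_fft > 0 && hop_length > 0);
  const int padded =
      center ? signal_length + 2 * (n_fft / 2) : signal_length;
  assert(padded >= n_fft);  // at least one full frame must fit
  return 1 + (padded - n_fft) / hop_length;
}

// Illustrative sketch: number of frequency bins per frame.
// onesided keeps only the non-negative frequencies of a real-valued input.
int stft_num_bins(int n_fft, bool onesided) {
  assert(n_fft > 0);
  return onesided ? n_fft / 2 + 1 : n_fft;
}
```

For one second of 16 kHz audio with `n_fft = 400` and `hop_length = 160`, these formulas give 98 frames without centering and 101 frames with it, each holding 201 one-sided bins.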