Neural Network Layers API

The layers directory implements the neural network layers used as building blocks for models in MLLM.

#include "mllm/nn/Nn.hpp"

Linear Layer

class Linear

A fully connected linear layer that applies a linear transformation to the input data.

Linear::Linear()

Default constructor.

Linear::Linear(int32_t in_channels, int32_t out_channels, bool bias = true, aops::LinearImplTypes impl_type = aops::LinearImplTypes::kDefault)

Constructor with layer parameters.

Parameters:
  • in_channels – Number of input features

  • out_channels – Number of output features

  • bias – Whether to include a bias term (default: true)

  • impl_type – Implementation type (default: kDefault)

Linear::Linear(const aops::LinearOpOptions &options)

Constructor with options.

Parameters:

options – Linear operation options

Tensor Linear::weight() const

Get the weight tensor of the layer.

Returns:

Weight tensor

Tensor Linear::bias() const

Get the bias tensor of the layer.

Returns:

Bias tensor
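As a point of reference, the transformation a fully connected layer applies can be sketched in plain C++. This is not MLLM's implementation: linear_forward is a hypothetical helper, and the row-major [out_channels, in_channels] weight layout is an assumption borrowed from common frameworks.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of MLLM: computes y = W x + b for one input vector.
// Assumption: weight is row-major with shape [out_channels, in_channels].
std::vector<float> linear_forward(const std::vector<float>& x,
                                  const std::vector<float>& weight,
                                  const std::vector<float>& bias,  // empty when bias = false
                                  std::size_t in_channels,
                                  std::size_t out_channels) {
  std::vector<float> y(out_channels, 0.0f);
  for (std::size_t o = 0; o < out_channels; ++o) {
    float acc = bias.empty() ? 0.0f : bias[o];
    for (std::size_t i = 0; i < in_channels; ++i) {
      acc += weight[o * in_channels + i] * x[i];
    }
    y[o] = acc;
  }
  return y;
}
```

Passing an empty bias vector corresponds to constructing the layer with bias = false.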

RMSNorm Layer

class RMSNorm

Root Mean Square Layer Normalization.

RMSNorm::RMSNorm()

Default constructor with epsilon=1e-5 and add_unit_offset=false.

RMSNorm::RMSNorm(float epsilon, bool add_unit_offset = false)

Constructor with normalization parameters.

Parameters:
  • epsilon – Small value added to the denominator for numerical stability (no default in this overload; the default constructor uses 1e-5)

  • add_unit_offset – Whether to add a unit offset (default: false)

RMSNorm::RMSNorm(const aops::RMSNormOpOptions &options)

Constructor with options.

Parameters:

options – RMSNorm operation options

Tensor RMSNorm::weight() const

Get the weight tensor of the layer.

Returns:

Weight tensor
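The role of epsilon and add_unit_offset can be illustrated with a standalone sketch of the usual RMSNorm formula. The function rms_norm is hypothetical, and treating add_unit_offset as scaling by (1 + weight) instead of weight is an assumption based on how some model families define it, not something this reference states.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of MLLM.
// y[i] = x[i] / sqrt(mean(x^2) + epsilon) * scale[i]
// Assumption: add_unit_offset means scale[i] = 1 + weight[i]; otherwise scale[i] = weight[i].
std::vector<float> rms_norm(const std::vector<float>& x,
                            const std::vector<float>& weight,
                            float epsilon = 1e-5f,
                            bool add_unit_offset = false) {
  float sum_sq = 0.0f;
  for (float v : x) sum_sq += v * v;
  const float inv_rms =
      1.0f / std::sqrt(sum_sq / static_cast<float>(x.size()) + epsilon);

  std::vector<float> y(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    const float scale = add_unit_offset ? 1.0f + weight[i] : weight[i];
    y[i] = x[i] * inv_rms * scale;
  }
  return y;
}
```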

SiLU Layer

class SiLU

Sigmoid Linear Unit activation function (also known as Swish).

SiLU::SiLU()

Default constructor.

SiLU::SiLU(const aops::SiLUOpOptions &options)

Constructor with options.

Parameters:

options – SiLU operation options
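For reference, the SiLU function itself is x · sigmoid(x). A minimal scalar sketch (silu is a hypothetical helper, not an MLLM function):

```cpp
#include <cmath>

// Hypothetical helper, not part of MLLM: silu(x) = x * sigmoid(x) = x / (1 + exp(-x)).
float silu(float x) {
  return x / (1.0f + std::exp(-x));
}
```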

Embedding Layer

class Embedding

Embedding layer that maps indices to dense vectors.

Embedding::Embedding()

Default constructor.

Embedding::Embedding(const aops::EmbeddingOpOptions &options)

Constructor with options.

Parameters:

options – Embedding operation options

Embedding::Embedding(int32_t vocab_size, int32_t hidden_size)

Constructor with vocabulary and hidden size.

Parameters:
  • vocab_size – Size of the vocabulary

  • hidden_size – Dimension of each embedding vector

Tensor Embedding::weight() const

Get the embedding weight matrix.

Returns:

Weight tensor of shape [vocab_size, hidden_size]
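Conceptually, an embedding layer copies one row of the [vocab_size, hidden_size] weight matrix per input index. The sketch below shows that lookup on a flat row-major buffer; embedding_lookup is a hypothetical helper and says nothing about how MLLM stores or quantizes the table.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of MLLM.
// weight: row-major [vocab_size, hidden_size]; returns row-major [ids.size(), hidden_size].
std::vector<float> embedding_lookup(const std::vector<float>& weight,
                                    std::size_t hidden_size,
                                    const std::vector<std::int32_t>& ids) {
  std::vector<float> out;
  out.reserve(ids.size() * hidden_size);
  for (std::int32_t id : ids) {
    const std::size_t row = static_cast<std::size_t>(id) * hidden_size;
    // Copy the embedding vector for this token index.
    out.insert(out.end(), weight.begin() + row, weight.begin() + row + hidden_size);
  }
  return out;
}
```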

GELU Layer

class GELU

Gaussian Error Linear Unit activation function.

GELU::GELU()

Default constructor.

GELU::GELU(const aops::GELUOpOptions &options)

Constructor with options.

Parameters:

options – GELU operation options
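The exact form of GELU is 0.5 · x · (1 + erf(x / √2)). The scalar sketch below uses that form; whether MLLM evaluates the exact function or a tanh-based approximation is not stated here, so treat gelu as a hypothetical reference only.

```cpp
#include <cmath>

// Hypothetical helper, not part of MLLM: exact GELU using the error function.
float gelu(float x) {
  return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
}
```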

QuickGELU Layer

class QuickGELU

An approximation of GELU that is faster to compute.

QuickGELU::QuickGELU()

Default constructor.

QuickGELU::QuickGELU(const aops::QuickGELUOpOptions &options)

Constructor with options.

Parameters:

options – QuickGELU operation options
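QuickGELU is commonly defined as x · sigmoid(1.702 · x), which replaces the error function with a single sigmoid. The sketch assumes that definition; quick_gelu is a hypothetical helper and the constant is the conventional one, not a value taken from MLLM.

```cpp
#include <cmath>

// Hypothetical helper, not part of MLLM.
// Assumption: the conventional definition x * sigmoid(1.702 * x).
float quick_gelu(float x) {
  return x / (1.0f + std::exp(-1.702f * x));
}
```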

ReLU Layer

class ReLU

Rectified Linear Unit activation function.

ReLU::ReLU()

Default constructor.

ReLU::ReLU(const aops::ReLUOpOptions &options)

Constructor with options.

Parameters:

options – ReLU operation options
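ReLU clamps negative values to zero and leaves the rest unchanged. A minimal element-wise sketch (relu_inplace is a hypothetical helper, not an MLLM function):

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper, not part of MLLM: applies max(0, x) to every element in place.
void relu_inplace(std::vector<float>& x) {
  for (float& v : x) v = std::max(0.0f, v);
}
```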

LayerNorm Layer

class LayerNorm

Layer Normalization.

LayerNorm::LayerNorm()

Default constructor.

LayerNorm::LayerNorm(const aops::LayerNormOpOptions &options)

Constructor with options.

Parameters:

options – LayerNorm operation options

LayerNorm::LayerNorm(const std::vector<int32_t> &normalized_shape, bool elementwise_affine = true, bool bias = true, float eps = 1e-6)

Constructor with normalization parameters.

Parameters:
  • normalized_shape – Shape of the normalized dimensions

  • elementwise_affine – Whether to use learnable affine parameters (default: true)

  • bias – Whether to include a bias term (default: true)

  • eps – Small value added to the denominator for numerical stability (default: 1e-6)
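The parameters above map onto the standard layer normalization formula, sketched here for a single vector covering the normalized dimensions. layer_norm is a hypothetical helper; empty gamma or beta vectors stand in for elementwise_affine = false or bias = false, which is a convention of this sketch only.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of MLLM.
// y[i] = (x[i] - mean) / sqrt(var + eps) * gamma[i] + beta[i], using the biased variance.
// An empty gamma means no scale; an empty beta means no bias term.
std::vector<float> layer_norm(const std::vector<float>& x,
                              const std::vector<float>& gamma,
                              const std::vector<float>& beta,
                              float eps = 1e-6f) {
  const float n = static_cast<float>(x.size());
  float mean = 0.0f;
  for (float v : x) mean += v;
  mean /= n;

  float var = 0.0f;
  for (float v : x) var += (v - mean) * (v - mean);
  var /= n;

  const float inv_std = 1.0f / std::sqrt(var + eps);
  std::vector<float> y(x.size());
  for (std::size_t i = 0; i < x.size(); ++i) {
    float v = (x[i] - mean) * inv_std;
    if (!gamma.empty()) v *= gamma[i];
    if (!beta.empty()) v += beta[i];
    y[i] = v;
  }
  return y;
}
```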

Softmax Layer

class Softmax

Softmax activation function.

Softmax::Softmax()

Default constructor.

Softmax::Softmax(const aops::SoftmaxOpOptions &options)

Constructor with options.

Parameters:

options – Softmax operation options

Softmax::Softmax(int32_t dim)

Constructor with dimension parameter.

Parameters:

dim – Dimension along which to apply softmax
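To make the meaning of dim concrete, the sketch below applies a numerically stable softmax along one axis of a row-major buffer. The axis is described by three hypothetical sizes: outer (product of the dimensions before dim), n (the size of dim), and inner (product of the dimensions after dim). softmax_along is not an MLLM function.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of MLLM.
// Treats data as row-major [outer, n, inner] and normalizes along the middle axis.
void softmax_along(std::vector<float>& data,
                   std::size_t outer, std::size_t n, std::size_t inner) {
  for (std::size_t o = 0; o < outer; ++o) {
    for (std::size_t i = 0; i < inner; ++i) {
      const std::size_t base = o * n * inner + i;

      // Subtract the maximum before exponentiating to avoid overflow.
      float max_v = data[base];
      for (std::size_t k = 1; k < n; ++k) max_v = std::max(max_v, data[base + k * inner]);

      float sum = 0.0f;
      for (std::size_t k = 0; k < n; ++k) {
        float& v = data[base + k * inner];
        v = std::exp(v - max_v);
        sum += v;
      }
      for (std::size_t k = 0; k < n; ++k) data[base + k * inner] /= sum;
    }
  }
}
```

For a tensor of shape [2, 3], dim = 1 corresponds to (outer, n, inner) = (2, 3, 1) and dim = 0 to (1, 2, 3).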

VisionRoPE Layer

class VisionRoPE

Rotary Position Embedding (RoPE) for vision tasks.

VisionRoPE::VisionRoPE()

Default constructor.

VisionRoPE::VisionRoPE(const aops::VisionRoPEOpOptions &Options)

Constructor with options.

Parameters:

Options – VisionRoPE operation options

VisionRoPE::VisionRoPE(const aops::VisionRoPEOpOptionsType type, const aops::Qwen2VLRoPEOpOptions &Options)

Constructor with type and Qwen2VL options.

Parameters:
  • type – Type of VisionRoPE operation

  • Options – Qwen2VL RoPE operation options
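The core operation behind any rotary encoding is a position-dependent rotation of paired elements in a head vector. The sketch below shows that rotation for a single scalar position. It is only an illustration: apply_rope is a hypothetical helper, the "split halves" pairing and the base of 10000 are assumptions, and how MLLM derives positions for image patches is not covered by this reference.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of MLLM. Rotates one head vector in place.
// Assumption: "split halves" pairing, where element i is paired with element i + d/2,
// and pair i uses frequency base^(-2i/d).
void apply_rope(std::vector<float>& x, float position, float base = 10000.0f) {
  const std::size_t half = x.size() / 2;
  for (std::size_t i = 0; i < half; ++i) {
    const float freq =
        std::pow(base, -static_cast<float>(i) / static_cast<float>(half));
    const float angle = position * freq;
    const float c = std::cos(angle);
    const float s = std::sin(angle);
    const float a = x[i];
    const float b = x[i + half];
    x[i] = a * c - b * s;
    x[i + half] = a * s + b * c;
  }
}
```

Because each pair is rotated, the vector's norm is preserved and position 0 leaves the input unchanged.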

Conv3D Layer

class Conv3D

3D Convolutional layer.

Conv3D::Conv3D()

Default constructor.

Conv3D::Conv3D(int32_t in_channels, int32_t out_channels, const std::vector<int32_t> &kernel_size, const std::vector<int32_t> &stride_size, bool bias = true, aops::Conv3DOpImplType impl_type = aops::Conv3DOpImplType::kDefault)

Constructor with convolution parameters.

Parameters:
  • in_channels – Number of input channels

  • out_channels – Number of output channels

  • kernel_size – Size of the convolution kernel

  • stride_size – Stride of the convolution

  • bias – Whether to include a bias term (default: true)

  • impl_type – Implementation type (default: kDefault)

Conv3D::Conv3D(const aops::Conv3DOpOptions &options)

Constructor with options.

Parameters:

options – Conv3D operation options

Tensor Conv3D::weight() const

Get the weight tensor of the layer.

Returns:

Weight tensor

Tensor Conv3D::bias() const

Get the bias tensor of the layer.

Returns:

Bias tensor
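The constructor takes kernel_size and stride_size but no padding or dilation, so the sketch below assumes neither is applied and computes the output extent per dimension as (input - kernel) / stride + 1. conv3d_output_size is a hypothetical helper, and the (depth, height, width) ordering is an assumption.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of MLLM.
// Assumption: no padding and no dilation; dimensions ordered (depth, height, width).
std::array<std::int32_t, 3> conv3d_output_size(
    const std::array<std::int32_t, 3>& input,
    const std::array<std::int32_t, 3>& kernel,
    const std::array<std::int32_t, 3>& stride) {
  std::array<std::int32_t, 3> out{};
  for (std::size_t d = 0; d < 3; ++d) {
    out[d] = (input[d] - kernel[d]) / stride[d] + 1;
  }
  return out;
}
```

When the stride equals the kernel size, the convolution splits the input into non-overlapping patches, which is a common way to build patch embeddings for vision models.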

CausalMask Layer

class CausalMask

Causal (autoregressive) attention mask.

CausalMask::CausalMask()

Default constructor.

CausalMask::CausalMask(const aops::CausalMaskOpOptions &options)

Constructor with options.

Parameters:

options – CausalMask operation options

CausalMask::CausalMask(bool sliding_window, int32_t window_size)

Constructor with sliding window parameters.

Parameters:
  • sliding_window – Whether to use sliding window attention

  • window_size – Size of the sliding window
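The effect of the two parameters can be sketched as a boolean matrix in which entry (i, j) says whether query position i may attend to key position j. The exact window semantics in MLLM are not specified here; this sketch assumes a query sees itself and the window_size - 1 positions before it. causal_mask is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of MLLM. allowed[i][j] == true means query i may attend to key j.
// Assumption: with sliding_window, query i sees keys j with i - window_size < j <= i.
std::vector<std::vector<bool>> causal_mask(std::size_t seq_len,
                                           bool sliding_window,
                                           std::int32_t window_size) {
  std::vector<std::vector<bool>> allowed(seq_len, std::vector<bool>(seq_len, false));
  for (std::size_t i = 0; i < seq_len; ++i) {
    for (std::size_t j = 0; j <= i; ++j) {  // never attend to future positions
      const bool in_window =
          !sliding_window || (i - j) < static_cast<std::size_t>(window_size);
      allowed[i][j] = in_window;
    }
  }
  return allowed;
}
```

In an attention kernel, disallowed entries are typically set to negative infinity before the softmax.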

MultimodalRoPE Layer

class MultimodalRoPE

Rotary Position Embedding (RoPE) for multimodal tasks.

MultimodalRoPE::MultimodalRoPE()

Default constructor.

MultimodalRoPE::MultimodalRoPE(const aops::MultimodalRoPEOpOptions &options)

Constructor with options.

Parameters:

options – MultimodalRoPE operation options

MultimodalRoPE::MultimodalRoPE(const aops::Qwen2VLMultimodalRoPEOpOptions &options)

Constructor with Qwen2VL multimodal options.

Parameters:

options – Qwen2VL MultimodalRoPE operation options

Param Layer

class Param

Parameter layer that holds trainable parameters.

Param::Param()

Default constructor.

Param::Param(const aops::ParamOpOptions &options)

Constructor with options.

Parameters:

options – Param operation options

Param::Param(const std::string &name, const Tensor::shape_t &shape = {})

Constructor with name and shape.

Parameters:
  • name – Name of the parameter

  • shape – Shape of the parameter tensor (default: empty)

Tensor Param::weight() const

Get the parameter tensor.

Returns:

Weight tensor

KVCache Layer

class KVCache

Key-Value cache for autoregressive generation.

KVCache::KVCache()

Default constructor.

KVCache::KVCache(const aops::KVCacheOpOptions &options)

Constructor with options.

Parameters:

options – KVCache operation options

KVCache::KVCache(int32_t layer_idx, int32_t q_head, int32_t kv_head, int32_t head_dim, bool use_fa2 = true)

Constructor with cache parameters.

Parameters:
  • layer_idx – Layer index

  • q_head – Number of query heads

  • kv_head – Number of key/value heads

  • head_dim – Dimension of each head

  • use_fa2 – Whether to use FlashAttention-2 (default: true)

void KVCache::setLayerIndex(int32_t layer_idx)

Set the layer index.

Parameters:

layer_idx – Layer index
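Two ideas sit behind the q_head and kv_head parameters: keys and values are appended once per generated token, and when q_head is larger than kv_head several query heads share one key/value head. The sketch below illustrates both with a toy structure. ToyKVCache and kv_head_for_query are hypothetical names, and the memory layout is an assumption, not MLLM's.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy cache, not part of MLLM.
// Assumption: keys and values are stored row-major as [tokens, kv_head, head_dim].
struct ToyKVCache {
  std::size_t kv_head;
  std::size_t head_dim;
  std::vector<float> keys;
  std::vector<float> values;

  ToyKVCache(std::size_t kv_head_, std::size_t head_dim_)
      : kv_head(kv_head_), head_dim(head_dim_) {}

  // Append one token's keys and values (each kv_head * head_dim floats).
  void append(const std::vector<float>& k, const std::vector<float>& v) {
    keys.insert(keys.end(), k.begin(), k.end());
    values.insert(values.end(), v.begin(), v.end());
  }

  // Number of tokens cached so far.
  std::size_t tokens() const { return keys.size() / (kv_head * head_dim); }
};

// Maps a query head to the key/value head it shares under grouped-query attention.
// Assumes q_head is a multiple of kv_head.
inline std::size_t kv_head_for_query(std::size_t query_head,
                                     std::size_t q_head,
                                     std::size_t kv_head) {
  return query_head / (q_head / kv_head);
}
```

With q_head = 8 and kv_head = 2, query heads 0 to 3 read key/value head 0 and query heads 4 to 7 read key/value head 1.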

STFT Layer

class STFT

Short-Time Fourier Transform layer for signal processing.

STFT::STFT()

Default constructor.

STFT::STFT(const aops::STFTOpOptions &options)

Constructor with options.

Parameters:

options – STFT operation options

STFT::STFT(int n_fft, int hop_length, int win_length, bool onesided = true, bool center = false, const std::string &pad_mode = "constant", bool return_complex = false)

Constructor with STFT parameters.

Parameters:
  • n_fft – Size of the Fourier transform

  • hop_length – Distance between neighboring sliding window frames

  • win_length – Size of the window frame

  • onesided – Whether to return only non-negative frequency bins (default: true)

  • center – Whether to pad the input on both sides (default: false)

  • pad_mode – Padding mode (default: "constant")

  • return_complex – Whether to return a complex tensor (default: false)
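The parameters above determine the shape of the output. The sketch below computes the number of frequency bins and frames under the convention used by other common STFT implementations: center pads n_fft / 2 samples on each side, and a one-sided transform keeps n_fft / 2 + 1 bins. That MLLM follows the same convention is an assumption, and stft_output_shape is a hypothetical helper.

```cpp
// Hypothetical helper, not part of MLLM.
struct StftShape {
  int bins;    // number of frequency bins per frame
  int frames;  // number of time frames
};

// Assumptions: center pads n_fft / 2 samples on each side; frames that do not
// fit entirely inside the (padded) signal are dropped.
StftShape stft_output_shape(int signal_len, int n_fft, int hop_length,
                            bool onesided = true, bool center = false) {
  const int padded_len = center ? signal_len + 2 * (n_fft / 2) : signal_len;
  StftShape shape{};
  shape.bins = onesided ? n_fft / 2 + 1 : n_fft;
  shape.frames = padded_len < n_fft ? 0 : 1 + (padded_len - n_fft) / hop_length;
  return shape;
}
```

For example, one second of 16 kHz audio with n_fft = 400 and hop_length = 160 yields 201 bins and 98 frames without centering.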