MLLM IR

MLLM IR is the multi-level intermediate representation (IR) used by the MLLM framework. It provides a structured way to represent machine learning models and their operations at several levels of abstraction, enabling efficient compilation and execution.

Basic IR Types

MLLM IR consists of several levels, each serving a specific purpose in the compilation pipeline:

1. Tensor IR

Tensor IR is the lowest level of abstraction. It deals with individual tensors and their memory: registering tensors in the IR context, and allocating and freeing their storage during execution.

Operations (Ops):

  • RegisterOp: Registers a tensor in the IR context, typically for parameter tensors or global tensors

  • AllocOp: Allocates memory for a tensor during execution

  • FreeOp: Frees previously allocated memory for a tensor

Values:

  • TensorValue: Represents a tensor with its shape, data type, and device information
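
To make this concrete, here is a minimal C++ sketch of how the Tensor IR level could be modeled. The struct names, op mnemonics, and enum values echo the lists above but are illustrative assumptions, not MLLM's actual classes or signatures.

    #include <cstdint>
    #include <cstdio>
    #include <string>
    #include <vector>

    enum class DType { kFP32, kFP16 };
    enum class Device { kCPU, kCUDA };

    // TensorValue: the shape, data type, and device of a tensor in the IR.
    struct TensorValue {
        std::string name;
        std::vector<int64_t> shape;
        DType dtype;
        Device device;
    };

    // The three Tensor IR ops, modeled here as tagged records.
    struct TensorOp {
        enum class Kind { Register, Alloc, Free } kind;
        TensorValue tensor;
    };

    static const char* kindName(TensorOp::Kind k) {
        switch (k) {
            case TensorOp::Kind::Register: return "tensor.register";
            case TensorOp::Kind::Alloc:    return "tensor.alloc";
            case TensorOp::Kind::Free:     return "tensor.free";
        }
        return "?";
    }

    int main() {
        TensorValue weight{"w_embed", {32000, 4096}, DType::kFP16, Device::kCPU};
        TensorValue hidden{"hidden0", {1, 128, 4096}, DType::kFP16, Device::kCPU};

        // A parameter is registered once; alloc/free bracket an
        // activation's lifetime during execution.
        std::vector<TensorOp> ops = {
            {TensorOp::Kind::Register, weight},
            {TensorOp::Kind::Alloc, hidden},
            {TensorOp::Kind::Free, hidden},
        };
        for (const auto& op : ops)
            std::printf("%s %s\n", kindName(op.kind), op.tensor.name.c_str());
    }

Running the sketch prints the sequence tensor.register, tensor.alloc, tensor.free, reflecting how a long-lived parameter differs from a transient activation.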

2. Linalg IR

Linalg IR represents the linear algebra operations commonly found in neural networks. Its ops map directly onto the computations a model performs, one level above raw tensor management.

Operations (Ops):

  • Arithmetic operations: AddOp, SubOp, MulOp, DivOp, NegOp

  • Matrix operations: MatMulOp

  • Neural network operations: EmbeddingOp, LinearOp, RoPEOp, KVCacheOp, CausalMaskOp, SoftmaxOp

  • Normalization operations: RMSNormOp, LayerNormOp

  • Activation functions: SiLUOp, GELUOp, QuickGELUOp, ReLUOp

  • Data manipulation: TransposeOp, PermuteOp, ViewOp, ReshapeOp, SplitOp, ConcatOp, RepeatOp, CastTypeOp, ContiguousOp

  • Memory operations: CopyOp, CloneOp

  • Reduction operations: ReduceMaxOp, ReduceMinOp, ReduceSumOp

  • Convolution operations: Conv1DOp, Conv2DOp, Conv3DOp

  • Attention operations: FlashAttention2Op

Values:

  • Linalg ops consume and produce TensorValue instances from the Tensor IR level
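
As an illustration, the following C++ sketch shows how a Linalg-level op might connect TensorValue inputs to outputs, using MatMulOp as the example. The "linalg.matmul" mnemonic and the builder-style matmul helper are assumptions made for this sketch, not the framework's real API.

    #include <cstdint>
    #include <cstdio>
    #include <string>
    #include <vector>

    struct TensorValue {
        std::string name;
        std::vector<int64_t> shape;
    };

    // A Linalg IR op: a named operation over tensor values from the Tensor IR level.
    struct LinalgOp {
        std::string name;  // e.g. "linalg.matmul", "linalg.silu"
        std::vector<TensorValue> inputs;
        std::vector<TensorValue> outputs;
    };

    // Toy builder for MatMulOp with shape inference: [m, k] x [k, n] -> [m, n].
    TensorValue matmul(const TensorValue& a, const TensorValue& b,
                       std::vector<LinalgOp>& ir) {
        TensorValue out{a.name + "_x_" + b.name, {a.shape[0], b.shape[1]}};
        ir.push_back({"linalg.matmul", {a, b}, {out}});
        return out;
    }

    int main() {
        std::vector<LinalgOp> ir;
        TensorValue x{"x", {128, 4096}};
        TensorValue w{"w", {4096, 11008}};
        TensorValue y = matmul(x, w, ir);  // appends one op to the IR

        for (const auto& op : ir)
            std::printf("%s(%s, %s) -> %s [%lld x %lld]\n", op.name.c_str(),
                        op.inputs[0].name.c_str(), op.inputs[1].name.c_str(),
                        y.name.c_str(), (long long)y.shape[0],
                        (long long)y.shape[1]);
    }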

3. Graph IR

Graph IR represents higher-level operations and subgraphs, typically corresponding to neural network layers or modules.

Operations (Ops):

  • SubGraphOp: Represents a subgraph or module within the computation graph

  • CallGraphOp: Represents a call to another graph/subgraph

Values:

  • Graph IR operates on tensor values passed between subgraphs
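
The sketch below illustrates how the two ops might relate: a SubGraphOp groups lower-level ops into a named module, and CallGraphOps invoke it. The module contents and naming scheme are hypothetical.

    #include <cstdio>
    #include <map>
    #include <string>
    #include <vector>

    // SubGraphOp: a named module containing ops in execution order
    // (op names only, for brevity).
    struct SubGraphOp {
        std::string name;
        std::vector<std::string> body;
    };

    // CallGraphOp: invokes a subgraph, with tensor values flowing in and out.
    struct CallGraphOp {
        std::string callee;
        std::vector<std::string> inputs;
        std::vector<std::string> outputs;
    };

    int main() {
        // One decoder layer expressed as a subgraph...
        std::map<std::string, SubGraphOp> graphs;
        graphs["decoder_layer"] = SubGraphOp{
            "decoder_layer",
            {"linalg.rmsnorm", "linalg.linear", "linalg.rope",
             "linalg.kvcache", "linalg.softmax", "linalg.linear"}};

        // ...and a top-level graph that calls it once per layer.
        std::vector<CallGraphOp> top = {
            {"decoder_layer", {"hidden0"}, {"hidden1"}},
            {"decoder_layer", {"hidden1"}, {"hidden2"}},
        };
        for (const auto& call : top)
            std::printf("call @%s(%s) -> %s\n", call.callee.c_str(),
                        call.inputs[0].c_str(), call.outputs[0].c_str());
    }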

4. Program IR

Program IR represents the executable program structure, including control flow and program fragments.

Operations (Ops):

  • InstructionOp: Represents one or more executable instructions that can be fused

  • FragmentOp: Represents a program fragment (code, data, or text)

  • JumpOp: Represents a jump/branch operation

  • LabelOp: Represents a label/target for jumps

Values:

  • Program IR works with tensor values flowing through program instructions
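
A minimal sketch of Program IR control flow follows, with a LabelOp marking a jump target and a JumpOp branching back to it, roughly the shape of an autoregressive decode loop. The instruction mnemonics are invented for illustration.

    #include <cstdio>
    #include <string>
    #include <vector>

    // One Program IR op: an instruction, a fragment, a jump, or a label.
    struct ProgramOp {
        enum class Kind { Instruction, Fragment, Jump, Label } kind;
        std::string text;  // instruction body, fragment payload, or label name
    };

    static const char* kindName(ProgramOp::Kind k) {
        switch (k) {
            case ProgramOp::Kind::Instruction: return "instr";
            case ProgramOp::Kind::Fragment:    return "fragment";
            case ProgramOp::Kind::Jump:        return "jump";
            case ProgramOp::Kind::Label:       return "label";
        }
        return "?";
    }

    int main() {
        std::vector<ProgramOp> program = {
            {ProgramOp::Kind::Fragment, ".data weights"},          // data fragment
            {ProgramOp::Kind::Label, "decode_step"},               // jump target
            {ProgramOp::Kind::Instruction, "run @decoder_layer"},  // fused instruction(s)
            {ProgramOp::Kind::Instruction, "sample next_token"},
            {ProgramOp::Kind::Jump, "decode_step"},                // loop back per token
        };
        for (const auto& op : program)
            std::printf("%-8s %s\n", kindName(op.kind), op.text.c_str());
    }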

IR Structure Overview

Each IR level follows a similar structure with nodes, operations, and values:

  • Nodes: The basic building blocks of the IR

  • Operations (Ops): Computational or structural operations that transform values

  • Values: Data that flows between operations (typically tensors)

Each operation can have inputs and outputs, forming a computational graph. The IR also supports attributes and metadata for operations and values.
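
To show this shared shape in code, here is a small hypothetical C++ sketch of the node/op/value pattern with an attribute map; MLLM's real base classes will differ in detail.

    #include <cstdio>
    #include <map>
    #include <memory>
    #include <string>
    #include <vector>

    // Node: the common base; the attribute map carries per-node metadata.
    struct Node {
        std::map<std::string, std::string> attrs;
        virtual ~Node() = default;
    };

    // Value: data flowing between ops (typically a tensor).
    struct Value : Node {
        std::string name;
    };

    // Op: transforms input values into output values.
    struct Op : Node {
        std::string name;
        std::vector<std::shared_ptr<Value>> inputs;
        std::vector<std::shared_ptr<Value>> outputs;
    };

    int main() {
        auto x = std::make_shared<Value>();
        x->name = "x";
        auto y = std::make_shared<Value>();
        y->name = "y";

        Op silu;
        silu.name = "linalg.silu";
        silu.inputs = {x};
        silu.outputs = {y};
        silu.attrs["device"] = "cpu";  // attribute attached as metadata

        std::printf("%s(%s) -> %s [device=%s]\n", silu.name.c_str(),
                    silu.inputs[0]->name.c_str(), silu.outputs[0]->name.c_str(),
                    silu.attrs.at("device").c_str());
    }

Because values are shared pointers, the same Value can appear as the output of one op and the input of another, which is what links individual ops into a computational graph.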

The multi-level design allows MLLM to perform optimizations at different abstraction levels, from high-level graph transformations to low-level kernel optimizations.