MLLM IR
=======

MLLM IR (Intermediate Representation) is a multi-level intermediate representation designed for the MLLM framework. It provides a structured way to represent machine learning models and operations at different abstraction levels, enabling efficient compilation and execution.

Basic IR Types
--------------

MLLM IR consists of several levels, each serving a specific purpose in the compilation pipeline:

1. Tensor IR
^^^^^^^^^^^^

Tensor IR is the lowest level of abstraction, focusing on tensor operations and memory management. It handles fundamental tensor operations and memory allocation/deallocation.

**Operations (Ops):**

- ``RegisterOp``: Registers a tensor in the IR context, typically for parameter tensors or global tensors
- ``AllocOp``: Allocates memory for a tensor during execution
- ``FreeOp``: Frees previously allocated memory for a tensor

**Values:**

- ``TensorValue``: Represents a tensor with its shape, data type, and device information

2. Linalg IR
^^^^^^^^^^^^

Linalg IR represents the linear algebra operations commonly found in neural networks. These operations are closer to actual ML computations.

**Operations (Ops):**

- Arithmetic operations: ``AddOp``, ``SubOp``, ``MulOp``, ``DivOp``, ``NegOp``
- Matrix operations: ``MatMulOp``
- Neural network operations: ``EmbeddingOp``, ``LinearOp``, ``RoPEOp``, ``KVCacheOp``, ``CausalMaskOp``, ``SoftmaxOp``
- Normalization operations: ``RMSNormOp``, ``LayerNormOp``
- Activation functions: ``SiLUOp``, ``GELUOp``, ``QuickGELUOp``, ``ReLUOp``
- Data manipulation: ``TransposeOp``, ``PermuteOp``, ``ViewOp``, ``ReshapeOp``, ``SplitOp``, ``ConcatOp``, ``RepeatOp``, ``CastTypeOp``, ``ContiguousOp``
- Memory operations: ``CopyOp``, ``CloneOp``
- Reduction operations: ``ReduceMaxOp``, ``ReduceMinOp``, ``ReduceSumOp``
- Convolution operations: ``Conv1DOp``, ``Conv2DOp``, ``Conv3DOp``
- Attention operations: ``FlashAttention2Op``

**Values:**

- Operations work with ``TensorValue`` instances from the Tensor IR level

3. Graph IR
^^^^^^^^^^^

Graph IR represents higher-level operations and subgraphs, typically corresponding to neural network layers or modules.

**Operations (Ops):**

- ``SubGraphOp``: Represents a subgraph or module within the computation graph
- ``CallGraphOp``: Represents a call to another graph/subgraph

**Values:**

- Graph IR operates on tensor values passed between subgraphs

4. Program IR
^^^^^^^^^^^^^

Program IR represents the executable program structure, including control flow and program fragments.

**Operations (Ops):**

- ``InstructionOp``: Represents one or more executable instructions that can be fused
- ``FragmentOp``: Represents a program fragment (code, data, or text)
- ``JumpOp``: Represents a jump/branch operation
- ``LabelOp``: Represents a label/target for jumps

**Values:**

- Program IR works with tensor values flowing through program instructions

IR Structure Overview
---------------------

Each IR level follows a similar structure with nodes, operations, and values:

- **Nodes**: The basic building blocks of the IR
- **Operations (Ops)**: Computational or structural operations that transform values
- **Values**: Data that flows between operations (typically tensors)

Each operation can have inputs and outputs, forming a computational graph. The IR also supports attributes and metadata for operations and values. The multi-level design allows MLLM to perform optimizations at different abstraction levels, from high-level graph transformations to low-level kernel optimizations.
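
To make the node/op/value pattern concrete, here is a minimal C++ sketch of how values and operations might be modeled and chained into a graph. The class layout, field names, and the ``main`` walkthrough are illustrative assumptions for this page, not MLLM's actual headers or API.

.. code-block:: cpp

   #include <cstdint>
   #include <memory>
   #include <string>
   #include <vector>

   // Values are the data that flows between operations.
   struct Value {
     virtual ~Value() = default;
     std::string name;
   };

   // TensorValue carries the shape, dtype, and device info described
   // at the Tensor IR level (fields are assumptions for illustration).
   struct TensorValue : Value {
     std::vector<int64_t> shape;
     std::string dtype;   // e.g. "fp32"
     std::string device;  // e.g. "cpu"
   };

   // Ops consume input values and produce output values; chaining them
   // through shared values forms the computational graph.
   struct Op {
     virtual ~Op() = default;
     std::string kind;  // e.g. "linalg.MatMulOp", "tensor.AllocOp"
     std::vector<std::shared_ptr<Value>> inputs;
     std::vector<std::shared_ptr<Value>> outputs;
   };

   int main() {
     // y = MatMul(x, w): the output y is allocated before use, matching
     // the AllocOp semantics described above.
     auto x = std::make_shared<TensorValue>();
     x->name = "x"; x->shape = {1, 4096}; x->dtype = "fp32"; x->device = "cpu";
     auto w = std::make_shared<TensorValue>();
     w->name = "w"; w->shape = {4096, 4096}; w->dtype = "fp32"; w->device = "cpu";
     auto y = std::make_shared<TensorValue>();
     y->name = "y"; y->shape = {1, 4096}; y->dtype = "fp32"; y->device = "cpu";

     auto alloc = std::make_shared<Op>();
     alloc->kind = "tensor.AllocOp";
     alloc->outputs = {y};

     auto matmul = std::make_shared<Op>();
     matmul->kind = "linalg.MatMulOp";
     matmul->inputs = {x, w};
     matmul->outputs = {y};
   }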
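
Continuing the sketch above, a ``SubGraphOp`` can be modeled as an op that owns a list of ops (a layer or module), with a ``CallGraphOp`` referencing it by symbol. Again, the fields shown here are assumptions for illustration, not MLLM's actual Graph IR types.

.. code-block:: cpp

   // A subgraph owns its body of ops and is addressable by a symbol.
   struct SubGraphOp : Op {
     std::string symbol;                     // e.g. "decoder_layer_0"
     std::vector<std::shared_ptr<Op>> body;  // the ops inside the module
   };

   // A call site refers to a subgraph by its symbol; tensor values
   // still flow through the call's inputs and outputs.
   struct CallGraphOp : Op {
     std::string callee;
   };

   // Usage, reusing alloc/matmul/x/y from the previous sketch:
   //   auto layer = std::make_shared<SubGraphOp>();
   //   layer->kind = "graph.SubGraphOp";
   //   layer->symbol = "decoder_layer_0";
   //   layer->body = {alloc, matmul};
   //
   //   auto call = std::make_shared<CallGraphOp>();
   //   call->kind = "graph.CallGraphOp";
   //   call->callee = layer->symbol;
   //   call->inputs = {x};
   //   call->outputs = {y};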
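
Finally, here is a sketch of the kind of Tensor IR-level rewrite that the ``AllocOp``/``FreeOp`` pair enables: a pass that inserts a ``FreeOp`` immediately after the last use of every allocated tensor, while leaving registered parameters alone. It operates on the illustrative types from the first sketch; MLLM's real pass infrastructure will differ.

.. code-block:: cpp

   #include <unordered_map>
   #include <unordered_set>

   using OpList = std::vector<std::shared_ptr<Op>>;

   // Insert a tensor.FreeOp after the last use of every tensor produced
   // by a tensor.AllocOp.
   OpList insertFrees(const OpList& ops) {
     std::unordered_set<Value*> allocated;          // values from AllocOp
     std::unordered_map<Value*, size_t> lastUse;    // index of final reader
     for (size_t i = 0; i < ops.size(); ++i) {
       if (ops[i]->kind == "tensor.AllocOp")
         for (auto& out : ops[i]->outputs) allocated.insert(out.get());
       for (auto& in : ops[i]->inputs) lastUse[in.get()] = i;
     }

     OpList result;
     for (size_t i = 0; i < ops.size(); ++i) {
       result.push_back(ops[i]);
       for (auto& in : ops[i]->inputs) {
         // Free values whose final reader is this op (assumes an op
         // does not list the same input twice).
         if (allocated.count(in.get()) && lastUse[in.get()] == i) {
           auto free = std::make_shared<Op>();
           free->kind = "tensor.FreeOp";
           free->inputs = {in};
           result.push_back(free);
         }
       }
     }
     return result;
   }

Running such a pass at the Tensor IR level, after higher-level graph transformations have settled the op order, is one example of the per-level optimization the multi-level design is meant to support.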