Roadmap & Help wanted!¶
August - October 2025¶
P0¶
Benchmarks¶
Benchmark MLLM against llama.cpp and MNN.
W4A32 & PPL
Qwen3
Qwen2.5VL
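For readers unfamiliar with the PPL metric listed above, the following is a minimal sketch of how perplexity is commonly computed from model logits: the exponential of the mean negative log-likelihood of the reference tokens. The function names are hypothetical and are not part of the mllm API; the sketch assumes one logits vector per predicted position and natural-log probabilities.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one logits vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood) of the reference tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def sequence_perplexity(logits_per_step, target_ids):
    """logits_per_step[i] is the logits vector used to predict target_ids[i]."""
    if len(logits_per_step) != len(target_ids):
        raise ValueError("one logits vector is required per target token")
    log_probs = [log_softmax(l)[t] for l, t in zip(logits_per_step, target_ids)]
    return perplexity(log_probs)
```

A model that is uniformly uncertain over a vocabulary of size V has perplexity V, which makes a convenient sanity check when comparing a quantized model (for example W4A32) against its full-precision baseline.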
Model Support¶
Port the models supported by v1 to v2.
Qwen3 Series
Qwen2 Series
Llama3 Series
TinyLlama
Performance Optimization¶
Achieve high performance in eager mode through (1) manual memory planning, (2) fused kernels, (3) in-place operators, and similar techniques.
Inplace kernels for all backends
MulbyConst
AddFrom
Activation Functions
Sigmoid
GeLU
QuickGeLU
✅ SiLU
ReLU, ReLU2
LayerNorm
RMSNorm
Softmax
Fused Kernels
Softmax + TopK
Matmul + RoPE
Softmax + Causal Mask
Well optimized models (modeling_xxx_fast version)
Using Fused Kernels
Using inplace operators
Manually free tensors before their lifetimes end
!!! Kernel Selector Table (Tune)
GEMV and GEMM kernels tile size
Thread numbers
Quantized KVCache
MllmBlas, as used in Qwen2.5-VL, is slow; use ggml’s matmul (llamafile) in the future.
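The in-place and fused kernels listed in this section can be illustrated with a small pure-Python sketch. It is not mllm code: the function names and the flat row-major buffer layout are assumptions made for illustration. The in-place operators overwrite their input buffer instead of allocating an output, and the fused kernel applies the causal mask and the softmax in one pass over each row rather than materializing a separate masked tensor.

```python
import math


def _sigmoid(x):
    """Overflow-safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def silu_inplace(buf):
    """Overwrite buf with SiLU(x) = x * sigmoid(x); no output buffer is allocated."""
    for i, x in enumerate(buf):
        buf[i] = x * _sigmoid(x)
    return buf


def mul_by_const_inplace(buf, c):
    """Overwrite buf with buf * c."""
    for i in range(len(buf)):
        buf[i] *= c
    return buf


def fused_causal_softmax_inplace(scores, seq_len):
    """Softmax with a causal mask, fused and in place.

    scores is a flat row-major seq_len x seq_len buffer. Row i may only
    attend to columns 0..i; masked positions are written as 0.0.
    """
    if len(scores) != seq_len * seq_len:
        raise ValueError("scores must hold seq_len * seq_len elements")
    for i in range(seq_len):
        row = i * seq_len
        valid = i + 1  # number of unmasked columns in this row
        m = max(scores[row + j] for j in range(valid))
        total = 0.0
        for j in range(valid):
            e = math.exp(scores[row + j] - m)
            scores[row + j] = e
            total += e
        inv = 1.0 / total
        for j in range(valid):
            scores[row + j] *= inv
        for j in range(valid, seq_len):
            scores[row + j] = 0.0
    return scores
```

Because every function returns the same buffer it received, a caller can chain operators without extra allocations, which is the property that manual memory planning in eager mode relies on.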
Arm Kernel support¶
✅ MLLM-BLAS fp32 GEMM Kernels (transpose_a=False, transpose_b=True) [@chenghua]
Element-wise kernels have slight performance issues
✅ Arm I8-Gemm and I8-Gemv Kernels. (Works together with bitspack) [@chenghua]
Arm U1-7 Group Quantized Embedding Kernels. (Works together with bitspack)
More KleidiAI Kernels (SME support)
Optimize the MLLM-BLAS-SGEMV and MLLM-BLAS-SGEMM kernels for the shapes found in LLM scenarios.
Full correctness-test coverage for the current Arm operators
MXFP4 Linear Kernels
✅ Paged Attention Kernels (attentions as one of the outputs)
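Correctness coverage for optimized kernels is usually built on a slow but obviously correct reference. As a sketch of that idea, the following reference GEMM uses the layout named above (transpose_a=False, transpose_b=True), so it computes C = A · Bᵀ with A stored as m × k and B stored as n × k. The function names, the flat row-major layout, and the tolerance values are assumptions for illustration and are not taken from mllm.

```python
def gemm_nt_reference(a, b, m, n, k):
    """Reference C[m x n] = A[m x k] @ B[n x k]^T on flat row-major lists."""
    if len(a) != m * k or len(b) != n * k:
        raise ValueError("buffer sizes do not match the given shapes")
    c = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            # With transpose_b=True both operands are read along contiguous rows.
            for p in range(k):
                acc += a[i * k + p] * b[j * k + p]
            c[i * n + j] = acc
    return c


def allclose(x, y, rtol=1e-5, atol=1e-6):
    """Element-wise comparison with a relative and an absolute tolerance."""
    return len(x) == len(y) and all(
        abs(u - v) <= atol + rtol * abs(v) for u, v in zip(x, y)
    )
```

A test for an optimized kernel would run both implementations on the same inputs and compare the results with `allclose`, covering edge shapes such as m = 1 (the GEMV case) and values of k that are not a multiple of the tile size.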
X86 Backend support¶
Highway kernels for debugging purposes
QNN Backend support¶
Migration from mllm v1 to mllm v2
QNN Kernel Benchmarks
CANN Backend support¶
CANN Kernels
Quantization¶
Model Converter & Quantizer
Shared-weight embedding (for tied-embedding scenarios).
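To make the quantizer item concrete, here is a minimal sketch of symmetric per-group 4-bit weight quantization with fp32 dequantization, the general idea behind weight-only schemes such as W4A32. The group size, the symmetric scale rule, and the function names are assumptions chosen for illustration; they do not describe the storage format used by mllm.

```python
def quantize_group_int4(values, group_size=32):
    """Symmetric per-group 4-bit quantization.

    Returns (codes, scales): codes are integers in [-8, 7], and there is
    one fp scale per group of group_size consecutive values.
    """
    if group_size <= 0 or len(values) % group_size != 0:
        raise ValueError("len(values) must be a positive multiple of group_size")
    codes, scales = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        # Map the largest magnitude in the group to code 7; avoid a zero scale.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for v in group:
            q = int(round(v / scale))
            codes.append(max(-8, min(7, q)))
    return codes, scales


def dequantize_group_int4(codes, scales, group_size=32):
    """Reconstruct floating-point values from codes and per-group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]
```

With this rule the reconstruction error of each value is at most half of its group's scale, so smaller groups trade extra scale storage for lower error.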
Applications & Productions¶
Multi-turn Chat
mllm-cli’s modelscope integration
P1¶
pymllm API¶
C++ Tensor and Python Tensor lifetimes conflict in some test cases.
Tests¶
PPL Tests
Long term 2025¶
P1¶
FFI ABI¶
One C_api for all languages (using tvm-ffi, thanks @tianqi)
ARM PMU Tools Workflow¶
A kernel benchmark workflow that uses the PMU on Arm architectures.
Software pipelining and multi-issue optimizations will benefit from it.