Introduction
mllm is a fast and lightweight multimodal LLM inference engine for mobile and edge devices.
- Plain C/C++ implementation without dependencies
- Optimized for multimodal LLMs such as Fuyu-8B
- Supports ARM NEON and x86 AVX2 SIMD acceleration
- 4-bit and 6-bit integer quantization
Give it a try
mllm provides a series of example programs, including implementations of LLaMA, CLIP, Fuyu, ViT, ImageBind, and more, built on the mllm framework.
In addition, mllm offers an example app for Android devices: upload models to your phone via adb and try out inference with different models on-device.
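Transferring a model with adb might look like the sketch below. The model filename and on-device directory are assumptions for illustration, not paths from the mllm docs; the push command is echoed so the sketch is safe to run without a device attached (drop the leading `echo` to actually transfer the file).

```shell
MODEL=./models/model-q4.mllm          # hypothetical local quantized model file
TARGET=/data/local/tmp/mllm/models    # assumed writable directory on the device

# Print the transfer command; remove 'echo' to run it for real:
echo adb push "$MODEL" "$TARGET/"
```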
Demos: UI screen understanding, image understanding, and LLM chatting.