Introduction
mllm is a fast and lightweight multimodal LLM inference engine for mobile and edge devices.
- Plain C/C++ implementation without dependencies
- Optimized for multimodal LLMs such as Fuyu-8B
- Supports ARM NEON and x86 AVX2 SIMD acceleration
- 4-bit and 6-bit integer quantization
Give it a try
mllm provides a series of example programs, including implementations of LLaMA, CLIP, Fuyu, ViT, ImageBind, and more, built on the mllm framework.
In addition, mllm offers an example app for Android devices: upload models to your phone via adb and try out inference with different models on-device.
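Transferring a model with adb might look like the sketch below. The model filename and on-device directory are assumptions for illustration, not paths from the mllm docs; the push command is echoed so the sketch is safe to run without a device attached (drop the leading `echo` to actually transfer the file).

```shell
MODEL=./models/model-q4.mllm          # hypothetical local quantized model file
TARGET=/data/local/tmp/mllm/models    # assumed writable directory on the device

# Print the transfer command; remove 'echo' to run it for real:
echo adb push "$MODEL" "$TARGET/"
```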
Demos: UI screen understanding, image understanding, and LLM chatting.