Chat with LLMs🤖
Just on Your Device!
mllm is a fast multimodal LLM inference engine for mobile and edge devices.
Lightweight
Designed for Edge/Mobile Scenarios
Multimodal
Supports Multimodal Inputs, Including Vision / Text ...
In Your Pocket
Easily Integrated into Apps via JNI/Libs
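
As a sketch of what app-side integration could look like, the Kotlin snippet below loads a native library and runs inference through JNI. The library name (`mllm_jni`) and the native method signatures are illustrative assumptions, not mllm's actual API; consult the project's real JNI bindings.

```kotlin
// A minimal sketch of app-side JNI integration. All names below are
// hypothetical placeholders, not mllm's actual bindings.
object MllmBridge {
    init {
        System.loadLibrary("mllm_jni") // hypothetical native library name
    }

    // Hypothetical native entry points exposed via JNI.
    external fun loadModel(modelPath: String): Long
    external fun generate(handle: Long, prompt: String): String
    external fun release(handle: Long)
}

fun chatOnDevice(modelPath: String, prompt: String): String {
    val handle = MllmBridge.loadModel(modelPath)
    return try {
        MllmBridge.generate(handle, prompt)
    } finally {
        MllmBridge.release(handle) // free native resources
    }
}
```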
Focusing on Edge Scenarios
Targeting LLMs
With optimizations including quantization, memory reuse, and parallelization, mllm achieves affordable inference latency for LLMs on edge devices.
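
To make the quantization idea concrete, here is a minimal Kotlin sketch of symmetric per-tensor int8 quantization. It illustrates the general technique only; it is not mllm's actual quantization scheme, which may use different bit widths and granularity.

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Symmetric per-tensor int8 quantization: map each float weight into
// [-127, 127] using one shared scale (generic sketch, not mllm's scheme).
// Assumes a non-empty weight tensor.
fun quantizeInt8(weights: FloatArray): Pair<ByteArray, Float> {
    val maxAbs = weights.maxOf { abs(it) }.coerceAtLeast(1e-8f)
    val scale = maxAbs / 127f
    val quantized = ByteArray(weights.size) { i ->
        (weights[i] / scale).roundToInt().coerceIn(-127, 127).toByte()
    }
    return quantized to scale
}

// Approximate reconstruction: x ≈ q * scale.
fun dequantizeInt8(quantized: ByteArray, scale: Float): FloatArray =
    FloatArray(quantized.size) { i -> quantized[i] * scale }
```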

Accelerator Support
Chasing state-of-the-art speed with AI accelerators like the Qualcomm Hexagon NPU.