Chat with LLMs 🤖
Right on Your Device!
mllm is a fast multimodal LLM inference engine for mobile and edge devices.
Lightweight
Designed for Edge/Mobile Scenarios
Multimodal
Supports multimodal input including vision, text, and more
In Your Pocket
Easily Integrated into Apps via JNI/Libs
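A minimal sketch of what app-side integration could look like from Kotlin, assuming a JNI bridge library named `mllm_jni` and illustrative `loadModel`/`generate`/`release` entry points; the actual JNI interface is documented in the project repo.

```kotlin
// Hypothetical Kotlin wrapper around an mllm JNI library.
// The library name, class, and method signatures below are illustrative;
// see the project repo for the real JNI interface.
object MllmChat {
    init {
        // Load the native inference engine packaged with the app (assumed name).
        System.loadLibrary("mllm_jni")
    }

    // Native entry points implemented by the C++ engine (assumed signatures).
    external fun loadModel(modelPath: String): Long
    external fun generate(handle: Long, prompt: String): String
    external fun release(handle: Long)
}

fun main() {
    // Illustrative usage: load a quantized model and run a single prompt on-device.
    val handle = MllmChat.loadModel("/data/local/tmp/model-q8.mllm")
    println(MllmChat.generate(handle, "Hello from the edge!"))
    MllmChat.release(handle)
}
```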
Focusing on Edge Scenarios
Targeting LLMs
With optimizations including Quantization, Memory Reuse, and Parallelization, we achieve affordable LLM inference latency on edge devices.
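The memory-reuse idea can be pictured as a pre-allocated activation buffer shared across decode steps; the Kotlin sketch below illustrates the principle only and is not mllm's actual memory manager.

```kotlin
// Generic sketch of memory reuse: one pre-allocated activation buffer
// handed to every decode step, so token-by-token generation does not allocate.
class ActivationPool(private val capacity: Int) {
    private val buffer = FloatArray(capacity)

    fun acquire(size: Int): FloatArray {
        require(size <= capacity) { "requested $size floats, pool holds $capacity" }
        return buffer // the same backing memory is reused on every call
    }
}

fun main() {
    val pool = ActivationPool(capacity = 4096)
    repeat(16) { step ->
        val act = pool.acquire(4096)  // no per-token allocation or GC pressure
        act.fill(step.toFloat())      // stand-in for the real layer computation
    }
}
```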
Quantization
Speeds up inference with FP16/8-bit/6-bit quantization.
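As a rough picture of what 8-bit weight quantization does, here is a generic symmetric-quantization sketch in Kotlin; mllm's real kernels are native and more elaborate.

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Generic symmetric 8-bit quantization: store int8 weights plus one float scale,
// shrinking weight memory roughly 4x versus FP32. Illustrative only, not mllm's kernel.
fun quantize8Bit(weights: FloatArray): Pair<ByteArray, Float> {
    val maxAbs = weights.maxOf { abs(it) }.coerceAtLeast(1e-8f)
    val scale = maxAbs / 127f
    val quantized = ByteArray(weights.size) { i ->
        (weights[i] / scale).roundToInt().coerceIn(-127, 127).toByte()
    }
    return quantized to scale
}

// Dequantize on the fly during matmul (or fold the scale into the accumulator).
fun dequantize8Bit(quantized: ByteArray, scale: Float): FloatArray =
    FloatArray(quantized.size) { i -> quantized[i] * scale }
```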
Accelerator Support
Chasing state-of-the-art speed with AI accelerators such as the Qualcomm Hexagon NPU.
Multimodal Support
Supports multimodal inputs such as text, image, audio, and video.
More Features
Visit Our Project Repo!