Chat with LLMs 🤖
Right on Your Device!
mllm is a fast multimodal LLM inference engine for mobile and edge devices.
Lightweight
Designed for Edge/Mobile Scenarios
Multimodal
Supports multimodal input including vision, text, and more
In Your Pocket
Easily Integrated into Apps via JNI/Libs
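A minimal sketch of what app-side integration could look like from Kotlin, assuming a JNI bridge library named `mllm_jni` and illustrative `loadModel`/`generate`/`release` entry points; the actual JNI interface is documented in the project repo.

```kotlin
// Hypothetical Kotlin wrapper around an mllm JNI library.
// The library name, class, and method signatures below are illustrative;
// see the project repo for the real JNI interface.
object MllmChat {
    init {
        // Load the native inference engine packaged with the app (assumed name).
        System.loadLibrary("mllm_jni")
    }

    // Native entry points implemented by the C++ engine (assumed signatures).
    external fun loadModel(modelPath: String): Long
    external fun generate(handle: Long, prompt: String): String
    external fun release(handle: Long)
}

fun main() {
    // Illustrative usage: load a quantized model and run a single prompt on-device.
    val handle = MllmChat.loadModel("/data/local/tmp/model-q8.mllm")
    println(MllmChat.generate(handle, "Hello from the edge!"))
    MllmChat.release(handle)
}
```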
Focusing on Edge Scenarios
Targeting LLMs
With optimizations including Quantization, Memory Reuse, and Parallelization, we achieve affordable LLM inference latency on edge devices.
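The memory-reuse idea can be pictured as a pre-allocated activation buffer shared across decode steps; the Kotlin sketch below illustrates the principle only and is not mllm's actual memory manager.

```kotlin
// Generic sketch of memory reuse: one pre-allocated activation buffer
// handed to every decode step, so token-by-token generation does not allocate.
class ActivationPool(private val capacity: Int) {
    private val buffer = FloatArray(capacity)

    fun acquire(size: Int): FloatArray {
        require(size <= capacity) { "requested $size floats, pool holds $capacity" }
        return buffer // the same backing memory is reused on every call
    }
}

fun main() {
    val pool = ActivationPool(capacity = 4096)
    repeat(16) { step ->
        val act = pool.acquire(4096)  // no per-token allocation or GC pressure
        act.fill(step.toFloat())      // stand-in for the real layer computation
    }
}
```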
Quantization
Speeds up inference with FP16/8-bit/6-bit quantization.
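As a rough picture of what 8-bit weight quantization does, here is a generic symmetric-quantization sketch in Kotlin; mllm's real kernels are native and more elaborate.

```kotlin
import kotlin.math.abs
import kotlin.math.roundToInt

// Generic symmetric 8-bit quantization: store int8 weights plus one float scale,
// shrinking weight memory roughly 4x versus FP32. Illustrative only, not mllm's kernel.
fun quantize8Bit(weights: FloatArray): Pair<ByteArray, Float> {
    val maxAbs = weights.maxOf { abs(it) }.coerceAtLeast(1e-8f)
    val scale = maxAbs / 127f
    val quantized = ByteArray(weights.size) { i ->
        (weights[i] / scale).roundToInt().coerceIn(-127, 127).toByte()
    }
    return quantized to scale
}

// Dequantize on the fly during matmul (or fold the scale into the accumulator).
fun dequantize8Bit(quantized: ByteArray, scale: Float): FloatArray =
    FloatArray(quantized.size) { i -> quantized[i] * scale }
```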
Accelerator Support
Chasing state-of-the-art speed with AI accelerators such as the Qualcomm Hexagon NPU.
Multimodal Support
Supports multimodal inputs such as text, image, audio, and video.
More Features
Visit Our Project Repo!