Chat with LLMs 🤖, Just on Your Device!

mllm is a fast multimodal LLM inference engine for mobile and edge devices.

Focused on Edge Scenarios

Targeting LLMs

With optimizations including Quantization, Memory Reuse, and Parallelization, mllm achieves affordable inference latency for LLMs on edge devices.
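To illustrate the memory-reuse idea, here is a minimal bump-arena sketch (an illustrative pattern only, not mllm's actual allocator): activation buffers for one forward pass are carved from a single preallocated block and reclaimed all at once, so no per-tensor malloc/free happens during decoding.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal bump-allocator sketch for activation memory reuse:
// buffers for one inference step are sub-allocated from one
// preallocated pool; reset() reclaims everything for the next step.
class Arena {
public:
    explicit Arena(std::size_t bytes) : pool_(bytes), offset_(0) {}

    // align must be a power of two.
    void* allocate(std::size_t bytes, std::size_t align = 64) {
        std::size_t aligned = (offset_ + align - 1) & ~(align - 1);
        if (aligned + bytes > pool_.size()) return nullptr;  // pool exhausted
        offset_ = aligned + bytes;
        return pool_.data() + aligned;
    }

    // Reuse the entire pool for the next forward pass.
    void reset() { offset_ = 0; }

private:
    std::vector<std::uint8_t> pool_;
    std::size_t offset_;
};
```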

Quantization

Speeds up inference with FP16, 8-bit, and 6-bit quantization.
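As a rough sketch of what an 8-bit path does (illustrative only, not mllm's actual quantization kernels), per-tensor symmetric quantization stores int8 values plus one FP32 scale, cutting weight memory roughly 4x and enabling faster integer matmuls.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-tensor symmetric 8-bit quantization: w ≈ scale * q, q in [-127, 127].
struct QuantizedTensor {
    std::vector<std::int8_t> q;
    float scale;
};

QuantizedTensor quantize_int8(const std::vector<float>& w) {
    float max_abs = 0.0f;
    for (float x : w) max_abs = std::max(max_abs, std::fabs(x));
    float scale = max_abs > 0.0f ? max_abs / 127.0f : 1.0f;

    QuantizedTensor out{std::vector<std::int8_t>(w.size()), scale};
    for (std::size_t i = 0; i < w.size(); ++i)
        out.q[i] = static_cast<std::int8_t>(std::lround(w[i] / scale));
    return out;
}

// Recover an approximate FP32 value for element i.
float dequantize(const QuantizedTensor& t, std::size_t i) {
    return t.scale * static_cast<float>(t.q[i]);
}
```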

Accelerator Support

Chases state-of-the-art speed with AI accelerators such as the Qualcomm Hexagon NPU.

Multimodal Support

Supports multimodal inputs such as text, image, audio, and video.

More Features

Visit Our Project Repo!


Get Started Now

Embrace your private, local LLM assistant!

Star Us!