Data Types in MLLM

MLLM supports a wide range of data types and is compatible with common formats such as GGUF and those used by KleidiAI. The table below lists the supported data types:

Enum Constant	Value	Version	Description
kFloat32	0	V1	32-bit floating point
kFloat16	1	V1	16-bit floating point
kGGUF_Q4_0	2	V1	GGUF 4-bit quantization (block size 32, 1 scale value)
kGGUF_Q4_1	3	V1	GGUF 4-bit quantization (block size 32, scale + delta)
kGGUF_Q8_0	8	V1	GGUF 8-bit quantization (block size 32, 1 scale value)
kGGUF_Q8_1	9	V1	GGUF 8-bit quantization (block size 32, scale + delta)
kGGUF_Q8_Pertensor	10	V1	GGUF 8-bit per-tensor quantization
kGGUF_Q4_K	12	V1	GGUF mixed 4/5-bit quantization (block size 256)
kGGUF_Q6_K	14	V1	GGUF mixed 6/8-bit quantization (block size 256)
kGGUF_Q8_K	15	V1	GGUF 8-bit quantization (block size 256)
kInt8	16	V1	8-bit signed integer
kInt16	17	V1	16-bit signed integer
kInt32	18	V1	32-bit signed integer
kGGUF_Q4_0_4_4	19	V1	GGUF Q4_0 variant (4x4 block)
kGGUF_Q4_0_4_8	20	V1	GGUF Q4_0 variant (4x8 block)
kGGUF_Q4_0_8_8	21	V1	GGUF Q4_0 variant (8x8 block)
kGGUF_Q8_0_4_4	22	V1	GGUF Q8_0 variant (4x4 block)
kGGUF_Q3_K	23	V1	GGUF mixed 3-bit quantization
kGGUF_Q2_K	24	V1	GGUF mixed 2-bit quantization
kGGUF_Q1_K	25	V1	GGUF 1-bit quantization
kGGUF_IQ2_XXS	26	V1	GGUF 2.06 bpw IQ2 quantization (extremely small)
kGGUF_IQ2_XS	27	V1	GGUF 2.31 bpw IQ2 quantization (extra small)
kGGUF_IQ1_S	28	V1	GGUF 1.56 bpw IQ1 quantization (small)
kGGUF_IQ1_M	29	V1	GGUF 1.75 bpw IQ1 quantization (medium)
kGGUF_IQ2_S	30	V1	GGUF 2.5 bpw IQ2 quantization (small)
kBFloat16	128	V2	16-bit Brain floating point
kUInt8	129	V2	8-bit unsigned integer
kUInt16	130	V2	16-bit unsigned integer
kUInt32	131	V2	32-bit unsigned integer
kInt64	132	V2	64-bit signed integer
kUInt64	133	V2	64-bit unsigned integer
kByte	134	V2	Raw byte storage (custom quantized weights/packed KleidiAI data)

Important

Version refers to the version of the MLLM model file format.
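The mapping from enum constant to numeric value and model file version can be sketched as follows. This is a minimal illustration in Python, not MLLM's own definition: the class name `DataTypes`, the set `V2_TYPES`, and the helper `model_file_version` are hypothetical, and only a subset of the V1 constants is reproduced.

```python
from enum import IntEnum


class DataTypes(IntEnum):
    """Hypothetical mirror of part of the table above (values copied from it)."""
    # A few V1 entries
    kFloat32 = 0
    kFloat16 = 1
    kGGUF_Q4_0 = 2
    kGGUF_Q8_0 = 8
    kInt8 = 16
    # All V2 entries
    kBFloat16 = 128
    kUInt8 = 129
    kUInt16 = 130
    kUInt32 = 131
    kInt64 = 132
    kUInt64 = 133
    kByte = 134


# Constants that the table marks as V2; everything else listed is V1.
V2_TYPES = frozenset({
    DataTypes.kBFloat16, DataTypes.kUInt8, DataTypes.kUInt16,
    DataTypes.kUInt32, DataTypes.kInt64, DataTypes.kUInt64, DataTypes.kByte,
})


def model_file_version(dtype: DataTypes) -> str:
    """Return the model file version listed for a data type in the table."""
    return "V2" if dtype in V2_TYPES else "V1"
```

With this sketch, `DataTypes(128)` resolves to `DataTypes.kBFloat16`, and `model_file_version(DataTypes.kGGUF_Q4_0)` returns `"V1"`.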

MLLM native data types

GGUF

KleidiAI

MXFP4

MXFP4 [1] is a 4-bit floating-point type with 1 sign bit (S), 2 exponent bits (E), and 1 mantissa bit (M), denoted E2M1. Its values lie in the range [-6, 6], and it has no encodings for infinity (inf) or Not a Number (NaN). Values are quantized in groups of 32, and each group shares a single scale factor. The scale is stored in 8 bits, all of which are exponent bits (E8M0), covering the range [2^-127, 2^127]. The table below lists the values representable by MXFP4 before the scale is applied.

Binary encoding	Value
0000	+0.0
0001	0.5
0010	1.0
0011	1.5
0100	2.0
0101	3.0
0110	4.0
0111	6.0
1000	-0.0
1001	-0.5
1010	-1.0
1011	-1.5
1100	-2.0
1101	-3.0
1110	-4.0
1111	-6.0
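The encoding table can be reproduced from the E2M1 bit layout, and a group can be dequantized by multiplying each decoded value by the shared E8M0 scale. The Python sketch below illustrates this under stated assumptions: the function names are hypothetical and not MLLM APIs, the exponent biases (1 for E2M1, 127 for E8M0) are taken to match the value table and the [2^-127, 2^127] range, scale byte 255 is treated as unusable (the OCP Microscaling specification reserves it for NaN), and the codes are passed as one integer per element because the packed memory layout is not described here.

```python
GROUP_SIZE = 32   # elements sharing one scale, as described above
E8M0_BIAS = 127   # assumed bias, consistent with the [2^-127, 2^127] range


def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code (0..15) into its unscaled value."""
    if not 0 <= code <= 0xF:
        raise ValueError("E2M1 code must fit in 4 bits")
    sign = -1.0 if (code >> 3) & 0x1 else 1.0
    exponent = (code >> 1) & 0x3   # 2 exponent bits, assumed bias 1
    mantissa = code & 0x1          # 1 mantissa bit
    if exponent == 0:
        # Subnormal: no implicit leading 1, so the magnitude is 0 or 0.5.
        magnitude = 0.5 * mantissa
    else:
        magnitude = (1.0 + 0.5 * mantissa) * 2.0 ** (exponent - 1)
    return sign * magnitude


def decode_e8m0(scale_byte: int) -> float:
    """Decode an 8-bit exponent-only scale into a power of two."""
    if not 0 <= scale_byte <= 254:
        # Assumption: 255 is reserved and never used as a scale.
        raise ValueError("scale byte must be in 0..254")
    return 2.0 ** (scale_byte - E8M0_BIAS)


def dequantize_group(scale_byte: int, codes: list[int]) -> list[float]:
    """Dequantize one group of 32 E2M1 codes with its shared scale."""
    if len(codes) != GROUP_SIZE:
        raise ValueError(f"expected {GROUP_SIZE} codes, got {len(codes)}")
    scale = decode_e8m0(scale_byte)
    return [decode_e2m1(code) * scale for code in codes]
```

For example, `dequantize_group(128, [0b0011] * 32)` yields thirty-two copies of 3.0, since the code 0011 decodes to 1.5 and the scale byte 128 decodes to 2^1.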

References