Data Types in MLLM

MLLM supports a wide range of data types, including the quantized formats defined by GGUF and the packed formats used by KleidiAI. The table below lists each supported type with its enum constant, numeric value, and MLLM model file version:

| Enum Constant | Value | Version | Description |
| --- | --- | --- | --- |
| kFloat32 | 0 | V1 | 32-bit floating point |
| kFloat16 | 1 | V1 | 16-bit floating point |
| kGGUF_Q4_0 | 2 | V1 | GGUF 4-bit quantization (block size 32, 1 scale value) |
| kGGUF_Q4_1 | 3 | V1 | GGUF 4-bit quantization (block size 32, scale + minimum) |
| kGGUF_Q8_0 | 8 | V1 | GGUF 8-bit quantization (block size 32, 1 scale value) |
| kGGUF_Q8_1 | 9 | V1 | GGUF 8-bit quantization (block size 32, scale + precomputed sum) |
| kGGUF_Q8_Pertensor | 10 | V1 | GGUF 8-bit per-tensor quantization |
| kGGUF_Q4_K | 12 | V1 | GGUF mixed 4/5-bit quantization (block size 256) |
| kGGUF_Q6_K | 14 | V1 | GGUF mixed 6/8-bit quantization (block size 256) |
| kGGUF_Q8_K | 15 | V1 | GGUF 8-bit quantization (block size 256) |
| kInt8 | 16 | V1 | 8-bit signed integer |
| kInt16 | 17 | V1 | 16-bit signed integer |
| kInt32 | 18 | V1 | 32-bit signed integer |
| kGGUF_Q4_0_4_4 | 19 | V1 | GGUF Q4_0 variant (4x4 block) |
| kGGUF_Q4_0_4_8 | 20 | V1 | GGUF Q4_0 variant (4x8 block) |
| kGGUF_Q4_0_8_8 | 21 | V1 | GGUF Q4_0 variant (8x8 block) |
| kGGUF_Q8_0_4_4 | 22 | V1 | GGUF Q8_0 variant (4x4 block) |
| kGGUF_Q3_K | 23 | V1 | GGUF mixed 3-bit quantization |
| kGGUF_Q2_K | 24 | V1 | GGUF mixed 2-bit quantization |
| kGGUF_Q1_K | 25 | V1 | GGUF 1-bit quantization |
| kGGUF_IQ2_XXS | 26 | V1 | GGUF 2.06 bpw IQ2 quantization (extremely small) |
| kGGUF_IQ2_XS | 27 | V1 | GGUF 2.31 bpw IQ2 quantization (extra small) |
| kGGUF_IQ1_S | 28 | V1 | GGUF 1.56 bpw IQ1 quantization (small) |
| kGGUF_IQ1_M | 29 | V1 | GGUF 1.75 bpw IQ1 quantization (medium) |
| kGGUF_IQ2_S | 30 | V1 | GGUF 2.5 bpw IQ2 quantization (small) |
| kBFloat16 | 128 | V2 | 16-bit Brain floating point |
| kUInt8 | 129 | V2 | 8-bit unsigned integer |
| kUInt16 | 130 | V2 | 16-bit unsigned integer |
| kUInt32 | 131 | V2 | 32-bit unsigned integer |
| kInt64 | 132 | V2 | 64-bit signed integer |
| kUInt64 | 133 | V2 | 64-bit unsigned integer |
| kByte | 134 | V2 | Raw byte storage (custom quantized weights/packed KleidiAI data) |

Important

  1. Version refers to the version of the MLLM model file.
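
For scripts that inspect or generate model files, it can help to have the type table in machine-readable form. The sketch below mirrors the table as a Python `IntEnum`. The class name `DataType` and the helper `model_file_version` are hypothetical names chosen for illustration and are not part of MLLM's API. The helper relies on an observation from the table (every V1 value is below 128, every V2 value is 128 or higher), which is an assumption rather than a documented rule.

```python
from enum import IntEnum


class DataType(IntEnum):
    """Hypothetical mirror of the MLLM data type table (names and values as listed)."""
    kFloat32 = 0
    kFloat16 = 1
    kGGUF_Q4_0 = 2
    kGGUF_Q4_1 = 3
    kGGUF_Q8_0 = 8
    kGGUF_Q8_1 = 9
    kGGUF_Q8_Pertensor = 10
    kGGUF_Q4_K = 12
    kGGUF_Q6_K = 14
    kGGUF_Q8_K = 15
    kInt8 = 16
    kInt16 = 17
    kInt32 = 18
    kGGUF_Q4_0_4_4 = 19
    kGGUF_Q4_0_4_8 = 20
    kGGUF_Q4_0_8_8 = 21
    kGGUF_Q8_0_4_4 = 22
    kGGUF_Q3_K = 23
    kGGUF_Q2_K = 24
    kGGUF_Q1_K = 25
    kGGUF_IQ2_XXS = 26
    kGGUF_IQ2_XS = 27
    kGGUF_IQ1_S = 28
    kGGUF_IQ1_M = 29
    kGGUF_IQ2_S = 30
    kBFloat16 = 128
    kUInt8 = 129
    kUInt16 = 130
    kUInt32 = 131
    kInt64 = 132
    kUInt64 = 133
    kByte = 134


def model_file_version(dtype: DataType) -> str:
    """Return the model file version listed for a type.

    Assumption: the split at 128 is inferred from the table, not documented.
    """
    return "V1" if dtype < 128 else "V2"
```

With this in place, `DataType(12).name` gives `'kGGUF_Q4_K'`, and `model_file_version(DataType.kByte)` gives `'V2'`. Constructing a `DataType` from a value that is not in the table (for example 4) raises `ValueError`, which is a convenient way to catch unsupported type codes early.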

MLLM native data types

GGUF
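
The type table describes kGGUF_Q4_0 as 4-bit quantization with a block size of 32 and one scale value per block. As a rough illustration of what that means in practice, the sketch below dequantizes a single block. It assumes the upstream ggml layout: a little-endian 16-bit float scale followed by 16 bytes of packed 4-bit values, where the low nibble of byte j holds element j, the high nibble holds element j + 16, and each stored value is offset by 8. Whether MLLM stores the block identically is not confirmed here, and the function name is hypothetical.

```python
import struct

QK4_0 = 32                      # block size, as stated in the type table
BLOCK_BYTES = 2 + QK4_0 // 2    # assumed layout: fp16 scale + 16 packed bytes


def dequantize_q4_0_block(block: bytes) -> list:
    """Dequantize one Q4_0 block into 32 floats (assumes the ggml layout)."""
    if len(block) != BLOCK_BYTES:
        raise ValueError(f"expected {BLOCK_BYTES} bytes, got {len(block)}")
    # The single per-block scale, stored as a 16-bit float.
    (scale,) = struct.unpack_from("<e", block, 0)
    packed = block[2:]
    out = [0.0] * QK4_0
    half = QK4_0 // 2
    for j, byte in enumerate(packed):
        # Stored nibbles are unsigned (0..15); subtracting 8 recentres them to -8..7.
        out[j] = ((byte & 0x0F) - 8) * scale
        out[j + half] = ((byte >> 4) - 8) * scale
    return out
```

Under these assumptions a block costs 18 bytes for 32 weights, which works out to 4.5 bits per weight.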

KleidiAI

MXFP4

MXFP4 [1] is a 4-bit floating-point type with 1 sign bit (S), 2 exponent bits (E), and 1 mantissa bit (M), denoted E2M1. Its representable values span [-6, 6], and it has no encoding for infinity (inf) or Not a Number (NaN). Values are stored in groups of 32, and each group shares one scale factor. The scale is stored in 8 bits, all of which are exponent bits (E8M0), covering the range [2^-127, 2^127]. The table below lists the values that a 4-bit element can represent before the scale is applied.

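| Binary encoding | Value |
| --- | --- |
| 0000 | +0.0 |
| 0001 | 0.5 |
| 0010 | 1.0 |
| 0011 | 1.5 |
| 0100 | 2.0 |
| 0101 | 3.0 |
| 0110 | 4.0 |
| 0111 | 6.0 |
| 1000 | -0.0 |
| 1001 | -0.5 |
| 1010 | -1.0 |
| 1011 | -1.5 |
| 1100 | -2.0 |
| 1101 | -3.0 |
| 1110 | -4.0 |
| 1111 | -6.0 |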
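
The encoding table can be reproduced from the bit fields directly, which is a useful sanity check when writing a decoder. The sketch below decodes a 4-bit E2M1 code, decodes an E8M0 scale, and combines them for one group of 32. All function names are hypothetical. It assumes an exponent bias of 1 for E2M1 (with exponent 0 treated as subnormal) and a bias of 127 for E8M0, both of which are consistent with the values and range given above. It also assumes, following the OCP Microscaling specification, that scale byte 255 is reserved and therefore rejected. Codes are passed as a list of integers because the in-memory packing order of the 4-bit elements is not specified here.

```python
GROUP_SIZE = 32  # elements sharing one scale, as stated above


def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code: 1 sign bit, 2 exponent bits, 1 mantissa bit."""
    if not 0 <= code <= 0xF:
        raise ValueError("E2M1 code must fit in 4 bits")
    sign = -1.0 if code & 0b1000 else 1.0
    exponent = (code >> 1) & 0b11
    mantissa = code & 0b1
    if exponent == 0:
        # Subnormal: no implicit leading 1, so the magnitude is 0 or 0.5.
        magnitude = 0.5 * mantissa
    else:
        # Normal: implicit leading 1, exponent bias of 1 (assumed).
        magnitude = 2.0 ** (exponent - 1) * (1.0 + 0.5 * mantissa)
    return sign * magnitude


def decode_e8m0(scale_byte: int) -> float:
    """Decode an exponent-only 8-bit scale as 2**(byte - 127) (bias assumed)."""
    if not 0 <= scale_byte <= 254:
        raise ValueError("scale byte must be in 0..254 (255 treated as reserved)")
    return 2.0 ** (scale_byte - 127)


def dequantize_group(scale_byte: int, codes: list) -> list:
    """Apply the shared scale to one group of 32 E2M1 codes."""
    if len(codes) != GROUP_SIZE:
        raise ValueError(f"expected {GROUP_SIZE} codes, got {len(codes)}")
    scale = decode_e8m0(scale_byte)
    return [scale * decode_e2m1(code) for code in codes]
```

Under this layout a group occupies 32 × 4 + 8 = 136 bits, or 4.25 bits per value.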



References