pymllm.configs.quantization_config

Quantization settings for model weights and KV cache.

Classes

QuantizationConfig

Quantization configuration for weights and KV cache.

Module Contents

class pymllm.configs.quantization_config.QuantizationConfig

Quantization configuration for weights and KV cache.

method: str | None = None
kv_cache_dtype: Literal['auto', 'float16', 'bfloat16', 'fp8_e4m3', 'fp8_e5m2'] = 'auto'
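A minimal sketch of how these two fields fit together. This is a standalone mirror of the class using only the signatures documented above; it is not the real `pymllm` implementation, and the comments about what `None` and `'auto'` mean are assumptions, not stated by this page.

```python
# Illustrative sketch: a standalone dataclass mirroring the documented
# fields of pymllm.configs.quantization_config.QuantizationConfig.
# The real class may carry additional validation or behavior.
from dataclasses import dataclass
from typing import Literal, Optional

# The set of accepted KV-cache dtypes, taken verbatim from the signature above.
KVCacheDtype = Literal["auto", "float16", "bfloat16", "fp8_e4m3", "fp8_e5m2"]

@dataclass
class QuantizationConfig:
    # Name of the weight-quantization method to apply; None presumably
    # leaves weights unquantized (assumption -- the page does not say).
    method: Optional[str] = None
    # Dtype for the KV cache; "auto" presumably defers to the model's
    # dtype (assumption -- the page does not define "auto").
    kv_cache_dtype: KVCacheDtype = "auto"

# Default config: no weight quantization, KV cache dtype chosen automatically.
default_cfg = QuantizationConfig()

# Config selecting an fp8 KV cache while leaving weights untouched.
fp8_cfg = QuantizationConfig(kv_cache_dtype="fp8_e4m3")
```

Because it is a plain dataclass, two configs with the same field values compare equal, which makes the object convenient to use as a cache key or to log for reproducibility.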