mllm API

The mllm.hpp header is the main entry point of the library: it pulls in all essential MLLM components and provides core functionality for model loading, context management, asynchronous execution, and general utilities.

#include "mllm.hpp"

Core Functions

void mllm::initializeContext()

Initialize the MLLM context, register backends, and set up memory management.

void mllm::shutdownContext()

Shutdown the MLLM context and clean up resources.
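
A minimal lifecycle sketch. The body between the two calls is a placeholder; the MLLM_MAIN macro described below can generate this boilerplate instead of a hand-written main().

#include "mllm.hpp"

int main() {
  mllm::initializeContext();  // register backends and set up memory management

  // ... load parameters, build modules, run inference ...

  mllm::shutdownContext();    // release context-owned resources before exit
  return 0;
}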

void mllm::setRandomSeed(uint64_t seed)

Set the random seed for reproducible results.

Parameters:

seed – Random seed value

void mllm::setMaximumNumThreads(uint32_t num_threads)

Set the maximum number of threads for parallel execution.

Parameters:

num_threads – Maximum number of threads
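
Typical configuration at start-up; the concrete values below are placeholders.

mllm::setRandomSeed(42);        // fixed seed for reproducible results
mllm::setMaximumNumThreads(8);  // cap parallel execution at 8 threads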

void mllm::setPrintPrecision(int precision)

Set the floating-point precision for printing tensors.

Parameters:

precision – Number of decimal places

void mllm::setPrintMaxElementsPerDim(int max_elements)

Set the maximum number of elements to print per dimension.

Parameters:

max_elements – Maximum elements per dimension
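
Example print configuration; the values are arbitrary and apply to subsequent tensor printing.

mllm::setPrintPrecision(4);          // print tensor values with 4 decimal places
mllm::setPrintMaxElementsPerDim(6);  // show at most 6 elements per dimension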

void mllm::memoryReport()

Print a memory usage report.

bool mllm::isOpenCLAvailable()

Check if OpenCL backend is available.

Returns:

True if OpenCL is available, false otherwise

bool mllm::isQnnAvailable()

Check if QNN backend is available.

Returns:

True if QNN is available, false otherwise
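
A simple backend-selection sketch built on the two availability checks; the branch bodies are placeholders.

if (mllm::isOpenCLAvailable()) {
  // prefer the OpenCL backend (e.g. a mobile GPU)
} else if (mllm::isQnnAvailable()) {
  // fall back to the QNN backend (Qualcomm NPU)
} else {
  // stay on the CPU backend
}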

void mllm::cleanThisThread()

Clean up thread-local resources.

SessionTCB::ptr_t mllm::thisThread()

Get the current thread’s session context.

Returns:

Shared pointer to SessionTCB
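
A thread-local usage sketch. The members of SessionTCB are outside the scope of this header reference, so the handle is only obtained here and the thread-local state released at the end.

auto session = mllm::thisThread();  // SessionTCB::ptr_t for the calling thread

// ... per-thread work using the session ...

mllm::cleanThisThread();            // release thread-local resources before the thread exits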

Parameter File Functions

ParameterFile::ptr_t mllm::load(const std::string &file_name, ModelFileVersion version = ModelFileVersion::kV1, DeviceTypes map_2_device = kCPU)

Load a parameter file.

Parameters:
  • file_name – Path to the parameter file

  • version – Model file version (default: kV1)

  • map_2_device – Target device for loading (default: kCPU)

Returns:

Shared pointer to ParameterFile

void mllm::save(const std::string &file_name, const ParameterFile::ptr_t &parameter_file, ModelFileVersion version = ModelFileVersion::kV1, DeviceTypes map_2_device = kCPU)

Save parameters to a file.

Parameters:
  • file_name – Path to save the parameter file

  • parameter_file – ParameterFile to save

  • version – Model file version (default: kV1)

  • map_2_device – Target device for saving (default: kCPU)
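
A round-trip sketch using the defaults (kV1, kCPU); the file names are placeholders.

auto params = mllm::load("model.mllm");  // ParameterFile::ptr_t
mllm::save("model_copy.mllm", params);   // write the same parameters back out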

Utility Functions

template<typename ...Args>
void mllm::print(const Args&... args)

Print arguments to stdout with automatic formatting.

Parameters:

args – Arguments to print
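
Example call; any mix of printable values can be passed.

mllm::print("generated tokens:", 128, "latency (ms):", 42.7);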

Testing Functions

mllm::test::AllCloseResult mllm::test::allClose(const Tensor &a, const Tensor &b, float rtol = 1e-5, float atol = 1e-5, bool equal_nan = false)

Check if two tensors are close within tolerance.

Parameters:
  • a – First tensor

  • b – Second tensor

  • rtol – Relative tolerance (default: 1e-5)

  • atol – Absolute tolerance (default: 1e-5)

  • equal_nan – Whether NaNs should be considered equal (default: false)

Returns:

AllCloseResult containing comparison results

class mllm::test::AllCloseResult

Result structure returned by the allClose function.

bool mllm::test::AllCloseResult::is_close

True if tensors are close within tolerance

size_t mllm::test::AllCloseResult::total_elements

Total number of elements compared

size_t mllm::test::AllCloseResult::mismatched_elements

Number of elements that don’t match within tolerance

float mllm::test::AllCloseResult::max_absolute_diff

Maximum absolute difference

float mllm::test::AllCloseResult::max_relative_diff

Maximum relative difference
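
A comparison sketch. Here a and b are assumed to be previously computed Tensor values (for example, a reference output and a backend output).

auto result = mllm::test::allClose(a, b, /*rtol=*/1e-5f, /*atol=*/1e-5f);
if (!result.is_close) {
  mllm::print("mismatched", result.mismatched_elements, "of", result.total_elements,
              "max abs diff:", result.max_absolute_diff,
              "max rel diff:", result.max_relative_diff);
}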

Async Execution Functions

template<typename __Module, typename ...__Args>
std::pair<TaskResult::sender_t, Task::ptr_t> mllm::async::fork(__Module &module, __Args&&... args)

Fork a task for asynchronous execution.

Parameters:
  • module – Module to execute

  • args – Arguments for module execution

Returns:

Pair of sender and task pointer

std::vector<Tensor> mllm::async::wait(std::pair<TaskResult::sender_t, Task::ptr_t> &sender)

Wait for a single asynchronous task to complete.

Parameters:

sender – Sender-task pair

Returns:

Output tensors

template<typename ...__Args>
std::array<std::vector<Tensor>, sizeof...(__Args)> mllm::async::wait(__Args&&... args)

Wait for multiple asynchronous tasks to complete.

Parameters:

args – Sender-task pairs

Returns:

Array of output tensor vectors, one per task
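
A fork/wait sketch. Here module_a, module_b, and input are placeholders for already constructed modules and an input Tensor, and the array entries are assumed to follow the order of the arguments passed to wait.

auto task_a = mllm::async::fork(module_a, input);
auto task_b = mllm::async::fork(module_b, input);

// Block until both tasks finish; each array entry holds one task's output tensors.
auto results = mllm::async::wait(task_a, task_b);
std::vector<mllm::Tensor> out_a = results[0];
std::vector<mllm::Tensor> out_b = results[1];

For a single task, the first overload of wait returns the output tensors directly.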

Signal Handling

void mllm::__setup_signal_handler()

Set up signal handlers for graceful shutdown on interruption.

void mllm::__signal_handler(int signal)

Signal handler function.

Parameters:

signal – Signal number

template<typename Func>
int mllm::__mllm_exception_main(Func &&func)

Exception-safe main function wrapper.

Parameters:

func – User function to execute

Returns:

Exit code

const char *mllm::signal_description(int signal)

Get human-readable description of a signal.

Parameters:

signal – Signal number

Returns:

Description string

Macros

MLLM_MAIN(...)

Main-function macro that sets up signal handlers, initializes the MLLM context, and provides exception safety.
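
A usage sketch; the brace-block form accepted by this variadic macro is an assumption based on __mllm_exception_main taking a callable, not a documented guarantee.

// Assumed usage: the macro wraps the program body, so no hand-written main(),
// context initialization, or signal setup is needed.
MLLM_MAIN({
  mllm::print("hello from mllm");
});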

Performance Functions

void mllm::perf::warmup(const ParameterFile::ptr_t &params)

Warm up the model with the given parameters.

Parameters:

params – Parameters for warmup
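
A warmup sketch before benchmarking; the file name is a placeholder.

auto params = mllm::load("model.mllm");
mllm::perf::warmup(params);  // warm up with the loaded parameters before measuring performance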