How to Profile Modules¶
MLLM provides built-in performance profiling capabilities based on Perfetto, which allows you to analyze the execution performance of your models and modules.
Prerequisites¶
To use the performance profiling feature, MLLM must be built with Perfetto support. The profiling API is compiled in only when the MLLM_PERFETTO_ENABLE macro is defined, which is why profiling calls should be wrapped in that preprocessor guard.
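The exact configure step depends on how your MLLM checkout names its build options. As a sketch, assuming a CMake option with the same name as the MLLM_PERFETTO_ENABLE macro used in the examples below, a build might look like:

```
# Hypothetical configure step: the option name MLLM_PERFETTO_ENABLE is an
# assumption mirroring the preprocessor macro; check your CMakeLists.txt
# for the actual option name.
cmake -S . -B build -DMLLM_PERFETTO_ENABLE=ON
cmake --build build
```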
Basic Usage¶
To profile your MLLM application, you need to add a few calls to your code:
Start profiling at the beginning of your main function or where you want to start measuring
Stop profiling at the end of the section you want to measure
Save the profiling results to a file
Example¶
Here’s a simple example of how to use performance profiling in your MLLM application:
```cpp
#include "mllm/mllm.hpp"

int main() {
  mllm::initializeContext();

  // Start performance profiling
#ifdef MLLM_PERFETTO_ENABLE
  mllm::perf::start();
#endif

  // Your model code here
  // ...

  // Stop performance profiling and save the trace
#ifdef MLLM_PERFETTO_ENABLE
  mllm::perf::stop();
  mllm::perf::saveReport("perf_trace.perfetto");
#endif

  mllm::shutdownContext();
  return 0;
}
```
A fuller program, similar to the Qwen2 VL example, follows the same pattern:
```cpp
#include "mllm/mllm.hpp"

int main(int argc, char** argv) {
  mllm::initializeContext();

#ifdef MLLM_PERFETTO_ENABLE
  mllm::perf::start();
#endif

  // Load and run your model
  // ...

#ifdef MLLM_PERFETTO_ENABLE
  mllm::perf::stop();
  mllm::perf::saveReport("model_perf.perfetto");
#endif

  mllm::shutdownContext();
  return 0;
}
```
Analyzing Results¶
After running your application, you’ll get a .perfetto file that can be opened with the Perfetto UI at https://ui.perfetto.dev/. This interface allows you to:
View a timeline of operations
Analyze execution time of different components
Identify performance bottlenecks
Examine memory usage patterns
Performance Categories¶
MLLM’s performance tracing is organized into several categories:
mllm.func_lifecycle: Function lifecycle events
mllm.tensor_lifecycle: Tensor creation, allocation, and destruction
mllm.kernel: Computational kernel execution
mllm.ar_step: Auto-regressive steps in language models
These categories help you filter and analyze specific aspects of your model’s performance.
Best Practices¶
Only enable profiling when needed, as tracing adds runtime overhead
Use descriptive names for your trace files
Profile representative workloads to get meaningful results
Remember to call both perf::start() and perf::stop() to ensure proper tracing