PRMON - Develop Logging and Unit Test Infrastructure for PRMON

Description

The PRocess MONitor (or prmon) is a utility programme used to monitor the resource consumption of jobs running on Linux hosts. It is widely used in the Worldwide LHC Computing Grid (WLCG), in particular to monitor the performance of the millions of jobs run by the ATLAS experiment. The output from prmon can then be used to detect anomalies at the level of individual jobs or task groups.

Prmon was first developed to monitor jobs running on CPUs, and its functionality is now being extended to cover other workloads, such as GPU jobs.

Task ideas

Prmon produces little output, but as its functionality expands, the potential for errors grows and the use cases against which developers want to test its behaviour become more complex. It is therefore desirable to move from the relatively simple error messages currently printed to std::clog to a more sophisticated, configurable logging scheme in C++.

Such a scheme would allow logging levels to be adjusted dynamically and set separately for particular modules (e.g., INFO in general but DEBUG for one problematic component).

In addition, the range of unit tests in prmon should be extended, in particular to cover unusual and unexpected cases that have sometimes been observed in the field (e.g., a metric source suddenly returning an unexpected value). This will make prmon more robust against unexpected inputs.

Expected results

Requirements

Mentors

Corresponding Project

Participating Organizations