The PRocess MONitor (or prmon) is a utility program used to monitor the
resource consumption of jobs running on Linux hosts. It is widely used in the
Worldwide LHC Computing Grid (WLCG) to monitor the performance of the millions
of jobs run by the ATLAS experiment in particular. The output from prmon can
then be used to detect anomalies on the level of individual jobs or task
groups.
Prmon was first developed to monitor jobs running on CPUs, but its functionality is being extended to cover other workloads, e.g., GPU jobs.
Prmon produces little output, but as its functionality expands, the possibility of errors
grows and the use cases against which developers want to test its behaviour
become more complex. It is therefore desirable to move from relatively simple
error messages printed to std::clog to a more sophisticated and configurable
logging scheme in C++.
This would allow logging levels to be adjusted dynamically and specialised to particular
logging modules (e.g., generally INFO but DEBUG for one problematic component).
In addition, the range of unit tests in prmon should be extended, in particular to cover unusual and unexpected cases that have sometimes been observed in the field (e.g., a metric source suddenly giving an unexpected value). This will help make prmon more robust against unexpected inputs.
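The kind of test meant here can be sketched as follows. This is an assumption-laden illustration, not code from prmon: parse_metric is a hypothetical helper that converts a textual metric value into a number, and test_parse_metric checks that malformed or out-of-range values are rejected instead of being silently accepted.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical helper (not part of prmon): parse a non-negative integer
// metric value. Returns false, leaving `value` untouched, on bad input.
inline bool parse_metric(const std::string& text, unsigned long long& value) {
  // strtoull would accept leading whitespace or a sign, so require a digit
  // first; this also rejects empty strings and negative numbers.
  if (text.empty() || text[0] < '0' || text[0] > '9') return false;

  errno = 0;
  char* end = nullptr;
  const unsigned long long parsed = std::strtoull(text.c_str(), &end, 10);

  if (errno == ERANGE) return false;                      // value too large
  if (end != text.c_str() + text.size()) return false;    // trailing junk

  value = parsed;
  return true;
}

// Unit-test style checks covering one ordinary and several unexpected inputs.
inline void test_parse_metric() {
  unsigned long long v = 0;
  assert(parse_metric("1024", v) && v == 1024);            // normal value
  assert(!parse_metric("", v));                            // empty field
  assert(!parse_metric("-5", v));                          // negative counter
  assert(!parse_metric("12abc", v));                       // corrupted text
  assert(!parse_metric("99999999999999999999999999", v));  // overflow
  assert(v == 1024);  // failed parses did not overwrite the last good value
}
```

Returning a status flag instead of throwing is only one possible design; it keeps the caller in control of whether a bad reading is skipped, logged, or treated as fatal.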