At a time when the “energy crisis” is in the news daily, we can’t help but wonder whether our research software can be made more sustainable, and more efficient as a byproduct. This question is particularly relevant for ML scientific software used in high-throughput scientific computing, where large datasets composed of many similar chunks are analysed with similar operations on each chunk. Moreover, CPU/GPU-efficient software algorithms are crucial for the real-time data selection (trigger) systems of the LHC experiments, since the initial data analysis needed to select interesting collision events runs on a computing farm at CERN with finite CPU resources.
The questions we want to start answering in this work are how energy-efficient the machine learning software used in LHC data analysis currently is, and how much could be saved by making it more efficient.
The students in this project will use metrics from the Green Software Foundation and from other selected resources to estimate the energy efficiency of machine learning software from the LHC experiments (namely, top tagging using ATLAS Open Data) and of machine learning algorithms for data compression (developed in another GSoC project, called Baler). This work will build on previous GSoC and Master’s thesis work, and will extend those results to GPU architectures. If time allows, the students will then have the chance to make small changes to the code to improve its efficiency, and to evaluate the possible savings.
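To make the measurement task concrete, the sketch below shows one simple way such an estimate can be structured: time a workload, convert the runtime to energy using an assumed average power draw, and convert that energy to carbon using an assumed grid carbon intensity (the same energy-times-intensity idea that underlies Green Software Foundation metrics). The power and intensity figures here are illustrative placeholders, not measurements; a real study would read hardware counters (e.g. RAPL for CPUs, NVML for GPUs) or an external meter instead.

```python
import time

def estimate_energy_kwh(workload, avg_power_watts=65.0):
    """Time a workload and convert runtime to an energy estimate.

    avg_power_watts is a placeholder assumption (roughly a desktop
    CPU package power); real numbers should come from RAPL, NVML,
    or an external power meter.
    """
    start = time.perf_counter()
    workload()
    elapsed_s = time.perf_counter() - start
    # watts * seconds = joules; 1 kWh = 3.6e6 J
    return avg_power_watts * elapsed_s / 3.6e6

def carbon_grams(energy_kwh, grid_intensity_g_per_kwh=300.0):
    # Grid carbon intensity varies by region and time of day;
    # 300 gCO2e/kWh is purely illustrative.
    return energy_kwh * grid_intensity_g_per_kwh

# Toy workload standing in for an ML inference loop.
def workload():
    total = 0
    for i in range(10**6):
        total += i * i
    return total

e = estimate_energy_kwh(workload)
print(f"estimated energy: {e:.2e} kWh, carbon: {carbon_grams(e):.2e} gCO2e")
```

This runtime-times-power approach only gives a coarse upper bound; part of the project is replacing these assumptions with measured values and comparing across CPU and GPU architectures.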