The amount of data processed by individual scientists has grown enormously over the past decade. It is not unusual for a user to have data processed on tens of thousands of processors spread over tens of sites across the globe. The Ganga user interface was created to manage such large calculations. It helps the user prepare the calculations, submit the tasks to a resource broker, keep track of which parts of the task have completed, and merge the results at the end.
However, large-scale data processing is often only one part of a long processing chain leading to the results that eventually end up in an academic publication. Calibrations need to be performed, plots made, and checks run for systematic effects, among other things. The snakemake tool is increasingly used to orchestrate all these small tasks. The idea for this project is to implement Ganga as a plugin for snakemake. snakemake already supports various batch systems, so this project takes that support one step further.
Scientific users of Ganga will be able to integrate their large-scale data processing into their overall snakemake workflow.
As a student, you will gain experience with the challenges of large-scale computing, where some tasks in a long processing chain might take several days to run, fail intermittently, and involve thousands of subtasks running in parallel. You will work as part of the small team that develops and supports Ganga.
Interested students should contact Ulrik (see contact below) with any questions and to request an evaluation task.
Python programming (advanced), Linux command-line experience (intermediate), and use of git for code development and continuous integration testing (intermediate).