The amount of data processed by individual scientists has grown hugely in the past decade. It is not unusual for a user to have data processed on tens of thousands of processors, located at tens of different sites across the globe. The Ganga user interface was created to manage such large calculations. It helps the user prepare the calculations, submit the tasks to a resource broker, keep track of which parts of a task have completed, and put the results together at the end.
The scale of the computations submitted through the interface is placing increasing constraints on the system. Keeping track of where the tasks are executing and what their status is, and subsequently dealing with the data they create, has become a bottleneck in the existing solution, which is based on a simple ThreadPool in Python. We are looking for a new implementation of this overall monitoring aspect of Ganga.
The goal is to move Ganga from the present situation, where the monitoring often falls behind and leaves users frustrated because they can’t get hold of their results even though the tasks have finished, to a new monitoring system that keeps the user’s session responsive and doesn’t fall behind in the monitoring of tasks.
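As a rough illustration of one direction such a monitoring system could take, the sketch below polls many task statuses concurrently on a single event loop while capping the number of queries in flight. It is only a sketch under stated assumptions: the project does not prescribe asyncio, and `poll_status`, `monitor`, `run_monitor` and the `max_concurrent` parameter are hypothetical names that are not part of Ganga.

```python
import asyncio


async def poll_status(job_id: int) -> str:
    """Hypothetical stand-in for a backend status query (not Ganga's API)."""
    await asyncio.sleep(0.01)  # simulate the latency of a remote call
    # Assumption for the demo only: even-numbered jobs have finished.
    return "completed" if job_id % 2 == 0 else "running"


async def monitor(job_ids, max_concurrent: int = 100) -> dict:
    """Poll every job once, with at most max_concurrent queries in flight."""
    limit = asyncio.Semaphore(max_concurrent)

    async def bounded_poll(job_id: int):
        # The semaphore stops thousands of jobs from flooding the backend.
        async with limit:
            return job_id, await poll_status(job_id)

    results = await asyncio.gather(*(bounded_poll(j) for j in job_ids))
    return dict(results)


def run_monitor(job_ids, max_concurrent: int = 100) -> dict:
    """Synchronous entry point that runs one monitoring pass."""
    return asyncio.run(monitor(job_ids, max_concurrent))


if __name__ == "__main__":
    statuses = run_monitor(range(10), max_concurrent=3)
    finished = [j for j, s in statuses.items() if s == "completed"]
    print(f"{len(finished)} of {len(statuses)} jobs finished")
```

Because waiting on remote status queries is I/O-bound, a single event loop can keep many queries outstanding without dedicating a thread to each. Whether this fits into the existing framework is exactly the kind of question the project would need to investigate.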
As a student, you will gain experience with the challenges of concurrency and with how different solutions have to be applied when implementing it in a large existing framework. You will also gain experience with working in an open source environment, presenting your work, writing test cases and using continuous integration as part of your development cycle.
Interested students should contact Ulrik (see contact below) to ask questions and to request an evaluation task.
Python programming (advanced), use of concurrency in Python (intermediate), Linux command line experience (novice).