Monitoring and traceability of jobs in a large distributed computing Grid

Description

LHCb, the LHC “beauty” experiment, uses state-of-the-art distributed computing technologies that integrate different kinds of computing and storage resources, including Grid and Cloud resources. The physics data are processed and distributed using software developed mostly by members of the LHCb collaboration, including the DIRAC interware. DIRAC is a complex, open-source, very actively developed software framework whose roles range from job submission and management of the data produced to the orchestration of distributed resources, while providing active monitoring and key information for the whole LHCb collaboration. DIRAC is generic software, used and extended by several Virtual Organizations (VOs). LHCb is DIRAC’s initiator and main contributor.

The student will contribute to the development of DIRAC. Communities use DIRAC to submit jobs to hundreds of heterogeneous computing resources, with several tens of thousands of jobs running concurrently. At this scale monitoring is essential, and traceability of each submitted job is key when security checks are needed. The student will extend the job monitoring system, currently based on relational databases, with state-of-the-art non-relational solutions.

Task ideas

Expected results

Requirements

Mentors

Corresponding Project

Participating Organizations