Rucio - Namespace synchronisation for shared OpenData

Description

Rucio is an open-source software framework that provides scientific collaborations with the functionality to organize, manage, transfer, monitor, and access their distributed data across heterogeneous infrastructures. Rucio was originally developed to meet the requirements of the high-energy physics experiment ATLAS, and is now continuously enhanced to support diverse scientific communities. Since 2016, Rucio has orchestrated multiple exabytes of data access and data transfers globally.

With this project, we seek to enhance Rucio with a new mechanism for sharing data across Rucio instances. Different communities and experiments that use Rucio need to share data in a safe and efficient manner. Right now, however, this requires non-atomic, client-side operations. Such operations can inadvertently introduce inconsistencies across Rucio instances and must be avoided. In this project, we propose a native server-side mechanism, embedded in the Rucio core, that makes safe and efficient data sharing possible.
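To illustrate why non-atomic, client-side operations are a problem, the following is a minimal sketch using plain in-memory dictionaries. It does not use the Rucio API, and it is not the design this project proposes: the names `Instance`, `client_side_sync`, and `atomic_sync`, as well as the `opendata:` identifiers, are hypothetical and exist only for this example. The sketch assumes a failure occurs partway through copying namespace entries from one instance to another.

```python
class Instance:
    """Hypothetical stand-in for one instance's namespace (not the Rucio API)."""

    def __init__(self, name):
        self.name = name
        self.dids = {}  # data identifier -> metadata

    def add(self, did, meta, fail=False):
        # 'fail' simulates an error (e.g. a dropped connection) for this entry.
        if fail:
            raise ConnectionError(f"{self.name}: failed while adding {did}")
        self.dids[did] = meta


def client_side_sync(source, target, fail_on=None):
    """Copy entries one by one; an error midway leaves the target half-updated."""
    for did, meta in source.dids.items():
        target.add(did, meta, fail=(did == fail_on))


def atomic_sync(source, target, fail_on=None):
    """Stage all entries on a copy and swap it in only if every step succeeded."""
    staged = Instance(target.name)
    staged.dids = dict(target.dids)
    for did, meta in source.dids.items():
        staged.add(did, meta, fail=(did == fail_on))
    target.dids = staged.dids  # reached only when no step raised


source = Instance("experiment-a")
source.dids = {
    "opendata:file1": {"bytes": 10},
    "opendata:file2": {"bytes": 20},
    "opendata:file3": {"bytes": 30},
}


def run(sync, fail_on):
    """Run one sync into a fresh target and return the identifiers it ends up with."""
    target = Instance("experiment-b")
    try:
        sync(source, target, fail_on=fail_on)
    except ConnectionError:
        pass  # the caller sees the error; we inspect what state was left behind
    return sorted(target.dids)


partial = run(client_side_sync, "opendata:file2")
clean = run(atomic_sync, "opendata:file2")
complete = run(atomic_sync, None)

print(partial)
print(clean)
print(complete)
```

In this toy model, the entry-by-entry copy leaves the target with only part of the source namespace after the simulated failure, while the staged variant leaves the target either untouched or fully updated. A real server-side mechanism would additionally have to deal with concurrency, permissions, and storage, none of which this sketch covers.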

Objectives

Set up multiple typical Rucio-instances for scientific experiments:

Rucio core developments:

By the end of GSoC’23, we expect the student to have developed the necessary changes in Rucio, including unit tests, and to have successfully demonstrated the above-mentioned use cases.

Evaluation task

Interested students should contact Mario and Martin directly for the evaluation. It involves setting up the Rucio development environment using the Kubernetes- and Docker-based tutorials, implementing a low-difficulty development task, and submitting the corresponding pull request. The pull request must pass the CI/CD pipeline.

Requirements

Mentors

  1. Rucio GitHub
  2. Rucio Documentation
  3. Rucio system overview journal article (Springer)
  4. Rucio operational experience article (IEEE Computer Society)

Additional Information

Corresponding Project

Participating Organizations