New protocols for exascale data management with Rucio

Description

Rucio is an open-source software framework that lets scientific collaborations organize, manage, monitor, and access their distributed data and dataflows across heterogeneous infrastructures. Rucio was originally developed to meet the requirements of the high-energy physics experiment ATLAS, and is continuously enhanced to support diverse scientific communities. Since 2016, Rucio has orchestrated multiple exabytes of data access and data transfers globally.

With this project we seek to enhance the Rucio clients to make them easier to use in heterogeneous environments, especially where different transfer tools are available. Within High-Energy Physics (HEP) we are well covered by the GFAL libraries. However, diverse communities outside HEP rely on more widely deployed tools, each with its own quirks and peculiarities. It is therefore important to add support for them in Rucio, most importantly scp, rsync, and rclone. Rucio should then be made more customisable, so that the optimal transfer tool is automatically used for a given environment.
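As a rough illustration of the automatic selection described above, the following minimal sketch picks the first transfer tool that is installed on the local machine. It is not Rucio code: the function name `pick_transfer_tool`, the constant `PREFERRED_TOOLS`, and the preference order are assumptions made for this example, and a real client would also have to consider what the remote endpoint supports.

```python
import shutil

# Hypothetical preference order, for illustration only; a real client
# would likely read this from its configuration.
PREFERRED_TOOLS = ("gfal-copy", "rclone", "rsync", "scp")


def pick_transfer_tool(preferred=PREFERRED_TOOLS, which=shutil.which):
    """Return the first tool in `preferred` that is found on PATH, or None.

    `which` is injectable so the lookup can be replaced in tests.
    """
    for tool in preferred:
        if which(tool) is not None:
            return tool
    return None
```

Calling `pick_transfer_tool()` on a machine without the GFAL command-line tools would fall through to rclone, rsync, or scp, whichever is found first in that order.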

Tasks

The tasks are as follows:

Requirements

Expected results

By the end of GSoC’21, we expect to have a fully working Rucio client that automatically selects among the available transfer tools based on local and remote system configurations, and that runs on a variety of Linux distributions. As a stretch goal, performance optimisation for high-throughput exascale data management would be very welcome.

Mentors

Links

  1. Rucio GitHub
  2. Rucio Documentation
  3. Rucio system overview journal article (Springer)
  4. Rucio operational experience article (IEEE Computer Society)

Corresponding Project

Participating Organizations