Rucio is an open-source software framework that lets scientific collaborations organize, manage, monitor, and access their distributed data and dataflows across heterogeneous infrastructures. Rucio was originally developed to meet the requirements of the high-energy physics experiment ATLAS and is continuously enhanced to support diverse scientific communities. Since 2016, Rucio has orchestrated multiple exabytes of data access and data transfers globally.
With this project we seek to enhance the Rucio clients to make them easier to use in heterogeneous environments, especially where different transfer tools are available. Within High-Energy Physics (HEP) we are well covered by the GFAL libraries; however, diverse communities outside HEP rely on more widely deployed tools, each with its own quirks and peculiarities. It is therefore important to add support for them in Rucio, most importantly scp, rsync, and rclone. Rucio should then be made more customisable, so that the optimal transfer tool is automatically used for a given environment.
The tasks are as follows:
By the end of GSoC’21 we expect to have a fully working Rucio client that can smartly use the available transfer tools, based on local and remote system configurations, and that runs under a variety of Linux distributions. As a stretch goal, performance optimisation for high-throughput exascale data management would of course be very much appreciated.
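
To illustrate what automatic tool selection could look like on the local side, here is a minimal Python sketch. It is not part of Rucio: the function name `pick_transfer_tool` and the preference order are assumptions made purely for illustration. The sketch only checks which executables are present on the local `PATH` and does not consider remote system configuration.

```python
import shutil
from typing import Callable, Optional, Sequence

# Hypothetical preference order, chosen only for this example.
DEFAULT_PREFERENCE = ("rclone", "rsync", "scp")


def pick_transfer_tool(
    preference: Sequence[str] = DEFAULT_PREFERENCE,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first tool in `preference` found on the local PATH.

    `which` is injectable so the lookup can be replaced in tests.
    Returns None when none of the tools is installed.
    """
    for tool in preference:
        if which(tool) is not None:
            return tool
    return None


if __name__ == "__main__":
    tool = pick_transfer_tool()
    if tool is None:
        print("No supported transfer tool found on PATH")
    else:
        print(f"Selected transfer tool: {tool}")
```

A real client would likely combine such a local check with the protocols advertised by the remote storage and with user configuration before settling on a tool.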