Rucio - Exascale data management

Description of project idea

Rucio is a software framework that provides functionality to organize, manage, and access large volumes of scientific data using customisable policies. The data can be spread across globally distributed locations and heterogeneous data centers, uniting different storage and network technologies as a single federated entity. Rucio offers advanced features such as distributed data recovery and adaptive replication, and is highly scalable, modular, and extensible. Rucio was originally developed to meet the requirements of the high-energy physics experiment ATLAS, and is continuously extended to support further LHC experiments and other scientific communities. For example, Rucio has orchestrated an exabyte of data transfers and processing for the ATLAS experiment, and this volume is growing rapidly.

The current documentation is scattered across multiple places and formats, including scientific articles, readthedocs.io (generated from sources in the code), Google Drive, GitHub, Docker Hub, and wikis. This dispersion and diversity make it difficult to find information and to recognise which information is outdated, superseded, wrong, or simply lacking in detail. The wikis in particular are usually written for a single experiment's instance of Rucio, even though the underlying concepts can apply to many different instances.

With this proposal we aim to achieve the following tasks:

Project duration

We are open to both 3-month and 6-month projects, depending on what you think is required to achieve these tasks.

Expected results

One central documentation page serving the different types of users. All existing documentation sources should be consolidated into this page and restructured. New documentation should also be written, especially to support the information flow.

Experience required

General knowledge of Python, Docker, and Git is required.

Mentors
