Rucio - Billion-row scalable and flexible metadata

Description

Rucio is a data management system for modern large-scale scientific experiments. It allows experiments to manage vast amounts of data in a scalable, modular, and flexible way. Until now, the close coupling between science metadata and the transactional state of the data management system has prevented the use of a scalable metadata catalogue, so only a selected subset of science metadata is supported. Previous attempts with non-relational databases were prone to desynchronisation and inconsistency with the production system. Recent releases of the PostgreSQL and Oracle databases, however, provide an enticing new feature: support for arbitrary JSON-encoded cells, which could allow a generic metadata component to be integrated directly into the transactional state of Rucio. Do you feel comfortable bringing databases to their knees with billions of rows, testing new features at the forefront of technology, and eventually turning them into a production-ready component? Then this is the Google Summer of Code project for you!

Expected results

Objective 1 - Evaluate the JSON datatype in PostgreSQL up to several billion entries.

Objective 2 - Evaluate the JSON datatype in Oracle up to several billion entries.

Objective 3 - Implement a metadata component that integrates with the core transactional model.

Objective 4 - Implement the client interaction.

Objective 5 - Report.
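
To make the idea behind Objective 3 more concrete, the following minimal Python sketch shows free-form JSON metadata stored in the same table, and written in the same transaction, as the rest of a data identifier's state. It is an illustration under stated assumptions and not Rucio's actual schema or API: the table name dids, its columns, and the helper functions make_db, add_did, and get_meta are hypothetical, and an in-memory SQLite database stands in for PostgreSQL or Oracle. SQLite keeps the JSON as plain text here, whereas a native JSON column type (for example jsonb in PostgreSQL) would also allow filtering on metadata keys in SQL.

```python
import json
import sqlite3


def make_db():
    # In-memory SQLite is only a stand-in for PostgreSQL/Oracle.
    # The table and column names are hypothetical, chosen for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dids ("
        "scope TEXT, name TEXT, bytes INTEGER, meta TEXT, "
        "PRIMARY KEY (scope, name))"
    )
    return conn


def add_did(conn, scope, name, size, meta):
    # 'with conn' commits if the block succeeds and rolls back if it raises,
    # so the core row and its metadata are stored together or not at all.
    with conn:
        conn.execute(
            "INSERT INTO dids (scope, name, bytes) VALUES (?, ?, ?)",
            (scope, name, size),
        )
        conn.execute(
            "UPDATE dids SET meta = ? WHERE scope = ? AND name = ?",
            (json.dumps(meta), scope, name),
        )


def get_meta(conn, scope, name):
    # Return the decoded metadata dictionary, or None if the row is absent.
    row = conn.execute(
        "SELECT meta FROM dids WHERE scope = ? AND name = ?",
        (scope, name),
    ).fetchone()
    return None if row is None else json.loads(row[0])


conn = make_db()
add_did(conn, "user.alice", "file1", 100, {"run": 42, "tags": ["raw"]})
print(get_meta(conn, "user.alice", "file1"))
```

If the metadata cannot be encoded as JSON, json.dumps raises an exception after the first INSERT has already run; the transaction is then rolled back and no half-registered row remains. Avoiding that kind of desynchronisation is the motivation for keeping metadata inside the transactional state instead of in a separate store.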

Mentors

Corresponding Project

Participating Organizations