Generalise the ROOT Parallel Declarative Analysis Framework to non-HEP Big Data

Description

The ROOT software framework is the cornerstone of all software stacks used by High Energy Physics (HEP) experiments at CERN and other prestigious laboratories. It provides components that are fundamental to the entire data processing chain, from particle collisions to final publications, including end-user data analysis and modern machine learning techniques.

ROOT features a declarative analysis subsystem, TDataFrame, which has proven able to scale in-process parallel HEP data analysis to ~100 cores while offering a simple and intuitive programming model.
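To give a feel for what "declarative" means here, the following is a minimal sketch in plain C++. It does not use ROOT: `ToyFrame`, `Event` and `CountHighPtPositive` are hypothetical names invented for illustration, and the real TDataFrame interface is richer and runs the event loop in parallel. The point is only that the user declares selections first and the loop over the data runs once, when a result is requested.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy types for illustration only; not part of ROOT.
struct Event {
    double pt;
    int charge;
};

class ToyFrame {
public:
    explicit ToyFrame(std::vector<Event> events) : events_(std::move(events)) {}

    // Declare a selection. Nothing is evaluated at this point.
    ToyFrame Filter(std::function<bool(const Event&)> cut) const {
        assert(cut && "Filter needs a callable");
        ToyFrame next(*this);
        next.cuts_.push_back(std::move(cut));
        return next;
    }

    // Run the event loop once and count events passing every declared cut.
    std::size_t Count() const {
        std::size_t n = 0;
        for (const Event& e : events_) {
            bool pass = true;
            for (const auto& cut : cuts_) {
                if (!cut(e)) {
                    pass = false;
                    break;
                }
            }
            if (pass) ++n;
        }
        return n;
    }

private:
    std::vector<Event> events_;
    std::vector<std::function<bool(const Event&)>> cuts_;
};

// The analysis reads as a chain of declarations, not as a hand-written loop.
std::size_t CountHighPtPositive(const std::vector<Event>& events) {
    return ToyFrame(events)
        .Filter([](const Event& e) { return e.pt > 20.0; })
        .Filter([](const Event& e) { return e.charge > 0; })
        .Count();
}
```

Because the user code never spells out the loop, a framework built this way is free to decide how the loop is executed, for example by splitting it across cores.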

This project aims to build on this result and increase the impact of TDataFrame in other disciplines dealing with Big Data, such as Astrophysics or Genomics, and even in non-scientific activities, exploiting both the Python and C++ interfaces of TDataFrame.

Task ideas

Expected results

Working implementation of one or more TDataSources for the aforementioned formats
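As a rough illustration of the data source idea, the sketch below separates "how columns are stored" from "how they are analysed" in plain C++. It is not ROOT's actual TDataSource interface: `ToyDataSource`, `InMemorySource` and `Sum` are hypothetical names, and the in-memory map merely stands in for a reader of an external file format.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface, far simpler than ROOT's actual TDataSource.
class ToyDataSource {
public:
    virtual ~ToyDataSource() = default;
    virtual std::vector<std::string> ColumnNames() const = 0;
    virtual std::size_t NumEntries() const = 0;
    virtual double Value(const std::string& column, std::size_t entry) const = 0;
};

// In-memory stand-in for a reader of an external columnar format.
class InMemorySource : public ToyDataSource {
public:
    explicit InMemorySource(std::map<std::string, std::vector<double>> columns)
        : columns_(std::move(columns)) {
        // All columns must describe the same number of entries.
        for (const auto& kv : columns_) {
            assert(kv.second.size() == columns_.begin()->second.size());
        }
    }

    std::vector<std::string> ColumnNames() const override {
        std::vector<std::string> names;
        for (const auto& kv : columns_) names.push_back(kv.first);
        return names;
    }

    std::size_t NumEntries() const override {
        return columns_.empty() ? 0 : columns_.begin()->second.size();
    }

    double Value(const std::string& column, std::size_t entry) const override {
        return columns_.at(column).at(entry);
    }

private:
    std::map<std::string, std::vector<double>> columns_;
};

// Analysis code depends only on the interface, not on the storage format.
double Sum(const ToyDataSource& source, const std::string& column) {
    double total = 0.0;
    for (std::size_t i = 0; i < source.NumEntries(); ++i) {
        total += source.Value(column, i);
    }
    return total;
}
```

Under this design, supporting a new format would mean writing another class that implements the same interface, while functions such as `Sum` stay unchanged.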

Requirements

C++. Experience with Parquet, HDF5 or Avro would be a plus. Knowledge of ROOT and TDataFrame can be acquired during the project.

Mentors

Corresponding Project

Participating Organizations