Integration of Apache Parquet reading capabilities in ROOT data analyses

Description

ROOT is a C++ software toolkit widely used for high-energy physics (HEP) analyses. It specializes in reading and writing large amounts of data efficiently, using a custom storage format known as ROOT files. TDataFrame is a new tool that lets ROOT users set up data analyses quickly and efficiently through a high-level declarative syntax (chains of functional-style operations). The aim of this project is to seamlessly interface TDataFrame with common storage formats other than ROOT files, such as Apache Parquet. Users of TDataFrame will be able to read data stored in these formats transparently, with little or no change to their analysis code.
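The idea of keeping analysis code independent of the storage format can be illustrated with a small, self-contained C++11 sketch. It does not use the ROOT or Apache Parquet APIs: `ColumnSource`, `InMemorySource` and `Frame` are hypothetical names invented for this illustration, and the in-memory backend merely stands in for a real file reader. The point is only that a declarative chain of filters can be written against an abstract source, so that a different backend could be swapped in without touching the chain.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface (not part of ROOT): anything that can serve
// named columns of doubles, one entry at a time.
class ColumnSource {
public:
    virtual ~ColumnSource() = default;
    virtual std::size_t NumEntries() const = 0;
    virtual double Get(const std::string& column, std::size_t entry) const = 0;
};

// Stand-in backend that keeps its columns in memory. A reader for another
// storage format would implement the same interface instead.
class InMemorySource : public ColumnSource {
public:
    explicit InMemorySource(std::map<std::string, std::vector<double>> columns)
        : columns_(std::move(columns)) {
        // All columns must describe the same number of entries.
        for (const auto& kv : columns_)
            assert(kv.second.size() == columns_.begin()->second.size());
    }
    std::size_t NumEntries() const override {
        return columns_.empty() ? 0 : columns_.begin()->second.size();
    }
    double Get(const std::string& column, std::size_t entry) const override {
        return columns_.at(column).at(entry);
    }

private:
    std::map<std::string, std::vector<double>> columns_;
};

// Hypothetical declarative frame: Filter() only records a selection;
// the data is read when a result such as Count() or Sum() is requested.
class Frame {
public:
    explicit Frame(std::shared_ptr<const ColumnSource> source)
        : source_(std::move(source)) {}

    // Returns a new frame with one more selection applied to `column`.
    Frame Filter(std::function<bool(double)> predicate,
                 const std::string& column) const {
        Frame next(*this);
        next.filters_.emplace_back(column, std::move(predicate));
        return next;
    }

    // Number of entries that pass every recorded filter.
    std::size_t Count() const {
        std::size_t n = 0;
        for (std::size_t i = 0; i < source_->NumEntries(); ++i)
            if (Passes(i)) ++n;
        return n;
    }

    // Sum of `column` over the entries that pass every recorded filter.
    double Sum(const std::string& column) const {
        double total = 0.0;
        for (std::size_t i = 0; i < source_->NumEntries(); ++i)
            if (Passes(i)) total += source_->Get(column, i);
        return total;
    }

private:
    bool Passes(std::size_t entry) const {
        for (const auto& f : filters_)
            if (!f.second(source_->Get(f.first, entry))) return false;
        return true;
    }

    std::shared_ptr<const ColumnSource> source_;
    std::vector<std::pair<std::string, std::function<bool(double)>>> filters_;
};
```

With this sketch, a chain such as `Frame(source).Filter(cutOnPt, "pt").Filter(cutOnEta, "eta").Count()` (where `cutOnPt` and `cutOnEta` are user-supplied predicates) depends only on the `ColumnSource` interface, so the same analysis code would run unchanged on any backend that implements it.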

Task ideas and expected results

Requirements

Strong C++ skills, including C++11, and familiarity with the Apache Parquet file format. Familiarity with the ROOT software library is highly desirable.

Mentors

Corresponding Project

Participating Organizations