Uproot is a pure Python reader and writer of the ROOT file format, the primary format for High Energy Physics data. It converts ROOT files on disk to and from NumPy arrays, Awkward Arrays, and Pandas DataFrames for analysis.
Uproot is primarily “eager,” in that a request for ROOT data to be read into memory is executed immediately, with the exception of one function, uproot.lazy. This function is currently only supported by Uproot’s Awkward Array backend, and it relies on the VirtualArray and PartitionedArray array types that are being deprecated in favor of a new dask-awkward collection type. Dask is an industry standard library for delayed and distributed computation in Python.
The goal of this project would be to reimplement uproot.lazy
(now uproot.dask
) using the new dask.awkward
collection. In addition, the new function should also support the NumPy and Pandas backends, leveraging the dask.array and dask.dataframe collection types in Dask.
It is part of a larger project to update Uproot 4 to use Awkward Array version 2, of which the migration to Dask is one part. uproot.dask
will be a key feature of Uproot version 5.
As written, these tasks are not sequential: implementation will likely be iterative, and your understanding should grow throughout the process.
uproot.lazy
, how Uproot reads data from local and remote files, schedules I/O and interpretation tasks internally, and how it uses Awkward version 1 to build a lazy array.dask.awkward
collection type, how to substitute the deprecated VirtualArray and PartitionedArray node types.uproot.dask
for the NumPy backend (first; should be the easiest of the three).The new uproot.dask
function should return a Dask DAG node (high level), suitable for further processing in dask.array
, dask.dataframe
, or the new dask.awkward
. All modes of processing: synchronous, asynchronous, multithreaded, multiprocessing, and distributed, should work without unnecessary copying of data. This should be demonstrated with diagnostics.
Interested students please contact Jim (pivarski@princeton.edu) for an evaluation task.