Manipulation of massive astronomical data using graphs

Description

Each observing night, telescopes around the world issue alerts based on what they observe in the sky. These alerts are typically streamed to other sites, where the streams are analysed and the relevance of each alert is assessed in order to decide on the next steps. Such decisions include, for example, retrieving a set of previous observations and extracting the scientific information, which is sometimes hidden on a longer time-scale than the alert itself (transient objects, new objects, …). Given the unprecedented precision of the next generation of telescopes, the stream will contain millions of alerts per night, reaching the terabyte scale per night, and decisions and actions must be taken extremely fast. In this context, efficiently manipulating such a volume of data and visualising the patterns in it are real challenges for our community.

Our group is investigating a broker solution, called Fink, based on the structured streaming capabilities of Apache Spark. Each night, alerts are streamed from telescopes, analysed by the Fink broker, and redistributed to subscribers. The processed data also needs to be stored for visualisation and post-processing.
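The broker flow described above (receive alerts, analyse them, redistribute them to subscribers, store them for later use) can be illustrated with a small in-memory sketch. This is not Fink's implementation and it uses neither Apache Spark nor JanusGraph: the `Alert` fields, the `ToyBroker` class and the brightness cut are all hypothetical names and values chosen for illustration, and a plain Python dictionary stands in for the graph store.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Alert:
    # All fields are hypothetical; real alert packets carry far more data.
    alert_id: str     # unique identifier of the alert
    object_id: str    # identifier of the sky object the alert refers to
    magnitude: float  # measured brightness (lower value = brighter)


class ToyBroker:
    """In-memory stand-in for a broker: analyse, redistribute, store."""

    def __init__(self) -> None:
        # Each subscriber is a (filter, callback) pair.
        self._subscribers: List[
            Tuple[Callable[[Alert], bool], Callable[[Alert], None]]
        ] = []
        # Toy graph store: object vertex -> set of linked alert vertices.
        self.graph = defaultdict(set)

    def subscribe(self, keep: Callable[[Alert], bool],
                  callback: Callable[[Alert], None]) -> None:
        """Register a subscriber that receives alerts passing `keep`."""
        self._subscribers.append((keep, callback))

    def process(self, alerts: Iterable[Alert]) -> None:
        """Store every alert, then forward it to matching subscribers."""
        for alert in alerts:
            # Store: add an edge between the sky object and this alert.
            self.graph[alert.object_id].add(alert.alert_id)
            # Analyse and redistribute: apply each subscriber's filter.
            for keep, callback in self._subscribers:
                if keep(alert):
                    callback(alert)


# Usage: one subscriber interested only in bright alerts (hypothetical cut).
received: List[Alert] = []
broker = ToyBroker()
broker.subscribe(lambda a: a.magnitude < 18.0, received.append)
broker.process([
    Alert("a1", "obj1", 17.2),
    Alert("a2", "obj1", 19.5),
    Alert("a3", "obj2", 16.0),
])
print([a.alert_id for a in received])
print(sorted(broker.graph["obj1"]))
```

Keeping the alerts of one sky object linked to a common vertex is what makes the history of that object cheap to retrieve later, which is the kind of query a graph database is meant to serve at scale.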

Task ideas

The student will focus on JanusGraph, a scalable graph database, and Apache Spark, a cluster computing framework. The internship will have four steps:

Expected results

Ultimately, the developments will be integrated into the Apache Spark-based Fink broker project developed by the group at IJCLab.

Desirable Skills

Mentors

  1. Fink
  2. Apache Spark
  3. JanusGraph

Corresponding Project

Participating Organizations