Geant4-FastSim - Building an ML pipeline for fast shower simulation

Description

In Large Hadron Collider (LHC) experiments at CERN in Geneva, the calorimeter is a key detector technology for measuring the energy of particles. These particles interact electromagnetically and/or hadronically with the material of the calorimeter, creating cascades of secondary particles known as showers. Describing the showering process relies on simulation methods that precisely model all particle interactions with matter; detailed and accurate simulation is based on the Geant4 toolkit. Constrained by the need for precision, this simulation is inherently slow and constitutes a bottleneck for physics analysis. Furthermore, the upcoming high-luminosity upgrade of the LHC, with more complex events and a much higher trigger rate, will increase the number of simulated events required.

Machine Learning (ML) techniques such as generative modeling are used as fast simulation alternatives: the model learns to generate showers in a calorimeter, i.e. to simulate the calorimeter response to certain particles. The pipeline of a fast simulation solution can be divided into five components: data preprocessing, ML model design, validation, inference and optimization. The preprocessing module derives a suitable representation of showers and performs data cleaning, scaling and encoding. The preprocessed data is then used to train the generative model. To search for the best set of model hyperparameters, techniques such as Automated Machine Learning (AutoML) are used. The validation component compares ML metrics and physics quantities between the input and generated data.

The aim of this project is to optimize the ML pipeline of the fast simulation approach using the open-source platform Kubeflow. As a byproduct, the student will gain expertise in cutting-edge ML techniques and learn to use them in the context of high-granularity image generation and fast simulation. Moreover, this project can serve as a baseline for future ML pipelines for all experiments at CERN.
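
To make the preprocessing and validation components described above more concrete, here is a minimal sketch in plain Python. It is not taken from the existing project: the function names, the flat list representation of a shower, and the `max_energy` normalisation bound are all assumptions chosen for illustration. It shows one possible cleaning, scaling and encoding step, followed by a simple physics-style comparison between reference and generated showers.

```python
def preprocess_shower(cell_energies, incident_energy, max_energy=1024.0):
    """Clean, scale and encode one shower (illustrative only).

    cell_energies:   flat list of energy deposits per calorimeter cell
    incident_energy: energy of the primary particle, same units
    max_energy:      hypothetical upper bound used to normalise the condition
    """
    # Data cleaning: negative deposits are unphysical, so clip them to zero.
    cleaned = [max(e, 0.0) for e in cell_energies]
    # Scaling: express each deposit as a fraction of the incident energy.
    scaled = [e / incident_energy for e in cleaned]
    # Encoding: map the incident energy to a conditioning value in (0, 1].
    condition = incident_energy / max_energy
    return scaled, condition


def mean_energy_fraction(scaled_showers):
    """Mean fraction of the incident energy deposited per shower."""
    totals = [sum(shower) for shower in scaled_showers]
    return sum(totals) / len(totals)


def relative_difference(reference, generated):
    """Relative deviation of a generated quantity from its reference value."""
    return abs(generated - reference) / reference
```

In a real validation step the same comparison would typically be repeated for several physics quantities (for example shower profiles), not just the total deposited energy.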

Project Milestones

Build Kubeflow pipeline from the existing project (full simulation, data preprocessing, training, validation)
Evaluate and study each pipeline component
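
The two milestones above can be sketched as a chain of stages that is timed stage by stage. The example below is plain Python and deliberately does not use the Kubeflow API; the stage bodies are hypothetical placeholders standing in for the real full simulation, preprocessing, training and validation components, and `run_pipeline` is an illustrative helper, not part of any library.

```python
import time


def run_pipeline(stages, payload):
    """Run stages in order, feeding each output into the next stage.

    Returns the final payload and a list of (stage name, seconds) pairs.
    """
    report = []
    for name, stage in stages:
        start = time.perf_counter()
        payload = stage(payload)
        report.append((name, time.perf_counter() - start))
    return payload, report


# Hypothetical placeholder stages; each stands in for a real component.
def full_simulation(n_events):
    # Fake "showers": n_events lists of four cell energies.
    return [[float(i + j) for j in range(4)] for i in range(n_events)]


def preprocessing(showers):
    # Normalise every shower so that its cell energies sum to one.
    return [[e / (sum(s) or 1.0) for e in s] for s in showers]


def training(showers):
    # Stand-in "model": the cell-wise mean of the normalised showers.
    n = len(showers)
    return {"mean_shower": [sum(col) / n for col in zip(*showers)]}


def validation(model):
    # Stand-in check: total energy fraction of the mean shower.
    return {"mean_total": sum(model["mean_shower"])}


STAGES = [
    ("full simulation", full_simulation),
    ("data preprocessing", preprocessing),
    ("training", training),
    ("validation", validation),
]
```

Calling `run_pipeline(STAGES, 3)` returns the validation output together with the per-stage timings, which is the kind of information needed to evaluate and compare individual components. In a Kubeflow pipeline each stage would instead be a separate component with declared inputs and outputs.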

Expected results

An efficient ML pipeline with documentation that must include:

Comparison and evaluation of the pipeline with all its components
Pipeline documentation
Plots and outputs of Kubeflow performance monitoring

Requirements

Solid knowledge of ML
Strong Python skills

Evaluation tasks

Python and ML exercises.

Mentors

Dalila Salamani (CERN)
Anna Zaborowska (CERN)

Additional Information

Difficulty level (low / medium / high): medium
Duration: 350 hours
Mentor availability: July-October

Corresponding Project

Geant4

Participating Organizations

CERN