Thanks to Stefan Hoeche, Steve Mrenna and Taylor Childers for their work as co-conveners of this WG in 2018-2019, and welcome to Efe Yazgan as a new co-convener!
Please feel free to contribute to the live notes of the meeting on codimd!
This is a follow-up to the talk by Kyle Fielman at the WG meeting in July 2019, about the Argonne work on MadGraph on GPUs. See also Junichi Kanzaki’s talk at the HOW2019 workshop in March 2019 about previous work in this area in Japan.
Work has been done mainly on the phase space integration (VEGAS on GPU, aka gVEGAS), not on event generation (gSPRING). So far the matrix element calculation within gVEGAS has not been ported to GPUs.
Event generation is most likely more time consuming overall than phase space integration. A more detailed profiling and verification of that is planned.
There is one postdoc at Argonne working on an ATLAS qualification task on these topics.
Qiang: starting with any specific MadGraph version? Answer: did not get to integrating with MG yet, but plan to use the version used in CMS.
Taylor: there are frameworks like kokkos which are more portable than CUDA. StefanR: we also have a workshop upcoming in April on alpaka, including one day of public training on April 27 (https://indico.cern.ch/event/858758), plus three days of hands-on (https://indico.cern.ch/event/867700, but this is by invitation only and slots are already full). Walter: could be very interesting for our use case.
JoshMF: will you work also on the event generation (gSPRING)? From an analyser’s point of view, integration is the most painful part, but overall the largest time spent is event generation. Walter: apparently gSPRING already exists, but have not obtained Junichi’s code yet.
Andrea: is the ME calculation ported on GPU yet? Walter: it is meant to exist, but have not managed to run it yet. Olivier: plan to work on that.
Andrea: very important work, very good that many people are interested in it, we should take it offline and try to understand how to best collaborate.
JoshMF: is there some training for GPU development? Graeme: following up with Maria Girone in Openlab. Walter: not an expert and doing some learning, it’s totally not obvious and sometimes the algorithms need to be changed. JoshMF: should we maybe go another direction, and instead of porting doing a reengineering? Olivier: also just starting on CUDA, but definitely plan to spend some time on this.
Stefan: was difficult last year to get all relevant pieces of MadGraph on GPU together in one place, can this be done? Walter: already put a link there.
JoshB: for the user the integration is the painful process. So working on easier gridpack creation for users is also very important, not just event generation.
Olivier: note that GPU port is at present only conceivable for LO, while for NLO it would be much more complex, due to the need for 1-loop libraries. But e.g. DY+4 jets at LO (which is already a pain for CMS users) would be possible.
Matrix element calculation is the dominant computational cost. A DNN approach increases the integration or unweighting efficiency, hence it reduces the number of matrix element calculations needed.
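To illustrate the point about unweighting efficiency, here is a minimal sketch that is not taken from the talk. It uses a hypothetical one-dimensional toy "matrix element" and a fixed truncated-Gaussian proposal standing in for a trained sampler; all names and parameters (toy_matrix_element, adapted_weights, sigma, ...) are assumptions made for illustration. The unweighting efficiency is taken as mean weight over maximum weight, so a sampler that follows the integrand more closely needs fewer matrix element evaluations per unweighted event.

```python
import math
import random

PEAK, WIDTH = 0.5, 0.05  # toy "matrix element": narrow Gaussian peak on [0, 1]


def toy_matrix_element(x):
    """Hypothetical stand-in for an expensive matrix element evaluation."""
    return math.exp(-((x - PEAK) ** 2) / (2 * WIDTH ** 2))


def flat_weights(n, rng):
    """Sample x uniformly on [0, 1]; the event weight is f(x) / 1."""
    return [toy_matrix_element(rng.random()) for _ in range(n)]


def adapted_weights(n, rng, sigma=0.08):
    """Sample x from a Gaussian truncated to [0, 1] that roughly follows the peak.

    This fixed proposal plays the role of a trained sampler (an assumption of
    this sketch); the event weight is f(x) / g(x).
    """
    # Normalisation of the Gaussian restricted to [0, 1] (peak centred at 0.5).
    norm = sigma * math.sqrt(2 * math.pi) * math.erf(0.5 / (sigma * math.sqrt(2)))
    weights = []
    while len(weights) < n:
        x = rng.gauss(PEAK, sigma)
        if 0.0 <= x <= 1.0:  # reject points outside the integration range
            g = math.exp(-((x - PEAK) ** 2) / (2 * sigma ** 2)) / norm
            weights.append(toy_matrix_element(x) / g)
    return weights


def unweighting_efficiency(weights):
    """Fraction of weighted events kept by accept-reject against the max weight."""
    return (sum(weights) / len(weights)) / max(weights)


if __name__ == "__main__":
    rng = random.Random(1234)
    print("flat sampling efficiency:   ", unweighting_efficiency(flat_weights(20000, rng)))
    print("adapted sampling efficiency:", unweighting_efficiency(adapted_weights(20000, rng)))
```

Both samplers estimate the same integral (the mean weight), but the adapted one has a much smaller spread of weights, so fewer evaluations are thrown away in the unweighting step.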
JoshMF: how hard would it be to integrate this in MadGraph? JoshB: presently working on a future development branch of MG using python, it would be more complex on the present Fortran based implementation. This work is still at the R&D stage.
Taylor: how far away are you from using an actual process, rather than just using some test functions? JoshB: still having some numerical stability issues.
Andrea: does the DNN have an impact on the physical precision? JoshB: as long as the coverage of all the phase space is done correctly in the NN, this should not be an issue.
Andrea: is this portable to any generator like sherpa, not just MG? JoshB: yes, general idea applies to all generators, then the problem is just doing the integration with a specific generator.
Stefan: introduction to GSOC. Generally people are very happy with the quality of students.
AndyB: proposing to a GSOC some work on GPU implementation of ME might be a huge thing, it requires supervision from physics experts. Markus: for a summer project this might be too much. Stefan: already doing some profiling could be something. But definitely we would need help from authors. Andrea/Josh: also need some benchmarking suites to start.
Steve: should the mentors be at CERN? Graeme: no, neither mentors nor students are meant to be at CERN, they can be remote.
Graeme: often difficult to attract students, so we should start with why physics is exciting!
JoshMcF: The implementation of HDF5 as replacement for LHEF might be a nice thing to add to Stefan’s proposal? This was done for sherpa/pythia8 but it could be done for other generators. Holger: the code which exists is generator-independent.
Graeme: can submit many projects, but you need to have a mentor for each.
JoshMF: volunteers to be a mentor. Thanks!
The next HSF/WLCG Workshop will take place in Lund from 11 to 15 May (more information on indico). The agenda is still being prepared. We should expect to give a summary of the WG activities there. We can also organise a parallel session, keeping in mind that Lund is the ‘home’ of Pythia and many relevant experts on MC generators are based there.
Graeme: can have more than one slot of 90 minutes, could for instance have one slot of talks and one of discussions. JoshMF: yes this is a nice idea. Andrea: focus should be computational aspects.
Steve: will be there in April for the Pythia8 meeting, unfortunately not at the same time.
LHCC have requested a review of High-Luminosity computing for the LHC (see charge document attached to the agenda). The first phase is 18-20 May at CERN. Five documents will be requested. The HSF is in charge of preparing a ~20 page Common Tools and Community Software document, which has to be ready by 1 May, well before the Lund workshop. The document should cover software directly impacting on resources (Event Generation, Detector Simulation, Reconstruction, Data Analysis) and we agreed that each of the relevant HSF working groups takes charge of the corresponding section in the document. The current skeleton of the document is available on googledoc.
Draft HSF Timeline:
The HSF generator WG should thus prepare ~4 pages of text. Starting point should be the CWP and the goals laid out there. R&D plans should be outlined (for further scrutiny of the progress by the review panel after 18 months, in the fall 2021).
Graeme: perfectly ok to discuss important things that are not yet being done, and this is maybe very important for generators. Andrea: yes, for instance pointing out importance of computational work on generators, need to attract/keep the right people; also, forecasts of LO/NLO/NNLO needs from a physics perspective.
JoshMF: in the past the ATLAS accounting was done in seconds, not HS06 seconds. Have been working on HS06 numbers, and am presenting them for the first time today!
Andrea: so ballpark is that for MC production, 20% is generation, 60% is simulation, 20% is reco? JoshMF: yes, and MC production is around 70% of all of ATLAS time. So generation is around 14% overall.
Stefan: is digitization part of reco? JoshMF: yes it is put together. Efe: if like CMS, then it is very small.
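The ballpark ATLAS arithmetic above can be reproduced with a short sketch. The input shares are the approximate figures quoted in the discussion; the function and variable names are hypothetical, and the simulation and reco results are simply derived from those same ballpark inputs.

```python
def overall_fraction(share_of_mc, mc_share_of_total):
    """Share of total CPU time = (share within MC production) x (MC share of total)."""
    return share_of_mc * mc_share_of_total


# Ballpark figures quoted in the discussion (not precise accounting numbers).
MC_SHARE_OF_TOTAL = 0.70
SHARES_WITHIN_MC = {"generation": 0.20, "simulation": 0.60, "reco": 0.20}

overall = {
    step: overall_fraction(share, MC_SHARE_OF_TOTAL)
    for step, share in SHARES_WITHIN_MC.items()
}

if __name__ == "__main__":
    for step, fraction in overall.items():
        print(f"{step}: {fraction:.0%} of all ATLAS CPU time")
```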
In previous results, we had an approximate GEN/SIM split. In the new campaign (since ~September 2019) we now have separate GEN and SIM information because CMS launches GEN and SIM jobs separately. Also, HS06 numbers are now available.
Note that some of the results we gave in the table previously are now lost because the monitoring is only kept for 18 months!
Slide 3: note that the 4th and 5th lines are the slowest. The 4th is MG GEN-only, yet it is as slow as the 5th, which is sherpa GEN+SIM and includes one more jet. So it is a bit surprising.
For ttbar only, in the past we estimated GEN to be 4% of the full MC chain, now it is around 7%. Across the different channels, this number ranges from around 7% to 35%.
Steve: are you taking into account the different matching efficiencies? JoshB: yes, in the end, the numbers we get are how many unweighted events we produce.
Andrea: so in summary we should compare the fraction of CPU time spent in GEN over the total time of an MC campaign: 20% for ATLAS, against 7% to 35% for CMS. This is not so different. Then we should try to get an average for all processes. The idea is to get a prediction for a typical year, now and eventually at HL-LHC. Our goal should be to try and get precise numbers, so that we can prioritise where we need to invest to improve generation time, if relevant.
JoshB: aside, note that MINLO is a very slow generator because it recomputes pdfs, so if we add it to the table it would appear very slow.
A few points from Kyle:
A few open questions from Andrea:
Gurpreet: mainly working on sherpa in CMS, would be very interested from the CMS side about that.
Aim for one meeting per month on average, more frequently when needed (e.g. while preparing the Lund workshop and the HL-LHC review).
Possible topics for some of the next meetings: