HSF Software Forum on Potential Gains from Modern Hardware, 10 October 2018
Meeting Agenda and Slides
Introduction
- There is a free forum slot in two weeks' time (24 October). Contact
  Graeme if there is interest in presenting something.
- Otherwise we should start to consider presentations for next year.
Potential Gains for Software from Modern Hardware
- AutoFDO
- The FullCMS example used for AutoFDO is a Geant4 example with the full
  CMS geometry, but not an actual full simulation
- A static build is probably needed for some of the optimisations
- Getting a large static G4 library incorporated into an ATLAS
simulation build has not been easy
- The AutoFDO project itself seems to be developing only slowly
- Small allocations/deallocations
- Initialisation and destruction are costly, especially for more
  complex objects
- Possible optimisations: use size knowledge (vector::reserve()),
  arenas, and simpler objects; see the sketch below
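As an illustration of the two optimisations just mentioned, here is a minimal C++ sketch (the Hit type and all names are hypothetical, not from any experiment's code): reserving with size knowledge, and an arena-style bump allocator that releases all of an event's small objects at once.

```cpp
#include <cstddef>
#include <new>
#include <vector>

struct Hit { float x, y, z, e; };  // hypothetical small object

// Size knowledge: reserve once instead of reallocating repeatedly as
// the container grows while an event is filled.
std::vector<Hit> collectHits(std::size_t expectedHits) {
    std::vector<Hit> hits;
    hits.reserve(expectedHits);
    // ... push_back hits while processing the event ...
    return hits;
}

// Arena: bump-allocate small objects from one big buffer and release
// them all at once per event, avoiding per-object new/delete churn.
// Only suitable for trivially destructible objects (or run the
// destructors manually before reset()).
class EventArena {
public:
    explicit EventArena(std::size_t bytes) : buf_(bytes) {}
    void* allocate(std::size_t size, std::size_t align) {
        std::size_t p = (offset_ + align - 1) & ~(align - 1);  // align: power of two
        if (p + size > buf_.size()) return nullptr;            // arena exhausted
        offset_ = p + size;
        return buf_.data() + p;
    }
    void reset() { offset_ = 0; }  // "frees" everything in O(1)
private:
    std::vector<std::byte> buf_;
    std::size_t offset_ = 0;
};

// Usage: Hit* h = new (arena.allocate(sizeof(Hit), alignof(Hit))) Hit{};
```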
- Trident
- Try to avoid the problem of breaking hotspots into multiple smaller
  “warm” pieces
- Does HEPSpec model our codes well?
- Not at the instruction level!
- Geant4 is measured not to be bound by memory access speeds
- This is interaction with the memory subsystem, not any new/free
  activity
- Both simulation and digi/reco show a lot of memory “port” access
and much less arithmetic than expected
- Trident can be started/stopped, but not programmatically at the
moment
- Measured overheads are usually ~1%
- Haswell is much better at code page access, which is why the single-
  library build shows much less gain on this architecture than on Ivy
  Bridge
- Jakob - the scope of R&D and of gradual improvements is quite
  different: R&D might target only one aspect, limiting the overall
  impact, whereas gradual improvements might apply generically to the
  whole code base
- David L - HL-LHC simulation for CMS is projected to be only 5% of
overall workload
- Reconstruction scales in a very non-linear way, but simulation does
  not
- Stefan R - but for LHCb 90% is simulation!
- ALICE clarification: x15 is the speed-up on a single CPU, with a
  further x10 from the GPU on top
- What is meant by “exploiting modern CPU architecture”, +100%?
- It’s re-coding for vectorisation
- Pere - but GeantV has put a lot of effort into doing this and does
  not even get a x2 speed-up.
- Other sciences can have different problems, more amenable to
these improvements
- They often work with software engineers at computing centres, but
  our code has already been quite optimised
- LHCb got a perfect speed-up in RICH reco (x8), but this is only one
  piece of the code base, so the overall impact for the application is
  much less (see the Amdahl's law illustration below)
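The last point is just Amdahl's law; the 20% fraction below is an illustrative assumption, not a number quoted in the meeting:

```latex
% Overall speed-up S when a fraction p of the runtime gains a factor s:
\[
  S = \frac{1}{(1-p) + p/s}
\]
% E.g. if RICH reco were p = 0.2 of the total runtime, with s = 8:
%   S = 1 / (0.8 + 0.2/8) = 1 / 0.825 ≈ 1.21, far below x8.
```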
- Data structures are important; see the AoS vs. SoA sketch after this
  item
- For GeantV, there is a need to go beyond 1-D binning
- However, data access patterns can vary through the processing
chain
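A minimal sketch of the data-structure point (hypothetical types, not GeantV or LHCb code): the same reduction over an array-of-structs layout and a struct-of-arrays layout, where only the latter gives the contiguous, unit-stride access that vectorises well.

```cpp
#include <cstddef>
#include <vector>

// Array of structs (AoS): the x values are strided in memory, so a
// loop over them also drags unused y/z/e bytes through the cache and
// vectorises poorly.
struct TrackAoS { double x, y, z, e; };

double sumX_aos(const std::vector<TrackAoS>& tracks) {
    double s = 0.0;
    for (const auto& t : tracks) s += t.x;  // stride = sizeof(TrackAoS)
    return s;
}

// Struct of arrays (SoA): each field is contiguous, so the same loop
// is unit-stride and a straightforward SIMD target. (With GCC/Clang,
// double reductions generally need -ffast-math or -fassociative-math
// to auto-vectorise.)
struct TracksSoA {
    std::vector<double> x, y, z, e;
};

double sumX_soa(const TracksSoA& tracks) {
    double s = 0.0;
    for (double v : tracks.x) s += v;  // contiguous access
    return s;
}
```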
- The LHC detectors are running, so we also need to keep things working
  without breaking them; stopping and rewriting everything is not going
  to happen
- Stewart MH has been looking at LTO in ATLAS - anyone else? This could
  be an interesting topic for a future meeting