Third HSF Workshop Summary (May 2-4, 2016)

Workshop Group Photo


Project support in HSF (Mon am)

HSF is about SW: project support is an important goal

A list of best practices has been drafted

Recommendations are not HEP-specific, which is as it should be

A template is available for new projects: hsf_create_project.py

Test resources: plenty of resources available (for free) for continuous integration

Define a set of standards to be met to be accepted as an HSF package?

Oriented towards standalone projects. Another aspect is a project that needs to build against a number of other large pieces, and adds incremental functionality. Define standard ways of interfacing to existing projects.

Should include a requirement to document needed environment variables and other environment setup

Please give feedback on the draft best practices guideline document linked in Benedikt’s slides. Would be a very good output of the workshop to have a document representing the collective view of this group.

HSF status (Mon pm)

HSF objectives recalled: share expertise, raise awareness around projects, make it easier for people to start new projects…

HSF concrete work organized in 6 WGs: information exchange, training, SW packaging, SW licensing, SW projects, dev tools and services

Communication: web, mailing list/fora, knowledge base, technical notes

Training: focus on sharing training material

SW Packaging: see this morning and tomorrow’s hackathon

Licensing: guidelines summarized in a TN

SW projects: see this morning too

HSF is also about fostering collaboration around SW: some successes in the last year (Next-generation Conditions DB, Track reconstruction, Gaudi)

Food for thought during the workshop: to be discussed on Wednesday

Discussion

Community white paper - Peter Elmer

HSF demonstrated some initial collaborative activities, but to address the challenges ahead of us (e.g. HL-LHC) we need more, and dedicated, resources

News from projects (Mon pm)

Future conditions DB - Andrea Formica

Conditions data are used at different stages of our workflows and are also refined during these stages (online processing, prompt reconstruction, bulk processing, …)

During LS1, CMS developed a new schema that tries to simplify what was done in Run 1: this schema is in production in Run 2

Architecture requirement: decouple clients from backend using a REST API, support many different kinds of backends including several relational platforms (Oracle, PostgreSQL,…), but eventually also NoSQL or file system. All business code to manage the conditions data is inside a server. The clients can be implemented in several languages because the communication is done via HTTP using JSON for the messages exchanged with the server.

Current prototype implemented in Java (based on JEE, Spring): easier integration with Frontier, the idea being to profit as much as possible from the experience gained by the Frontier development group in terms of caching, etc.

Client implementations are ongoing. For Python this is easy via automatic generation of the client API from the Swagger documentation.
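To make the decoupling concrete, a minimal Python client sketch; the server URL, endpoint and JSON fields below are hypothetical placeholders (in the prototype, real client APIs are generated from the Swagger documentation):

```python
# Minimal sketch of a conditions-DB REST client. The endpoint layout and
# JSON fields are hypothetical, not the prototype's actual API.
import requests

BASE_URL = "http://conddb.example.org/api"  # placeholder server

def get_payload(tag, run):
    """Fetch the conditions payload valid for (tag, run) over HTTP/JSON."""
    resp = requests.get(BASE_URL + "/iovs", params={"tag": tag, "since": run})
    resp.raise_for_status()
    return resp.json()  # all business logic stays on the server side

payload = get_payload("pixel-alignment-v3", run=284500)
```

Because the protocol is plain HTTP+JSON, an equivalent client is straightforward in C++ or Java as well.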

AIDA2020 WP3 - F. Gaede

AIDA2020: detector R&D, not specifically SW

DD4HEP: one of the HSF incubator projects, providing a generic detector description toolkit

USolids/VecGeom: generic shape library, introduced in G4 10.x

Alignment: developed a fully automated fast alignment procedure for LHCb VELO, in production

EDM Toolkit: PODIO project, also in HSF incubator, efficient I/O with PODs, currently being evaluated in the context of FCC/lcio

Framework extensions: improving parallel scheduling in frameworks, currently focused on Gaudi, later plan to use it in Marlin and PandoraSDK

DDG4: interface between DD4HEP and G4

Advanced tracking tools: follow-up of aidaTT from AIDA project

Advanced Particle Flow Algorithms (Pandora PFA toolkit)

DIANA-HEP - P. Elmer

Data Intensive ANAlysis for HEP: collaborative efforts around analysis tools to make them a sustainable infrastructure in our community

SW&C Knowledge Base - T. Wenaus

http://hepsoftware.org: the latest generation (hopefully the last one, works nicely!)

See slides: many examples in them

WikiToLearn - R. Iaconelli

Collaborative textbooks: promote collaboration around training materials

Text at the heart of the system (external links or PDFs are accepted but not encouraged)

Every user can make a book from a piece of training material

Support for draft pages (basically personal version of the material)

Tracking and notifications of modifications

Result of collaboration: sharing of effort

Ability to run snippets of code in the browser or in a container (AKHET, desktop as a container with WebDAV file access)

Features to come

Learning from other communities (Mon pm)

Bioconductor - W. Huber

A use case illustrating the challenge in biomed: leukemia, a disease with a heterogeneity of causes posing a real challenge for treatment, drug research…

Bioconductor: collaborative and open-source SW

Contributor community increased over time

Several modes of interaction developed, including web site, mailing list, video-conference…

Importance of documentation: not only manual pages but user “vignettes” (narrative overview), workflows, citable papers with peer review

Increased use of GitHub for package development to allow early and open peer-review

Lessons learnt

Netherlands eScience Center

An organisation bridging scientific communities and computing infrastructures

One example: analysis with ROOT for Xenon1T (dark matter).

Depsy - J. Priem

Depsy: NSF funded project

SW citation is often only informal and doesn't permit identification and crediting of contributors

Depsy takes into account indirect contributions (transitive credit): contributions to a project that is heavily reused

Aggregated impact of people across many projects

Discussion

Machine learning (Tue am)

Impact of Deep Learning for HEP SW&C - A. Farbin

Motivations, potential: see slides

GPUs critical for performance

Need to provision a large amount of resources: no longer embarrassingly parallel

DL in reconstruction: will require distribution of large datasets for training DNNs

Proposal of an R&D project to build a HEP framework on top of DL

Potential of DL to find “unexpected things” still to be assessed in our context

OpenLab ML and Data Analytics Workshop - M. Girone

@CERN, April 29, https://indico.cern.ch/event/514434/timetable/#20160429.detailed

A lot of investment in industry: interesting contributions by several companies. Many good reasons to collaborate, but must also take into account the different culture regarding collaboration. OpenLab and its NDA infrastructure can help.

Event classification: interest by LHCb and ALICE to meet the Run 3 challenges; CMS also has plans

Object identification: great potential, raised by all experiments. Could benefit from industry experience with image recognition (self-driving cars…) if we could formulate the problem in a similar way…

Anomaly detection: a growing use case in the industry (e.g. security), may be relevant to HEP

Data analysis: all experiments plan to have events almost ready for analysis when they leave the detector (online reconstruction for ALICE and LHCb). Potential for streamlined analysis; want to look at tools like Spark/Hadoop, with OpenLab helping to set up a testbed

A lot of tools available to optimize data access and analysis, produced by industry but open-source

Event classification and triggering probably the most challenging use case: nothing really matching this in the industry but several frameworks can help

Recent Developments in ROOT/TMVA - L. Moneta

TMVA future discussed last September. Core requirements identified.

Several actions decided, most of them already done or in progress: see slides

New method to compute Feature Importance based on contribution to the classifier (a generic sketch of the idea follows this talk's notes)

New DL classes recently added supporting recent developments in the field: currently being optimized by TMVA developers

Cross validation also recently added

Work in progress in TMVA

5 students will work this summer funded by Google (Google Summer of Code program)

Welcoming more contributions to TMVA development

V. Innocente: CMS has faced performance/memory footprint issues with TMVA DL classifiers in the context of reconstruction. We need to find solutions, e.g. reduced precision (FP16), which proves to be enough in many cases.

Amir Farbin: likes the TMVA interface but is skeptical about trying to integrate everything into TMVA. The result is that it makes everything more complex, and parallelisation optimisations need to be reinvented…
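The slides are not reproduced here, so as a generic illustration only: the sketch below is permutation importance, a common way to estimate how much a feature contributes to a classifier, and not necessarily TMVA's new method.

```python
# Permutation-importance sketch: a feature matters if shuffling it
# degrades the classifier. Illustrative only; NOT TMVA's algorithm.
import numpy as np

def permutation_importance(model, X, y, score, n_repeats=10, seed=0):
    rng = np.random.default_rng(seed)
    baseline = score(y, model.predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])          # destroy feature j only
            drops.append(baseline - score(y, model.predict(Xp)))
        importances[j] = np.mean(drops)    # bigger drop => more important
    return importances
```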

Data Analysis and Reproducibility Tools for HEP - A. Ustyuzhanin

YANDEX: two companies related to data science (YANDEX Data Factory) and research/education (YANDEX School of Data Analytics, non-profit)

Several developments targeting reproducible research (all on GitHub, open-source, Apache 2.0)

Keras (Theano, TensorFlow) - M. Paganini

Keras: Python library interfacing with tensor manipulation frameworks like TensorFlow (TF) and Theano
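To give a flavour of the interface, a minimal sketch (layer sizes, data and hyperparameters are made up):

```python
# Minimal Keras sketch: a small feed-forward binary classifier.
# Shapes and hyperparameters are illustrative only.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(64, activation="relu", input_dim=20))  # 20 input features
model.add(Dense(1, activation="sigmoid"))              # binary output
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

X = np.random.rand(1000, 20)           # toy data
y = (X.sum(axis=1) > 10).astype(int)   # toy labels
model.fit(X, y, epochs=5, batch_size=32)
```

Keras delegates the tensor math to the configured backend (TF or Theano), so the same model definition runs on either.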

In ATLAS:

OpenData in CMS - K. Lassila-Perini

Challenge: knowledge preservation

The OpenData/preservation effort forces us to address the "context metadata" preservation challenge. Beneficial for the HEP community.

Lesson learned: much better to start the open data effort at the same time as the data analysis is done, but it then competes for human and computing resources…

A Common Tracking Software - A. Salzburger

Code optimizations allowed Run 1/2 to meet the CPU challenge in trigger/tracking (ATLAS and CMS already achieved a 5x speedup since the beginning of Run 1): this will not be the case for HL-LHC

ML currently used in tracking mainly for classification: just opening up to pattern recognition with the ML Tracking challenge

LHC detector SW has really been stress-tested: idea of starting an experiment-agnostic toolkit based on the experience gained (ACTS)

Tracking ML challenge: can be an important step, but need to agree on what we expect from it. CMS and ATLAS have different expectations for the LVL1 trigger; need to build a kind of hybrid detector description.

Inter-experiment ML Group and HSF - S. Gleyzer

IML founded mid-2015: community effort to modernize ML tools used in HEP

Recent new activity: connection between MEM (Matrix Element Methods) and ML

IML working closely with the CERN software group to ensure that ML packages provide good performance and are supported long-term

Tutorials: critical to attract new users

IML also wants to promote/increase collaboration with ML experts

IML evolved so much since its inception that an update of SW&C KB is needed! Will do it…

RAMP: ML Hackathon on Anomaly Detection (Tue pm)

RAMP: the ML challenge idea, with a group of people in the same room for one day

Packaging (Tue pm)

Spack presentation by Patrick Gartung; more than 30 participants

Dependencies defined as a DAG: checked at installation time

Spack add-on repo created for HEP: on HSF GitHub
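Spack packages are themselves short Python recipes, and the depends_on declarations are what define the DAG. A minimal sketch (the package name, URL and checksum are hypothetical):

```python
# Minimal Spack package recipe (hypothetical package). Spack builds the
# dependency DAG from the depends_on() declarations and checks it at
# installation time.
from spack import *

class Mytool(Package):
    """Example HEP analysis tool."""
    homepage = "https://example.org/mytool"
    url = "https://example.org/mytool-1.0.tar.gz"

    version("1.0", "0123456789abcdef0123456789abcdef")  # placeholder md5

    depends_on("cmake", type="build")  # build-time-only edge in the DAG
    depends_on("root")                 # link/run-time edge

    def install(self, spec, prefix):
        cmake(".", *std_cmake_args)
        make()
        make("install")
```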

Geant4 technical forum (Tue pm)

About 10 people in the room, more on video. Reports on G4 10.2 status, performance, EM and hadronic physics developments, review of outstanding requirements being worked on, and detailed physics developments.

Software performance (Wed am)

SW Performance @ALICE - D. Rohr

Current situation: fast online reconstruction + offline reconstruction

Managed to get tracker CPU time increasing linearly with the number of tracks: confident it will work for Run3

Run 3 challenge: currently <2 kHz readout rate, moving to continuous readout at a 50 kHz interaction rate

Future directions

ATLAS Observations - G. Stewart

ATLAS Performance Monitoring Board (PMB) in charge of monitoring information from job instrumentation and of reporting/understanding any significant change in performance, memory footprint…

LS1 lesson: too many things to do (implement/test) in parallel

Linear algebra: CLHEP too slow, replaced by Eigen after a detailed evaluation

Magnetic field access was a big CPU consumer: big impact on simulation, a lot of improvements in this area

xAOD EDM: an important step to make analysis more efficient

Emphasis on code quality: critical to understand the code to improve it!

Run3: framework evolution, no radical change

Offline code reviews can be a good occasion to make progress, identify problems… and ensure that developers document their design and implementation!

CMS - D. Lange

As for the others, reconstruction is the primary performance target: particle-flow-based object identification, high-granularity calorimeter

A lot of recent work in several different areas: igprof has been an important tool to identify hotspots

Astrophysics Experience - O. Iffrig

Challenge of a fluid dynamics simulation: a large number of points, O(1000) in each of the 3 directions, with 10 double-precision variables

Parallelization was a requirement: based on MPI, each process has its own data, with data exchanges at the borders
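The pattern in a 1-D sketch with mpi4py (array sizes and data are illustrative): each rank owns a slab of the grid plus ghost cells, and swaps borders with its neighbours.

```python
# 1-D halo-exchange sketch with mpi4py: each rank owns a slab of the
# grid and exchanges one-cell "ghost" borders with its neighbours.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

n_local = 1000                 # interior cells owned by this rank
u = np.zeros(n_local + 2)      # +2 ghost cells at the borders
u[1:-1] = rank                 # fill interior with dummy data

# Send my edge cells; receive the neighbours' edges into my ghost cells.
comm.Sendrecv(sendbuf=u[1:2], dest=left, recvbuf=u[-1:], source=right)
comm.Sendrecv(sendbuf=u[-2:-1], dest=right, recvbuf=u[0:1], source=left)
```

Run with e.g. `mpirun -n 4 python halo.py`; each step of the real solver would redo the exchange before updating the interior.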

Future directions explored:

Analysis is another challenge: many algorithms are I/O bound

Challenge of evolving a large code-base (O(100K) lines)… still work in progress

ROOT Experience and Challenges - P. Mato

20 years old: reengineering/rewriting required in several areas, collaboration with the community required

Parallelisation: multithreading and multi-processing

Multi-node, on-demand analysis: SWAN

I/O perf also deserves work: exploring several approaches, including new serialization formats

Plan to exploit JIT capability of LLVM/CLANG for perf optimization at run time

Exploring functional chains à la Spark: user specifies what, ROOT decides how, providing room for optimizations
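A toy sketch of the declarative style (not the actual ROOT interface): operations are recorded lazily, so the engine is free to reorder, fuse or parallelise them before anything runs.

```python
# Toy "functional chain": the user declares what to do; work happens only
# at collect(), leaving the engine room for internal optimizations.
class Chain:
    def __init__(self, source):
        self.source, self.ops = source, []

    def filter(self, pred):
        self.ops.append(("filter", pred))
        return self

    def map(self, fn):
        self.ops.append(("map", fn))
        return self

    def collect(self):
        out = list(self.source)
        for kind, f in self.ops:
            if kind == "filter":
                out = [x for x in out if f(x)]
            else:
                out = [f(x) for x in out]
        return out

events = range(10)
print(Chain(events).filter(lambda e: e % 2 == 0).map(lambda e: e * e).collect())
# [0, 4, 16, 36, 64]
```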

Need to explore multiple approaches to meet the challenges: not a single solution for all the use cases

Art/LArSoft - M. Paterno

Recent experience in MC G4 following an identified problem with memory footprint: the problem turned out to be more complex than initially thought, and a team of experts with different profiles was assembled

Lesson learned

Would be better to catch design problems earlier: the main way to achieve this is collaboration during development, as for the development of an analysis

GeantV - J. Apostolakis

Simulation represents 50% of LHC computing: GV wants to improve performance by 2.5x to 5x

Every component/class has a test and a benchmark both for scalar and vector interfaces

Importance of I/O

Basketization is another critical part
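The basketization idea in a toy sketch (GeantV itself is C++; the names here are illustrative): tracks are regrouped by the geometry volume they are in, so each basket forms a coherent workload for vectorized code.

```python
# Toy basketization: regroup tracks by volume so each basket can be
# handed to vectorized (SIMD) geometry/physics kernels.
from collections import defaultdict

def basketize(tracks, volume_of):
    baskets = defaultdict(list)
    for t in tracks:
        baskets[volume_of(t)].append(t)
    return baskets

tracks = [{"id": i, "vol": i % 3} for i in range(10)]
for vol, basket in basketize(tracks, lambda t: t["vol"]).items():
    # process_vectorized(basket)  # one coherent workload per volume
    print(vol, [t["id"] for t in basket])
```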

Several developments could be backported to G4.

Discussion panel

Graeme: hard limit to perf improvement set by HW. Changes in HW have a much higher latency than SW

Olivier: relying on HPC, the HW question is different. Resources are provided by national facilities. No control over them; GPUs will be part of the next generation of machines.

D. Rohr (D.R.): GPUs are here, no discussion about using them. The question is the right balance between GPU and CPU: this may change over time, and it is also a chance to do it incrementally.

Liz: emphasizes Olivier's point of view. Same in the US: HEP is encouraged to join the HPC community, and the HPC community has a roadmap that includes GPUs as a significant part of the next machine generation. The need to deal with HW heterogeneity increases the pressure on the build/packaging system.

Amir: need to work together as a community on the R&D around these issues and come up with a common framework that could be the basis for the computing infrastructure in 10 years. HSF could be the right place for this effort.

Pere: first step with GPU is to demonstrate the gain and for this, we need to identify the areas to concentrate on.

D.R.: easier to rewrite/adapt a specialized application (like tracking) than frameworks like ROOT and GEANT

Vincenzo: is there still a place for commodity HW? Pressed to join the HPC community: at the same time, less opportunity for commodity with automatic power-off capabilities in new HW… ARM vs. HPC: may achieve the same throughput without the big pressure on parallelization.

Graeme: no doubt we’ll have to rewrite a significant part of our code but it needs to match a large part of the architecture phase space. Need to rely on compiler and build systems to help with this.

J. Apostolakis (J.A.): impossible to say what will be the dominating infrastructure 5 years from now, need to remain flexible and be able to support multiple infrastructures at a low cost.

Olivier: current approach is to put the implementation details for each architecture in libraries and hide them from the users. No clear standard that could simplify the problem: still need to adapt to each architecture.

D.R.: ALICE took the opportunity of the requirement to rewrite the tracker to redesign it with GPU in mind. But originally not written for GPU.

Amir: expect to see different GPU HW for gaming/deep learning (16-bit precision for higher perf) and HPC (double precision). AMD being out of the HW business, it is in a better position to change the SW landscape around these new HW (well aware of the need to support many HW/programming environments).

Pere: need to prevent users from specifying low-level things. If the user gives a high-level description, it is easier for the system to do internal optimization.

Amir: need to look at what big players did to offload things like compression to GPUs. Highly connected with I/O and efficient I/O from GPUs.

Pere: we need to assess the exact impact on the overall workflow of offloading one particular part to specialized HW. And compare it with the effort to support them.

Amir: HSF could have the role of identifying the needed R&D and convince funding agencies to support it. We need topical workshops or topical sessions in regular meetings. Need to be proactive.

M. Sokoloff: if the current NSF proposal moves forward, we'll need to produce the Community White Paper in the next 15 months: the HSF is the natural organisation to do it.

Jeff: we need a metric for HSF work: maximize efficiency of overall people contributions.

Liz: ultimate goal remains to get additional people, not to play a zero-sum game.

Dario: importance of training to meet the challenge by increasing existing people's expertise.

Next steps & wrap-up (Wed pm)


HSF logo

Meeting notes

GeantV review

Journal proposal “Computing and Software for Data-intensive Physics”

Proposal to have a refereed, abstracted, indexed journal about HEP computing

Several open questions

Discussion

HSF in StackExchange?

Open questions

Vincenzo: counter-example of Geant4, with a different form of organization, namely the G4 collaboration

Samir: if we want to increase collaboration with others, not clear that they all want to give up their IPR.

Andy: supportive of HSF being able to take ownership of IPR, but skeptical that existing big projects with huge IPR are good guinea pigs.

Markus: any Foundation would be established under a national law; would US institutes/agencies agree to give IP to a foundation under EU/CH law, or vice-versa?

HSF governance

HSF Center

HSF (Human) resources

To be defined by CWP & roadmap

HSF communication


Community white paper & road map

Discussion

Outcomes, conclusions, next steps

Actions out of the workshop