Many FLOP-intensive high-energy physics algorithms could profit from the vector pipelines of modern processors, but in practice they do not, because they lack vectorizable inner loops. The project idea is to implement and benchmark a generic vector flow service that integrates non-intrusively with arbitrary data processing frameworks and exposes algorithms to the higher-level event loop of these frameworks.
The service will filter the main data stream to extract the data of interest and accumulate it in vectors, transforming the original scalar data flow into a vector one. A data transformation step then extracts the algorithm input data in the form of structures of arrays, which can be fed directly into a vector-aware implementation of the given algorithm. The output data can then be scattered back into scalar form and re-integrated into the framework data flow. Depending on how much the algorithm intrinsically gains from SIMD vectorization and better data caching, the overhead introduced by the extra data transformations can be much smaller than the benefits.
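The filter, accumulate, transform, and scatter steps described above can be sketched in a few lines. This is only an illustration of the data flow under stated assumptions, not the proposed implementation: the names `Track`, `VectorFlow`, and `transverse_momentum`, the basket size, and the charge-based selection are all hypothetical, and the pure-Python kernel merely stands in for a SIMD-vectorized loop over contiguous arrays.

```python
import math
from array import array
from dataclasses import dataclass


@dataclass
class Track:
    """Hypothetical scalar record travelling through the framework data flow."""
    track_id: int
    px: float
    py: float
    charge: int
    pt: float = 0.0  # output field, filled by the scatter step


def transverse_momentum(px, py):
    """Hypothetical kernel over contiguous arrays (stand-in for a SIMD loop)."""
    return array("d", (math.hypot(x, y) for x, y in zip(px, py)))


class VectorFlow:
    """Assumed service: filters records and processes them in baskets."""

    def __init__(self, selector, kernel, basket_size):
        self.selector = selector
        self.kernel = kernel
        self.basket_size = basket_size
        self.basket = []
        self.baskets_processed = 0

    def push(self, track):
        """Filter step: return True if the record was taken by the flow."""
        if not self.selector(track):
            return False
        self.basket.append(track)  # accumulate scalar records
        if len(self.basket) == self.basket_size:
            self.flush()
        return True

    def flush(self):
        """Process the current basket, even if it is only partially filled."""
        if not self.basket:
            return
        # Array of structures -> structure of arrays.
        px = array("d", (t.px for t in self.basket))
        py = array("d", (t.py for t in self.basket))
        # Vector-aware algorithm works on whole arrays at once.
        out = self.kernel(px, py)
        # Scatter the results back into the scalar records.
        for track, value in zip(self.basket, out):
            track.pt = value
        self.basket.clear()
        self.baskets_processed += 1


# Usage: an event loop pushes records one by one, as a framework would.
tracks = [Track(i, 3.0 * i, 4.0 * i, 1 if i % 2 else -1) for i in range(10)]
flow = VectorFlow(selector=lambda t: t.charge > 0,
                  kernel=transverse_momentum,
                  basket_size=4)
for t in tracks:
    flow.push(t)
flow.flush()  # drain the last, partially filled basket
```

In this sketch the gather and scatter loops are the overhead mentioned above. Whether that overhead pays off depends on how much the kernel gains from operating on full baskets.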