Abstract: Faced with physical and energy density limitations on clock speed,
contemporary microprocessor designers have increasingly turned to on-chip
parallelism for performance gains. Examples include the Intel Xeon Phi, GPGPUs,
and similar technologies. Algorithms should accordingly be designed with ample
fine-grained parallelism if they are to realize the full performance
of the hardware. This requirement can be challenging for algorithms that are
naturally expressed as a sequence of small-matrix operations, such as the
Kalman filter methods widely used in high-energy physics experiments. At the
High-Luminosity Large Hadron Collider (HL-LHC), for example, one of the
dominant computational problems is expected to be finding and fitting
charged-particle tracks during event reconstruction; today, the most common
track-finding methods are those based on the Kalman filter. Experience at the
LHC, both in the trigger and offline, has shown that these methods are robust
and provide high physics performance. Previously we reported the significant
parallel speedups that resulted from our efforts to adapt Kalman-filter-based
tracking to many-core architectures such as the Intel Xeon Phi. Here we report on
how effectively those techniques can be applied to more realistic detector
configurations and event complexity.

Comments:

Submitted to the Proceedings of the 18th International Workshop on Advanced Computing and Analysis Techniques in Physics Research; 6 pages, 5 figures. arXiv admin note: text overlap with arXiv:1702.06359