A novel high-performance computational framework for the discrete dipole approximation

Abstract

This work presents a highly optimized computational framework for the Discrete Dipole Approximation, a numerical method for calculating the optical properties associated with a target of arbitrary geometry that is widely used in atmospheric, astrophysical and industrial simulations. Core optimizations include the bit-fielding of integer data and iterative methods that complement a new Discrete Fourier Transform (DFT) kernel, which efficiently calculates the matrix' vector products required by these iterative solution schemes. The new kernel performs the requisite 3-D DFTs as ensembles of 1-D transforms, and by doing so, is able to reduce the number of constituent 1-D transforms by 60% and the memory by over 80%. The optimizations also facilitate the use of parallel techniques to further enhance the performance. Complete OpenMP-based shared-memory and MPI-based distributed-memory implementations have been created to take full advantage of the various architectures. Several benchmarks of the new framework indicate extremely favorable performance and scalability.

title = "Opendda: A novel high-performance computational framework for the discrete dipole approximation",

abstract = "This work presents a highly optimized computational framework for the Discrete Dipole Approximation, a numerical method for calculating the optical properties associated with a target of arbitrary geometry that is widely used in atmospheric, astrophysical and industrial simulations. Core optimizations include the bit-fielding of integer data and iterative methods that complement a new Discrete Fourier Transform (DFT) kernel, which efficiently calculates the matrix' vector products required by these iterative solution schemes. The new kernel performs the requisite 3-D DFTs as ensembles of 1-D transforms, and by doing so, is able to reduce the number of constituent 1-D transforms by 60{\%} and the memory by over 80{\%}. The optimizations also facilitate the use of parallel techniques to further enhance the performance. Complete OpenMP-based shared-memory and MPI-based distributed-memory implementations have been created to take full advantage of the various architectures. Several benchmarks of the new framework indicate extremely favorable performance and scalability.",

N2 - This work presents a highly optimized computational framework for the Discrete Dipole Approximation, a numerical method for calculating the optical properties associated with a target of arbitrary geometry that is widely used in atmospheric, astrophysical and industrial simulations. Core optimizations include the bit-fielding of integer data and iterative methods that complement a new Discrete Fourier Transform (DFT) kernel, which efficiently calculates the matrix' vector products required by these iterative solution schemes. The new kernel performs the requisite 3-D DFTs as ensembles of 1-D transforms, and by doing so, is able to reduce the number of constituent 1-D transforms by 60% and the memory by over 80%. The optimizations also facilitate the use of parallel techniques to further enhance the performance. Complete OpenMP-based shared-memory and MPI-based distributed-memory implementations have been created to take full advantage of the various architectures. Several benchmarks of the new framework indicate extremely favorable performance and scalability.

AB - This work presents a highly optimized computational framework for the Discrete Dipole Approximation, a numerical method for calculating the optical properties associated with a target of arbitrary geometry that is widely used in atmospheric, astrophysical and industrial simulations. Core optimizations include the bit-fielding of integer data and iterative methods that complement a new Discrete Fourier Transform (DFT) kernel, which efficiently calculates the matrix' vector products required by these iterative solution schemes. The new kernel performs the requisite 3-D DFTs as ensembles of 1-D transforms, and by doing so, is able to reduce the number of constituent 1-D transforms by 60% and the memory by over 80%. The optimizations also facilitate the use of parallel techniques to further enhance the performance. Complete OpenMP-based shared-memory and MPI-based distributed-memory implementations have been created to take full advantage of the various architectures. Several benchmarks of the new framework indicate extremely favorable performance and scalability.