This thesis investigates optimizations of an efficient implementation of the multilevel
fast multipole algorithm (MLFMA), which is intended for solving integral equations for large
problems. Although the MLFMA is not inherently parallel because of its tree-like computational
structure, with careful optimization it is well suited to parallelization as the throughput and
computational power of current GPU accelerators continue to grow. By dividing the problem into
hierarchical multilevel groups, the MLFMA can be distributed across supercomputers such as
Blue Waters, utilizing massive computing resources while balancing the workload. To solve
large problems stably and with a fast convergence rate, several iterative solvers are
implemented in the MLFMA using routines from the PETSc (Portable, Extensible Toolkit for
Scientific Computation) library and compared for performance. GPU acceleration has
also been implemented in CUDA C++ and shows significant improvement on Blue Waters.