ABSTRACT: We have ported the conjugate gradient solver, fermion force, gauge force, and “fat link” kernels of the MIMD Lattice Computation (MILC) Quantum Chromodynamics (QCD) application, which simulates four-dimensional SU(3) lattice gauge theory, to NVIDIA GPUs. These kernels account for over 98% of the application’s execution time and achieve between 1.0 and 3.5 GFLOPS per CPU core on conventional CPU systems. On a sufficiently large lattice (e.g., 24^3 x 32), their GPU counterparts achieve between 83 and 108 GFLOPS in single precision and between 15 and 33 GFLOPS in double precision on an NVIDIA GTX 280 GPU. We have extended the implementation to multiple GPUs by partitioning the lattice along the time dimension, so that each GPU holds a contiguous block of time slices. The software is currently deployed on NCSA’s Lincoln GPU cluster, where it is used to compute electromagnetic effects on particle masses, i.e., to combine the effects of QCD and Quantum Electrodynamics (QED).
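As an illustration of the time-dimension partitioning described above, the following is a minimal sketch and not the MILC implementation. The function names (`partition_time_slices`, `sites_per_gpu`, `time_neighbors`) are hypothetical, and the near-equal contiguous split and the periodic wrap-around in time are assumptions made for the example.

```python
def partition_time_slices(nt, n_gpus):
    """Split nt time slices into n_gpus contiguous, near-equal ranges.

    Returns a list of half-open (start, stop) ranges, one per GPU.
    Assumption: a simple block decomposition; the real code may differ.
    """
    if n_gpus < 1 or n_gpus > nt:
        raise ValueError("need 1 <= n_gpus <= nt")
    base, extra = divmod(nt, n_gpus)
    ranges = []
    start = 0
    for rank in range(n_gpus):
        # The first `extra` GPUs take one additional slice each.
        length = base + (1 if rank < extra else 0)
        ranges.append((start, start + length))
        start += length
    return ranges


def sites_per_gpu(spatial_extent, t_range):
    """Number of lattice sites in a block of an L^3 x T lattice."""
    t0, t1 = t_range
    return spatial_extent ** 3 * (t1 - t0)


def time_neighbors(rank, n_gpus):
    """Ranks holding the adjacent time blocks (backward, forward).

    Assumption: the time direction wraps around periodically.
    """
    return ((rank - 1) % n_gpus, (rank + 1) % n_gpus)


if __name__ == "__main__":
    # The 24^3 x 32 lattice from the abstract, split over 4 GPUs.
    for rank, t_range in enumerate(partition_time_slices(32, 4)):
        print(rank, t_range, sites_per_gpu(24, t_range), time_neighbors(rank, 4))
```

With this split, only the time slices at the edges of each block need data from another GPU, since the spatial directions stay entirely local to one device.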