We present recent results from prototyping general-purpose particle tracking on GPUs and discuss our CUDA implementation of transfer maps for single-particle dynamics and collective effects. Because the goal of this work is to incorporate GPU-accelerated tracking into ANL’s accelerator code ELEGANT [1], we used the code’s quadrupole and drift-with-LSC elements as test cases, achieving speedups of 80x and 36x, respectively, over the CPU implementations. We discuss quadrupole kernel optimizations, as well as data-parallel and hardware-assisted approaches to avoiding thread contention at the charge deposition stage of algorithms for modeling collective effects.