The purpose of this paper is to propose effective parallelization strategies for the Iterated Local Search (ILS) metaheuristic on Graphics Processing Units (GPUs). We consider how to decompose the 3-opt local search procedure onto the GPU's processing hardware and memory structure. The two resulting algorithms are evaluated and compared in terms of both speedup and solution quality on a state-of-the-art Fermi GPU architecture. We report speedups of up to 6.02, with solution quality similar to that of the original sequential implementation, on instances of the Travelling Salesman Problem ranging from 100 to 3038 cities.
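The abstract names the two ingredients being parallelized, ILS and 3-opt local search, without showing them. The following is a minimal sequential sketch in Python, not the paper's implementation. It assumes Euclidean instances, a single 3-opt reconnection type (the segment swap A|B|C into A|C|B, one of several possible 3-opt reconnections), a double-bridge perturbation, and an improvement-only acceptance rule. None of these choices, nor the helper names (`segment_swap_delta`, `three_opt_local_search`, `double_bridge`, `iterated_local_search`), come from the source. The triple loop marks where each candidate move is evaluated independently of the others. That is the kind of loop a GPU version could spread across threads before a min-reduction. The paper's two actual decompositions are not reproduced here.

```python
import math
import random


def dist(pts, u, v):
    """Euclidean distance between cities u and v."""
    return math.dist(pts[u], pts[v])


def tour_length(tour, pts):
    """Length of the closed tour (it returns to its first city)."""
    n = len(tour)
    return sum(dist(pts, tour[i], tour[(i + 1) % n]) for i in range(n))


def segment_swap_delta(tour, pts, i, j, k):
    """Length change of the 3-opt move A|B|C -> A|C|B (requires i < j < k).

    B = tour[i+1..j], C = tour[j+1..k]. Removes edges (t[i], t[i+1]),
    (t[j], t[j+1]), (t[k], t[k+1]) and adds (t[i], t[j+1]), (t[k], t[i+1]),
    (t[j], t[k+1]), with the last index taken modulo n.
    """
    n = len(tour)
    a, b = tour[i], tour[i + 1]
    c, d = tour[j], tour[j + 1]
    e, f = tour[k], tour[(k + 1) % n]
    added = dist(pts, a, d) + dist(pts, e, b) + dist(pts, c, f)
    removed = dist(pts, a, b) + dist(pts, c, d) + dist(pts, e, f)
    return added - removed


def three_opt_local_search(tour, pts, eps=1e-9):
    """Apply the best improving segment-swap move until none improves the tour."""
    n = len(tour)
    tour = list(tour)
    while True:
        # Each (i, j, k) evaluation reads the tour but writes nothing shared;
        # a GPU version could assign one thread per triple and then take a
        # min-reduction over the deltas (assumption, not the paper's scheme).
        best_delta, best_move = -eps, None
        for i in range(n - 2):
            for j in range(i + 1, n - 1):
                for k in range(j + 1, n):
                    delta = segment_swap_delta(tour, pts, i, j, k)
                    if delta < best_delta:
                        best_delta, best_move = delta, (i, j, k)
        if best_move is None:  # local optimum for this neighbourhood
            return tour
        i, j, k = best_move
        tour = tour[:i + 1] + tour[j + 1:k + 1] + tour[i + 1:j + 1] + tour[k + 1:]


def double_bridge(tour, rng):
    """ILS perturbation: cut into A|B|C|D and rejoin as A|C|B|D (needs n >= 4)."""
    p1, p2, p3 = sorted(rng.sample(range(1, len(tour)), 3))
    return tour[:p1] + tour[p2:p3] + tour[p1:p2] + tour[p3:]


def iterated_local_search(pts, iterations=20, seed=0):
    """Perturb the incumbent, re-optimize it, and keep the result only if shorter."""
    rng = random.Random(seed)
    best = three_opt_local_search(list(range(len(pts))), pts)
    best_len = tour_length(best, pts)
    for _ in range(iterations):
        candidate = three_opt_local_search(double_bridge(best, rng), pts)
        cand_len = tour_length(candidate, pts)
        if cand_len < best_len:
            best, best_len = candidate, cand_len
    return best, best_len


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [(rng.random(), rng.random()) for _ in range(15)]
    tour, length = iterated_local_search(pts, iterations=10)
    print(f"tour length: {length:.4f}")
```

Best improvement (scan all triples, then apply the single best move) is chosen here because it matches the evaluate-everything-then-reduce structure that suits data-parallel hardware. A first-improvement scan would usually be faster sequentially but leaves less independent work per step.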