(Submitted on July 10, 2018; Revised on August 12, 2018; Accepted on September 11, 2018)

Abstract:

Modern processor microarchitectures offer advanced prefetch mechanisms designed to hide memory latency and improve application performance. However, pointer-chasing applications that employ linked data structures expose a memory latency problem that hardware prefetchers handle poorly. Helper threaded prefetching on a Chip Multiprocessor is a promising method for reducing the memory latency of accesses to linked data structures. In this paper, we first describe two L2 prefetchers on a Chip Multiprocessor and two different helper threaded prefetching techniques for pointer-chasing applications. We then reveal the limitations of the L2 prefetchers for pointer-intensive applications once the two helper threaded prefetching techniques are applied. Finally, we optimize the deployment of the L2 prefetchers with the two helper threaded prefetching techniques for pointer-chasing applications. The experimental results indicate that the effectiveness of the L2 prefetchers on helper threads depends on the memory access pattern of the target application, and that the optimized deployment of the L2 prefetchers further improves the performance of pointer-intensive applications.