In a shared-memory chip multiprocessor (CMP), data shared between cores must be exchanged through the shared last-level cache while cache coherence is maintained. As the number of cores increases, the cache coherence wall becomes increasingly serious. For multimedia applications full of streaming-like data, existing…

With the advent of chip multiprocessor (CMP) architectures, programmers must tune their programs to the architecture in order to fully utilize hardware resources. Parallelizing multimedia applications on a CMP remains a major obstacle. In this paper, we introduce the potential parallelism in multimedia applications and the multi-grain parallelism…

The technological parameters and constraints of many-core processors with shared memory demand new solutions to the cache coherence problem. Directory-based coherence protocols have recently been seen as a possible scalable alternative for CMP designs. Unfortunately, as the number of on-chip cores increases, many directory design…

In a partially reconfigurable system with an online placement algorithm, we try to avoid mapping redundant tasks by caching modules on the reconfigurable area. This paper proposes an elaborate strategy named virtual deletion and a low-cost board-level hardware component named recycle cache to accomplish this goal. In our strategy, the record of corresponding module…

Aggressive prefetching may cause substantial inter-core interference and lead to large performance degradation in shared-memory CMP systems. This paper aims to improve system performance and make prefetching effective. We study prefetching-caused inter-core interference in CMP systems and propose a Global Prefetcher Aggressiveness Control Scheme (GPACS) to reduce useless…

The memory systems of chip multi-threaded (CMT) processors suffer more than ever from memory latencies caused by excessive memory access requests. Past research has shown data prefetching with helper threads to be an effective approach to tolerating memory latency. However, as the total number of threads increases, simultaneously…