Emerging task-based parallel programming models shield programmers from the daunting task of parallelism management by delegating the mapping and scheduling of individual tasks to the runtime system. The runtime system can also use programmer-supplied dependency information between tasks, together with the mapping information of tasks, to enable optimizations such as data-flow based execution and locality-aware scheduling of tasks. If the cache coherence substrate had access to this information from the runtime system, it could aggressively optimize prevailing access patterns such as one-to-many producer-consumer sharing and migratory sharing. Such a linkage has, however, not been studied before.
We present, for the first time, a family of runtime-guided cache coherence optimizations enabled by linking dependency and mapping information from the runtime system to the cache coherence substrate. By making this information available to the cache coherence substrate, we show that optimizations such as downgrading and self-invalidation, which help reduce the overheads associated with producer-consumer and migratory sharing, can be supported with reasonable extensions to the baseline cache coherence protocol. Our experimental results establish that each optimization provides a significant performance gain in isolation and can provide additional gains when combined. Finally, we evaluate our family of runtime-guided cache coherence optimizations in the context of previously proposed runtime-guided prefetching schemes and show that the two work synergistically, offering more benefit than either technique in isolation.
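
As a rough illustration of why downgrading and self-invalidation hints can help one-to-many producer-consumer sharing, the following is a minimal sketch of a toy single-line MSI directory model. It is not the protocol or the evaluation from the report: the `Line` class, the `run` function, the `hints` flag, and the two counters are all hypothetical names introduced here, and the model assumes one producer (core 0) that rewrites the line each iteration and a fixed set of consumers that read it afterwards.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Line:
    """Directory view of one cache line in a toy MSI protocol (illustrative only)."""
    owner: Optional[int] = None                      # core holding the line Modified
    sharers: Set[int] = field(default_factory=set)   # cores holding the line Shared
    three_hop_misses: int = 0                        # reads forwarded to a dirty owner
    invalidations: int = 0                           # copies invalidated on writes

    def write(self, core: int) -> None:
        # Every copy held by another core must be invalidated before writing.
        self.invalidations += len(self.sharers - {core})
        if self.owner is not None and self.owner != core:
            self.invalidations += 1
        self.sharers = set()
        self.owner = core

    def read(self, core: int) -> None:
        if core == self.owner or core in self.sharers:
            return  # cache hit, no coherence traffic
        if self.owner is not None:
            # Line is dirty elsewhere: the directory forwards the request to
            # the owner, which downgrades and supplies the data (three hops).
            self.three_hop_misses += 1
            self.sharers.add(self.owner)
            self.owner = None
        self.sharers.add(core)

    def downgrade(self, core: int) -> None:
        # Assumed hint: the producer task ended, so write back and keep a
        # Shared copy; later reads are then served without forwarding.
        if self.owner == core:
            self.owner = None
            self.sharers.add(core)

    def self_invalidate(self, core: int) -> None:
        # Assumed hint: the consumer task ended, so drop the Shared copy;
        # the next producer write then has nothing to invalidate here.
        self.sharers.discard(core)


def run(iterations: int, consumers: int, hints: bool) -> Line:
    """Producer (core 0) writes, then each consumer reads, every iteration."""
    line = Line()
    for _ in range(iterations):
        line.write(0)
        if hints:
            line.downgrade(0)
        for core in range(1, consumers + 1):
            line.read(core)
            if hints:
                line.self_invalidate(core)
    return line


if __name__ == "__main__":
    for hints in (False, True):
        result = run(iterations=4, consumers=3, hints=hints)
        print(hints, result.three_hop_misses, result.invalidations)
```

In this toy model the hinted run avoids both the forwarded read misses and the write-time invalidations that the baseline run incurs. It counts events only and says nothing about latency, traffic, or the performance gains reported by the authors.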

BibTeX
@techreport{Manivannan2013,
  author = {Manivannan, Madhavan and Stenström, Per},
  title = {Runtime-Guided Cache Coherence Optimizations in Multi-core Architectures},
  publisher = {Chalmers University of Technology},
  place = {Göteborg},
  year = {2013},
  series = {Technical report - Department of Computer Science and Engineering, Chalmers University of Technology and Göteborg University, no: 2013:08},
  keywords = {cache coherence, sharing patterns, task parallelism, runtime systems, prefetching},
  note = {10},
}
