3
Memory Consistency Model: Definition
- Memory consistency model
  - Order in which memory operations will appear to execute
  - What value can a read return?
- Affects ease of programming and performance

5
Implicit Memory Model
- Sequential consistency (SC) [Lamport]
- Result of an execution appears as if
  - All operations executed in some sequential order
  - Memory operations of each process in program order
- No caches, no write buffers
[Figure: processors P1, P2, P3, ..., Pn sharing a single MEMORY]
- Two aspects:
  - Program order
  - Atomicity
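
A minimal sketch of the SC definition above, in Python (the slides give no code, so the language is a choice made here). The two-processor test program and the names `P1`, `P2`, `r1`, `r2` are assumptions for illustration. The sketch enumerates every total order that keeps each processor's operations in program order and runs it against one memory with no caches or write buffers, so the resulting set is the outcomes SC allows for this program.

```python
from itertools import combinations

# Hypothetical two-processor test program (not from the slides).
# Each operation is ("write", variable, value) or ("read", variable, register).
P1 = [("write", "x", 1), ("read", "y", "r1")]
P2 = [("write", "y", 1), ("read", "x", "r2")]


def interleavings(a, b):
    """Yield every merge of a and b that keeps each list in program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        chosen = set(slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]


def run(schedule):
    """Execute one sequential order against a single memory."""
    memory = {"x": 0, "y": 0}
    registers = {}
    for kind, var, arg in schedule:
        if kind == "write":
            memory[var] = arg          # atomic: visible to everyone at once
        else:
            registers[arg] = memory[var]
    return registers["r1"], registers["r2"]


sc_outcomes = {run(s) for s in interleavings(P1, P2)}
print(sorted(sc_outcomes))  # [(0, 1), (1, 0), (1, 1)]
```

The outcome (0, 0) never appears: it would require each read to precede the other processor's write, which is impossible while both processors stay in program order. A write buffer that lets a read bypass an earlier write can produce it.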

18
Notes
- Sequential consistency is not really about memory operations from different processors (although we do need to make sure memory operations are atomic).
- Sequential consistency is not really about dependent memory operations in a single processor’s instruction stream (these are respected even by processors that reorder instructions).
- The problem of relaxing sequential consistency is really all about independent memory operations in a single processor’s instruction stream that have some high-level dependence (such as locks guarding data) that should be respected to obtain correct results.

19
Relaxing Program Orders
- Weak ordering:
  - Divide memory operations into data operations and synchronization operations
  - Synchronization operations act like a fence:
    - All data operations before synch in program order must complete before synch is executed
    - All data operations after synch in program order must wait for synch to complete
    - Synchs are performed in program order
  - Implementation of fence: processor has a counter that is incremented when a data op is issued and decremented when a data op is completed
  - Example: PowerPC has SYNC instruction (caveat: semantics somewhat more complex than what we have described…)
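
The counter-based fence described above can be sketched as follows. This is a software model for illustration only, not hardware from the slides; the class and method names (`FenceCounter`, `issue_data_op`, `complete_data_op`, `sync_may_execute`) are hypothetical.

```python
class FenceCounter:
    """Counts data operations that have been issued but not yet completed."""

    def __init__(self):
        self.outstanding = 0

    def issue_data_op(self):
        # Incremented when a data operation is issued.
        self.outstanding += 1

    def complete_data_op(self):
        # Decremented when a data operation completes.
        if self.outstanding == 0:
            raise RuntimeError("no outstanding data operation to complete")
        self.outstanding -= 1

    def sync_may_execute(self):
        # A synchronization operation may execute only once every earlier
        # data operation has completed, i.e. the counter is back at zero.
        return self.outstanding == 0


fence = FenceCounter()
fence.issue_data_op()                 # first data operation in flight
fence.issue_data_op()                 # second, independent data operation
blocked = fence.sync_may_execute()    # False: two data ops still outstanding
fence.complete_data_op()
fence.complete_data_op()
ready = fence.sync_may_execute()      # True: the synch can now be performed
```

The sketch covers only the "data before synch" half of the fence; holding back later data operations until the synch completes would need a second check at issue time.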

20
Another model: Release consistency
- Further relaxation of weak consistency
- Synchronization accesses are divided into
  - Acquires: operations like lock
  - Releases: operations like unlock
- Semantics of acquire:
  - Acquire must complete before all following memory accesses
- Semantics of release:
  - All memory operations before release are complete
  - But accesses after release in program order do not have to wait for release
  - Operations which follow release and which need to wait must be protected by an acquire
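
The acquire and release rules above can be written as a small ordering predicate. This is a sketch under stated assumptions: the function name `must_stay_ordered` is hypothetical, and the rule that two synchronization accesses stay in program order is carried over from the weak-ordering slide, since this slide does not say how synchronization accesses are ordered among themselves.

```python
DATA, ACQUIRE, RELEASE = "data", "acquire", "release"


def must_stay_ordered(earlier, later):
    """True if `earlier` must complete before `later` may be performed.

    Both operations belong to the same processor, with `earlier` first
    in program order.
    """
    if earlier == ACQUIRE:
        return True   # acquire completes before all following accesses
    if later == RELEASE:
        return True   # everything before a release completes first
    if earlier != DATA and later != DATA:
        return True   # assumption: synch accesses stay in program order
    # Remaining cases: data -> data, data -> acquire, release -> data.
    # These independent accesses may be reordered (true dependences within
    # one processor are still respected by the hardware).
    return False


critical_section = [ACQUIRE, DATA, DATA, RELEASE, DATA]
pairs = [
    (a, b, must_stay_ordered(a, b))
    for a, b in zip(critical_section, critical_section[1:])
]
```

In the `critical_section` example, only the last pair (release followed by data) is free to reorder across the synchronization, which is the extra freedom release consistency adds over weak ordering.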

21
Cache Coherence Protocols
- How to propagate a write?
  - Invalidate: remove old copies from other caches
  - Update: update old copies in other caches to new values
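
A toy model of the two propagation choices, for illustration only. The class `CoherentCaches`, the write-through memory, and the `demo` helper are assumptions made to keep the sketch small; a real protocol also tracks line states and ownership.

```python
class CoherentCaches:
    """Several private caches over one memory, with a write-propagation policy."""

    def __init__(self, num_caches, policy):
        if policy not in ("invalidate", "update"):
            raise ValueError("policy must be 'invalidate' or 'update'")
        self.policy = policy
        self.memory = {}
        self.caches = [dict() for _ in range(num_caches)]

    def read(self, cache_id, addr):
        cache = self.caches[cache_id]
        if addr not in cache:                  # miss: fetch from memory
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, cache_id, addr, value):
        self.memory[addr] = value              # assumption: write-through
        self.caches[cache_id][addr] = value
        for other_id, other in enumerate(self.caches):
            if other_id == cache_id or addr not in other:
                continue
            if self.policy == "invalidate":
                del other[addr]                # remove the old copy
            else:
                other[addr] = value            # refresh the old copy in place


def demo(policy):
    system = CoherentCaches(2, policy)
    system.read(1, "x")                        # cache 1 now holds old x = 0
    system.write(0, "x", 7)                    # cache 0 writes x
    still_cached = "x" in system.caches[1]
    return still_cached, system.read(1, "x")
```

Under both policies the later read by cache 1 returns 7; the difference is whether it hits the refreshed copy (update) or misses and refetches (invalidate).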

29
Note: Aggressive Implementations of SC
- Can actually do optimizations with SC with some care
  - Hardware has been fairly successful
  - Limited success with compiler
- But not an issue here
  - Many current architectures do not give SC
  - Compiler optimizations on SC still limited

35
An Alternate Programmer-Centric View
- Many models give informal software rules for correct results
- BUT
  - Rules are often ambiguous when generally applied
  - What is a correct result?
- Why not
  - Formalize one notion of correctness: the base model
  - Relaxed model = software rules that give appearance of base model
- Which base model? What rules? What if a program doesn't obey the rules?

37
What Software Rules?
- Rules must
  - Pertain to program behavior on SC system
  - Enable optimizations without violating SC
- Possible rules
  - Prohibit certain access patterns
  - Ask for certain information
  - Use given constructs in prescribed ways
  - ???
- Examples coming up

38
What if a Program Violates Rules?
- What about programs that don’t obey the rules?
- Option 1: Provide a system-centric specification
  - But this path has pitfalls
- Option 2: Avoid system-centric specification
  - Only guarantee a read returns value written to its location

43
Data-Race-Free-0 (DRF0) Definition
- Data-Race-Free-0 Program
  - All accesses distinguished as either synchronization or data
  - All races distinguished as synchronization (in any SC execution)
- Data-Race-Free-0 Model
  - Guarantees SC to data-race-free-0 programs
  - (For others, reads return value of some write to the location)
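
A minimal sketch of checking the DRF0 program condition for tiny straight-line programs. Two assumptions are made here and are not stated on the slide: a race is taken to mean two conflicting accesses (same location, different processors, at least one write) that appear next to each other in some SC execution, and programs have no branches or loops. The helper names (`op`, `conflicts`, `is_drf0`) are hypothetical.

```python
from itertools import combinations


def op(thread, kind, loc, sync):
    """One memory access; sync=True means 'distinguished as synchronization'."""
    return {"thread": thread, "kind": kind, "loc": loc, "sync": sync}


def interleavings(a, b):
    """Yield every SC execution: merges of a and b that keep program order."""
    n = len(a) + len(b)
    for slots in combinations(range(n), len(a)):
        chosen = set(slots)
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in chosen else next(ib) for i in range(n)]


def conflicts(first, second):
    return (
        first["thread"] != second["thread"]
        and first["loc"] == second["loc"]
        and "write" in (first["kind"], second["kind"])
    )


def is_drf0(p1, p2):
    """True if every race, in any SC execution, involves only synch accesses."""
    for schedule in interleavings(p1, p2):
        for first, second in zip(schedule, schedule[1:]):
            if conflicts(first, second) and not (first["sync"] and second["sync"]):
                return False   # a race that touches a data access
    return True


# Hypothetical example: x is labeled data, flag is labeled synchronization.
racy = (
    [op(1, "write", "x", False), op(1, "write", "flag", True)],
    [op(2, "read", "flag", True), op(2, "read", "x", False)],
)
# Same accesses, but everything is labeled synchronization.
all_sync = tuple([dict(o, sync=True) for o in prog] for prog in racy)
```

Here `is_drf0(*racy)` is `False` because, without a branch on `flag`, the read of `x` can sit right after the write of `x` in an SC execution; `is_drf0(*all_sync)` is `True`, since labeling every access as synchronization always satisfies the condition.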

44
Programming with Data-Race-Free-0
- Information required: This operation never races (in any SC execution)
1. Write program assuming SC
2. For every memory operation specified in the program, ask: never races?
   - yes: distinguish as data
   - no: distinguish as synchronization
   - don’t know or don’t care: distinguish as synchronization
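
Step 2 above is a three-way decision that collapses to two labels. A minimal sketch, in which the function name `distinguish`, the use of `None` for "don't know or don't care", and the example operations are all assumptions made for illustration:

```python
def distinguish(never_races):
    """Label one memory operation.

    never_races is True, False, or None (None = "don't know or don't care").
    Only a definite "never races" earns the data label; every other answer
    falls back to the conservative synchronization label.
    """
    return "data" if never_races is True else "synchronization"


# Hypothetical programmer answers for three operations.
answers = {"write data": True, "write flag": False, "read counter": None}
labels = {name: distinguish(answer) for name, answer in answers.items()}
```

Defaulting the uncertain case to synchronization keeps the program correct at some cost in performance, since fewer operations are eligible for reordering.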

52
Interactions Between Language and Hardware
- If hardware uses fences, language should not encourage default of synchronization
- If hardware only distinguishes based on special instructions, language should not distinguish individual operations
- Languages other than Java do not provide explicit support; high-level programmers directly use hardware fences

59
Programmer-Centric Models: A Systematic Approach
- In general
  - What software rules are useful?
  - What further optimizations are possible?
- My thesis characterizes
  - Useful rules
  - Possible optimizations
  - Relationship between the above

62
Rules for Correct Java Programs
Option 1: No “data races” (all races from accesses to implement synchronized)
  + Works well on all hardware
  - Prohibits common idioms
Option 2: All variables in a data race are declared volatile
  + Any program can be correct by making all volatile
  - On Sun, PowerPC, Alpha, IA-64, fences required:
      After volatile read, monitorenter
      Before volatile write, monitorexit
      Between volatile write and volatile read
    Often fences for volatile unnecessary
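
The fence placement listed under Option 2 can be mechanized as a rough sketch. Everything here is an assumption for illustration: operations are plain strings, a single generic `fence` stands in for whatever instruction a given machine needs, and the function name `insert_fences` is hypothetical. It applies the three rules literally and does not try to drop the fences the slide calls unnecessary.

```python
FENCE = "fence"


def insert_fences(ops):
    """Return ops with fences added per the three rules for Option 2."""
    out = []
    write_unfenced = False   # a volatile write with no fence after it yet

    def fence():
        nonlocal write_unfenced
        out.append(FENCE)
        write_unfenced = False

    for current in ops:
        if current in ("volatile_write", "monitorexit"):
            fence()                      # rule: before volatile write, monitorexit
        elif current == "volatile_read" and write_unfenced:
            fence()                      # rule: between volatile write and volatile read
        out.append(current)
        if current == "volatile_write":
            write_unfenced = True
        if current in ("volatile_read", "monitorenter"):
            fence()                      # rule: after volatile read, monitorenter
    return out


write_then_read = insert_fences(["volatile_write", "volatile_read"])
read_only = insert_fences(["data", "volatile_read", "data"])
```

In `write_then_read` the two-operation sequence picks up three fences, which gives a sense of why making every variable volatile is costly on hardware that relies on fences.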