Transactional Memory

Published on Nov 17, 2015

Abstract

A primary challenge of parallel programming is to find better abstractions for expressing parallel computation and for writing parallel programs. Parallel programming encompasses all of the difficulties of sequential programming, but also introduces the hard problem of coordinating interactions among concurrently executing tasks.

Today, most parallel programs employ low-level programming constructs that are just a thin veneer over the underlying hardware. These constructs consist of threads, which are an abstract processor, and explicit synchronization (for example, locks, semaphores, and monitors) to coordinate thread execution. Parallel programs written with these constructs are difficult to design, program, debug and maintain.

Transactional Memory was created to simplify parallel programming and relieve software developers from the difficulties associated with lockbased parallel programming. With TM, programmers simply mark code segments as transactions that should execute atomically and in isolation with respect to other code, and the TM system manages the concurrency control for them. All TM systems use either hardware-based or software-based approaches to implement the two basic TM mechanisms: data versioning and conflict detection.

What is Transaction Memory?

A transaction is a form of program execution borrowed from the database community. Concurrent queries conflict when they read and write an item in a database, and a conflict can produce an erroneous result that could not arise from a sequential execution of the queries. Transactions ensure that all queries produce the same result as if 5 they executed serially (a property known as “serializability”). Decomposing the semantics of a transaction yields four requirements, usually called the “ACID” properties—atomicity, consistency, isolation, and durability. TM provides lightweight transactions for threads running in a shared address space. TM ensures the atomicity and isolation of concurrently executing tasks.

In general, TM does not provide consistency or durability. Atomicity ensures program state changes effected by code executing in a transaction are indivisible from the perspective of other, concurrently executing transactions. In other words, although the code in a transaction may modify individual variables through a series of assignments, another computation can only observe the program state immediately before or immediately after the transaction executes.

Isolation ensures that concurrently executing tasks cannot affect the result of a transaction, so a transaction produces the same answer as when no other task was executing. Transactions provide a basis to construct parallel abstractions, which are building blocks that can be combined without knowledge of their internal details, much as procedures and objects provide composable abstractions for sequential code

DYNAMIC STM

Herlihy et al.’s Dynamic STM (DSTM) was the first STM system that did not require a program to declare the memory locations accessed by a transaction. DSTM is an object-granularity, deferred-update STM system, which means that a transaction modifies a private copy of an object and only makes its changes visible to other transactions when it commits. The transaction exclusively accesses the copy without synchronization. However, another transaction can access the original, underlying object while the first transaction is still running, which causes a logical conflict that the STM system detects and resolves by aborting one of the two transactions.

An STM system can detect a conflict when a transaction first accesses an object (early detection) or when the transaction attempts to commit (late detection). Both approaches yield the same results, but may perform differently and, unfortunately, neither is consistently superior. Early detection prevents a transaction from performing unnecessary computation that a subsequent abort will discard.

Late detection can avoid unnecessary aborts, as when the conflicting transaction itself aborts because of a conflict with a third transaction. Another complication is a conflict between a transaction that only reads an object and another that modifies the object. Since reads are more common than writes, STM systems only clone objects that are modified. To reduce overhead, a transaction tracks the objects it reads and, before it commits, ensures that no other transaction modified them.

DSTM is a library. An object manipulated in a transaction is first registered with the DSTM system, which returns a TMObject wrapper for the object (as illustrated in the accompanying figure). Subsequently, the code executing the transaction can open the TMObject for read-only or read write access, which returns a pointer to the original or cloned object, respectively. Either way, the transaction manipulates the object directly, without further synchronization. A transaction ends when the program attempts to commit the transaction’s changes. If the transaction successfully commits, the DSTM system atomically replaces, for all modified objects, the old object in a Locator structure with its modified version.

A transaction T can commit successfully if it meets two conditions. The first is that no concurrently executing transaction modified an object read by T. DSTM tracks the objects a transaction opened for reading and validates the entries in this read set when the transaction attempts to commit. An object in the read set is valid if its version is unchanged since transaction T first opened it. DSTM also validates the read set every time it opens an object, to avoid allowing a transaction to continue executing in an erroneous program state in which some objects changed after the transaction started execution.

The second condition is that transaction T is not modifying an object that another transaction is also modifying. DSTM prevents this type of conflict by only allowing one 16 transaction to open an object for modification. When a write-write conflict occurs, DSTM aborts one of the two conflicting transactions and allows the other to proceed. DSTM rolls the aborted transaction back to its initial state and then allow it to reexecute. The policy used to select which transaction to abort can affect system performance, including liveness, but it should have no effect on the semantics of the STM system.

The performance of DSTM, like other STM systems, depends on the details of the workload. In general, the large overheads of STM systems are more expensive than locking on a small number of processors. However, as the number of processors increases, so does the contention for a lock and the cost of locking. When this occurs and conflicts are rare, STMs have been shown to outperform locks on small benchmarks.

Transactional Memory Today:

Before everybody gets too excited about the prospects of TM, we should remember that it is still very much a topic of research. First implementations are becoming available, but we still have much to learn. The VELOX project (https://www.velox-project.eu/), for example, has as its goal a comprehensive analysis of all the places in an operating system where TM technology can be used. This extends from lock-free data structures in the operating-system kernel to high-level uses in the application server.

The analysis includes TM with and without hardware support. The VELOX project will also research the most useful semantics of the TM primitives that should be added to higher-level programming languages. In the previous example it was a simple tm_atomic keyword. This does not necessarily have to be the correct form; nor do the semantics described need to be optimal. A number of self-contained STM implementations are available today. One possible choice for people to get experience with is TinySTM (https://tinystm.org). It provides all the primitives needed for TM while being portable, small, and depending on only a few services, which are available on the host system.

Based on TinySTM and similar implementations, we will soon see language extensions such as tm_atomic appear in compilers. Several proprietary compilers have support, and the first patches for the GNU compilers are also available (https://www.hipeac.net/node/2419). With these changes it will be possible to collect experience with the use of TM in real-world situations to find solutions to the remaining issues.

Conclusion

Transactional memory by itself is unlikely to make Multicore computers readily programmable. Many other improvements to programming languages, tools, runtime systems, and computer architecture are also necessary. TM, however, does provide a time tested model for isolating concurrent computations from each other. This model raises the level of abstraction for reasoning about concurrent tasks and helps avoid many insidious parallel programming errors. However, many aspects of the semantics and implementation of TM are still the subject of active research. If these difficulties can be resolved in a timely fashion, TM will likely become a central pillar of parallel programming.