Haswell’s transactional memory explained

Beyond Sandy Bridge, beyond Ivy Bridge, the next generation on the Intel roadmap is a processor code-named Haswell. We don’t yet know much about the specifics of that CPU, but a key detail trickled out recently: Haswell will have extensions to support a feature called transactional memory—a capability that could speed the execution and simplify the development of widely multithreaded applications.

If you’d like to get a sense for what transactional memory is all about, well, you’re in luck, because David Kanter at Real World Tech has produced a nice primer on the subject with some speculation, analysis, and handicapping of Intel’s plans. He covers the academic research into transactional memory, early commercial developments, Intel’s TSX extensions, and what it all likely means for Haswell. Read up to PYAITK, as Billy Wilson would say.

Is this the same thing that Intel and AMD were talking about when they first released multi-core chips, to help run single threads across multiple cores? I believe I remember Intel mentioning that it would release a driver or an update for its chips that would help run single threads across multiple cores, or something along those lines… But that was about 7-8 years ago…

chuckula

8 years ago

Not quite. This is geared more toward making the multiple CPUs already working in a multithreaded program spend less time waiting to access shared resources, so that their effective utilization is closer to 100% more often.

bcronce

8 years ago

What I don’t understand is how it handles long-running transactions. These instructions only track cache lines. What if a cache line gets evicted during a transaction? What if a transaction changes lots and lots of cache lines? Where does it store all of the state for a given transaction?

edit: “[…]limits to the size of transactions”

How is multitasking handled with it? For example, a low-priority thread creates a transaction and changes a bunch of cache lines, but before it commits the changes, the OS reschedules it and the thread doesn’t get rescheduled for a long while. Do all of those cache lines remain locked? If they get evicted, how does that affect the thread that started the transaction?

edit: After reading again, I see that context switches will almost always cause an abort. Effectively, all the work must be done within your time slice.

edit: About evictions: “To avoid data corruption, any transactional data (i.e. lines in the RS or WS) must stay in the L1D or L2 and not be evicted to the L3 or memory.”

I’m assuming these instructions are meant for short-lived transactions that modify relatively few cache lines.

Another good question: what if a thread attempts to modify two memory locations in a given transaction, where the two memory addresses cannot be stored in cache at the same time because they would evict each other?

edit: If a cache line gets locked, which keeps it from being evicted, what would happen if an application started a transaction and touched enough memory addresses to lock every cache line in the CPU? Would this effectively lock down the CPU?

chuckula

8 years ago

Yeah, these situations you are describing are almost always going to abort the transaction. The model looks like it is very useful for short lock-update-unlock sequences that are very common in threaded programming. The transactions allow you to run these sequences without the lock & unlock most of the time, and to fall back in case of a collision. It is less useful for longer and more complex sequences, where the memory access patterns can get much more complex. At a certain level you have to put limits on the transactions or else you’d effectively be using all of your cache as scratch space.
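The "run it without the lock most of the time, fall back on a collision" pattern can be sketched in software, as a rough analogy only. The Python below is not TSX and uses no Intel API; `ElidedCounter`, `update`, `max_attempts`, and `fallbacks` are hypothetical names. A version counter stands in for the conflict detection the hardware gets from cache coherency, and a lock held only for the brief validate-and-publish step stands in for the hardware's atomic commit. After `max_attempts` failed tries, the code takes the real lock for the whole update, which plays the role of the mandatory fallback path.

```python
import threading


class ElidedCounter:
    """Toy software model of lock elision via optimistic concurrency control.

    Hypothetical class for illustration; real TSX detects conflicts in the
    cache-coherency protocol, not with a version counter.
    """

    def __init__(self, max_attempts=3):
        self.max_attempts = max_attempts
        self._lock = threading.Lock()
        self._version = 0      # bumped on every committed write
        self._value = 0
        self.fallbacks = 0     # how often the lock-held path was needed

    def update(self, fn):
        for _ in range(self.max_attempts):
            start_version = self._version   # remember what we started from
            new_value = fn(self._value)     # speculative work, no lock held
            with self._lock:                # stand-in for an atomic commit
                if self._version == start_version:
                    self._value = new_value
                    self._version += 1
                    return new_value
            # Someone else committed first: discard the result ("abort").
        with self._lock:                    # fallback: hold the real lock
            self.fallbacks += 1
            self._value = fn(self._value)
            self._version += 1
            return self._value

    @property
    def value(self):
        with self._lock:
            return self._value
```

In this model a conflict costs only a retry; the lock-held path is what guarantees forward progress, which is why code written for TSX must always provide one.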

bcronce

8 years ago

I could see this being VERY useful for .NET applications, as the framework could just rewrite how its locks work while keeping them logically the same. There would probably be a few new types of locks too, but I think the first change would be the most useful, since old apps would benefit instantly.

WillBach

8 years ago

TSX requires a failure path. Hardware that supports TSX is never actually required to successfully execute a memory transaction. Longer-running transactions and transactions that change lots and lots of cache lines are just more likely to fail. Intel has stated that each processor that supports TSX will have a maximum number of addresses supported per transaction*, and using more than that will cause every transaction to fail, every time, on those processor models.

With regard to multitasking, my understanding is that TSX doesn’t actually lock cache lines. If a thread performing a transaction is rescheduled, it’s just more likely for its transaction to fail.

With regard to multiple memory locations, no two memory locations are mutually exclusive unless you have a direct-mapped cache, and no modern processor uses direct-mapped caches exclusively. It is possible that you could attempt to access more lines mapping to the same set than your cache has ways of associativity; that would probably just make the transaction fail.

*I forget where I saw this, but it makes physical sense: no processor could support an infinite number of instructions that can be rolled back and canceled.
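To make the capacity and associativity points concrete, here is a toy Python model of which transactional lines can coexist in one set-associative cache. It is a simplification under stated assumptions: a single cache level (the article quoted above says transactional lines may live in L1D or L2), illustrative parameters resembling a 32 KiB, 8-way cache with 64-byte lines (not Haswell's documented limits), and the hypothetical rule that a transaction aborts as soon as one set would have to evict one of its own lines. `TxTracker` and `touch` are made-up names; real hardware can also abort for many other reasons.

```python
class TxTracker:
    """Toy model of transactional capacity in a set-associative cache.

    Illustrative defaults only; a line maps to set (line % num_sets).
    """

    def __init__(self, num_sets=64, ways=8, line_size=64):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        self.sets = [set() for _ in range(num_sets)]
        self.aborted = False

    def touch(self, addr):
        """Record an access to byte address addr; False means abort."""
        if self.aborted:
            return False
        line = addr // self.line_size
        s = self.sets[line % self.num_sets]
        if line not in s and len(s) == self.ways:
            # Keeping this line would evict another transactional line.
            self.aborted = True
            return False
        s.add(line)
        return True
```

With these defaults, eight lines spaced 4096 bytes apart all land in one set and fit, while a ninth aborts the transaction even though the cache is almost empty. With `ways=1` (a direct-mapped cache), two addresses `num_sets * line_size` bytes apart already collide, which is the mutually exclusive case raised earlier in the thread.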

bcronce

8 years ago

After reading some more, I see it is very optimistic, but extremely likely to abort if anything changes, which on average should be the corner case.

Your statement of “TSX requires a failure path.” made it all click for me. Thanks 🙂

WillBach

8 years ago

No problem 🙂

bcronce

8 years ago

In a nutshell: hardware-accelerated optimistic locking that makes use of cache coherency.

chuckula

8 years ago

Real World Tech always has interesting articles. It looks like Haswell is just the first iteration of transactional memory and that we’ll likely be seeing improved versions over time to help with scaling to more & more cores.

pogsnet

8 years ago

chuckula

8 years ago

YEAHHH!!!!!!

bcronce

8 years ago

“Cool! But how much faster it can enhance?”

“it can” = “can it”?

Race condition? :p /joke

DancinJack

8 years ago

Sweet. I love reading his little jots on the past few architectures from both teams.