cache

11 Articles

[Gamozolabs’] post about Sushi Roll — a research kernel for monitoring Intel CPU internals — is pretty long. While we were disappointed at the end that the kernel’s source is not exactly available due to “sensitive features”, we were so impressed with the description of the modern x86 architecture and some of the work done with Sushi Roll, that we just had to post it. If the post gets you wanting to actually try some of this, you can check out another [Gamozolabs] creation, Orange Slice.

While you probably know that a modern Intel CPU bears little resemblance to the old 8086 processor it emulates, it is surprising, sometimes, to realize just how far it has gone. The very first thing the CPU does is to break your instruction up into microoperations. The execution engine uses some sophisticated techniques for register renaming and scheduling that allow you to run instructions out of order and to run more than one instruction per clock cycle.

Most of us have a pretty simple model of how a computer works. The CPU fetches instructions and data from memory, executes them, and writes data back to memory. That model is a good enough abstraction for most of what we do, but it hasn’t really been true for a long time on anything but the simplest computers. A modern computer’s memory subsystem is much more complex and often is the key to unlocking real performance. [Pdziepak] has a great post about how to take practical advantage of modern caching to improve high-performance code.

If you go back to 1956, [Tom Kilburn’s] Atlas computer introduced virtual memory based on the work of a doctoral thesis by [Fritz-Rudolf Güntsch]. The idea is that a small amount of high-speed memory holds pieces of a larger memory device like a memory drum, tape, or disk. If a program accesses a piece of memory that is not in the high-speed memory, the system reads from the mass storage device, after possibly making room by writing some part of working memory back out to the mass storage device.

The year so far has been filled with news of Spectre and Meltdown. These exploits take advantage of features like speculative execution, and memory access timing. What they have in common is the fact that all modern processors use cache to access memory faster. We’ve all heard of cache, but what exactly is it, and how does it allow our computers to run faster?

In the simplest terms, cache is a fast memory. Computers have two storage systems: primary storage (RAM) and secondary storage (Hard Disk, SSD). From the processor’s point of view, loading data or instructions from RAM is slow — the CPU has to wait and do nothing for 100 cycles or more while the data is loaded. Loading from disk is even slower; millions of cycles are wasted. Cache is a small amount of very fast memory which is used to hold commonly accessed data and instructions. This means the processor only has to wait for the cache to be loaded once. After that, the data is accessible with no waiting.

A common (though aging) analogy for cache uses books to represent data: If you needed a specific book to look up an important piece of information, you would first check the books on your desk (cache memory). If your book isn’t there, you’d then go to the books on your shelves (RAM). If that search turned up empty, you’d head over to the local library (Hard Drive) and check out the book. Once back home, you would keep the book on your desk for quick reference — not immediately return it to the library shelves. This is how cache reading works.

While the whole industry is scrambling on Spectre, Meltdown focused most of the spotlight on Intel and there is no shortage of outrage in Internet comments. Like many great discoveries, this one is obvious with the power of hindsight. So much so that the spectrum of reactions have spanned an extreme range. From “It’s so obvious, Intel engineers must be idiots” to “It’s so obvious, Intel engineers must have known! They kept it from us in a conspiracy with the NSA!”

We won’t try to sway those who choose to believe in a conspiracy that’s simultaneously secret and obvious to everyone. However, as evidence of non-obviousness, some very smart people got remarkably close to the Meltdown effect last summer, without getting it all the way. [Trammel Hudson] did some digging and found a paper from the early 1990s (PDF) that warns of the dangers of fetching info into the cache that might cross priviledge boundaries, but it wasn’t weaponized until recently. In short, these are old vulnerabilities, but exploiting them was hard enough that it took twenty years to do it.

Building a new CPU is the work of a large team over several years. But they weren’t all working on the same thing for all that time. Any single feature would have been the work of a small team of engineers over a period of months. During development they fixed many problems we’ll never see. But at the end of the day, they are only human. They can be 99.9% perfect and that won’t be good enough, because once hardware is released into the world: it is open season on that 0.1% the team missed.

The odds are stacked in the attacker’s favor. The team on defense has a handful of people working a few months to protect against all known and yet-to-be discovered attacks. It is a tough match against the attackers coming afterwards: there are a lot more of them, they’re continually refining the state of the art, they have twenty years to work on a problem if they need to, and they only need to find a single flaw to win. In that light, exploits like Spectre and Meltdown will probably always be with us.

Let’s look at some factors that paved the way to Intel’s current embarrassing situation.

[Gnif] had a recent hard drive failure in his home server. When rebuilding his RAID array, he decided to update to the ZFS file system. While researching ZFS, [Gnif] learned that the file system allows for a small USB cache disk to greatly improve his disk performance. Since USB is rather slow, [Gnif] had an idea to try to use an old i-RAM PCI card instead.

The problem was that he didn’t have any free PCI slots left in his home server. It didn’t take long for [Gnif] to realize that the PCI card was only using the PCI slot for power. All of the data transfer is actually done via a SATA cable. [Gnif] decided that he could likely get by without an actual PCI slot with just a bit of hacking.

[Gnif] desoldered a PCI socket from an old faulty motherboard, losing half of the pins in the process. Luckily, the pins he needed still remained. [Gnif] knew that DDR memory can be very power-hungry. This meant that he couldn’t only solder one wire for each of the 3v, 5v, 12v, and ground pins. He had to connect all of them in order to share the current load. All in all, this ended up being about 20 pins. He later tested the current draw and found it reached as high as 1.2 amps, confirming his earlier decision. Finally, the reset pin needed to be pulled to 3.3V in order to make the disk accessible.

All of the wires from his adapter were run to Molex connectors. This allows [Gnif] to power the device from a computer power supply. All of the connections were covered in hot glue to prevent them from wriggling lose.

We missed the original announcement, but Apple unveiled more than just the iPad Mini at their last event. They’ve got a new storage system called Fusion Drive which is supposed to combine the access speeds of solid state with the storage density of platter drives. When you look just under the surface what you’re really seeing is a disc drive with grossly enlarged cache in the form of an SSD drive. How about moving from the 64 MB or so of cache seen on many large hard drives today to something like 64GB?

Well you don’t have to wait for Apple to do it. [Patrick Stein] gave it a shot using command line tools to combine an SSD with a physical drive. Sure, it’s not an all-in-one solution, but it is a pretty good proof. The linchpin that will really make it possible is a low-level driver that can handle the caching on the SDD, while ensuring that the data eventually makes it to the platter for long-term storage.

In modern computer systems, the biggest bottleneck of information tends to be in communicating with the hard disks. High seek times and relatively slow transmission rates when compared to RAM speeds can add up quickly. This was a necessary evil back when RAM space and costs were at a premium, but now it is not uncommon to see 4GB of RAM on laptops, and even 12GB on desktops. For users whose primary computer use is browsing the internet (either for work, writing articles, or lolcats) and have some extra RAM, moving the browser cache to the RAM from the hard disk is a definite option for increasing speed.

In Linux systems (specifically Fedora and Ubuntu systems), this can be achieved for Chrome and Firefox by creating a larger ramdisk, mounting the ramdisk after boot, and then setting the browser of choice to use that ramdisk as a cache. The necessary commands to do this are readily available on the internet, which makes life easy. Using ramdisks for performance boosts are not exclusive to browsers, and can be used for other software such as Nagios for example.

We have previously covered a tool called Espérance DV for moving cache to RAM in Mac OSX, and for any Windows users feeling left out, there are ways of making Firefox bend to your will. Obviously you will see an increase in RAM use (duh), but this shouldn’t be a problem unless you are running out of free RAM on your system. Remember, free RAM is wasted RAM.