There is money to be saved

Main memory treats all data the same. Servers typically use some form of error correcting code (ECC) to detect and correct memory errors, and with today's large-memory servers the added cost of ECC can be significant.

Researchers at Microsoft and Carnegie Mellon University are studying the issue. They found that about 57 percent of data center total cost of ownership (TCO) is capital cost, most of which is server cost, and that processors and memory make up about 60 percent of server cost. Reducing memory costs could therefore materially improve data center capital efficiency.

ECC also slows down systems and, due to added logic and RAM, increases power and cooling costs. It's a double whammy.
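To make the "added logic and RAM" concrete, here is a minimal sketch of the classic Hamming(7,4) code, which stores 3 check bits alongside every 4 data bits and can locate and fix any single flipped bit. It is illustrative only: server ECC memory typically uses wider SEC-DED codes (8 check bits per 64 data bits, i.e. 72-bit words), which also detect double-bit errors; this toy code cannot reliably do that. The function names `encode` and `decode` are hypothetical.

```python
from __future__ import annotations

# Hamming(7,4): 4 data bits + 3 check bits; corrects any single-bit error.
# Illustrative only; real server ECC uses wider SEC-DED codes.


def encode(d: list[int]) -> list[int]:
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword positions 1..7 hold [p1, p2, d1, p3, d2, d3, d4].
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(c: list[int]) -> tuple[list[int], int]:
    """Return (corrected data bits, 1-based position of flipped bit, or 0)."""
    c = list(c)  # don't mutate the caller's codeword
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # re-check positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # re-check positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]  # re-check positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3  # equals the position of a single flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1  # flip it back
    return [c[2], c[4], c[5], c[6]], syndrome


data = [1, 0, 1, 1]
word = encode(data)
word[4] ^= 1  # simulate a single-bit memory error at position 5
print(decode(word))  # ([1, 0, 1, 1], 5)
```

Every read pays for recomputing the check bits, and every stored word carries extra bits: 75 percent overhead here, 12.5 percent for a 72/64 layout.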

The researchers wanted to know whether all applications need the level of protection that ECC provides and, if they don't, how much could be saved with heterogeneous memory systems. The key is understanding how vulnerable a given workload is to memory errors.

What was that masked error?

Not all memory errors create problems, or are even detected. If the corrupted data is overwritten before it is read, no one will ever know. Here's their memory error taxonomy:

[Figure: the researchers' memory error taxonomy]
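The overwrite-before-read case can be illustrated with a small sketch. It is a simplified, hypothetical classification (not the researchers' taxonomy): given a trace of later memory accesses, a bit flip at an address is masked if the next access to that address is a write, and consumed by the application if it is a read. The names `Outcome`, `classify`, and the trace format are assumptions, and the sketch assumes every write replaces the whole word.

```python
from __future__ import annotations

from enum import Enum


class Outcome(Enum):
    MASKED = "overwritten before any read; never observed"
    CONSUMED = "read by the application; may or may not cause harm"
    UNTOUCHED = "not accessed again within this trace"


def classify(trace: list[tuple[str, int]], addr: int) -> Outcome:
    """Classify a bit flip at `addr`, injected just before `trace` executes.

    `trace` is a list of (op, address) pairs, where op is "R" or "W".
    Assumes a write replaces the whole word, so only the first later
    access to `addr` decides the outcome.
    """
    for op, a in trace:
        if a == addr:
            return Outcome.CONSUMED if op == "R" else Outcome.MASKED
    return Outcome.UNTOUCHED


trace = [("R", 0x10), ("W", 0x20), ("R", 0x20)]
print(classify(trace, 0x20))  # Outcome.MASKED
print(classify(trace, 0x10))  # Outcome.CONSUMED
print(classify(trace, 0x30))  # Outcome.UNTOUCHED
```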


In their testing, the team injected errors into the memory system to determine how harmful they were. If the application crashed or produced wrong results, they counted it as a serious error.
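A fault-injection campaign of this kind can be sketched in a few lines. The sketch below is a toy under stated assumptions, not the researchers' framework: the hypothetical `workload` uses `keys` as indexes into `table` and sums the results; each trial flips one random bit in one random word, reruns, and compares against the error-free ("golden") result. An `IndexError` counts as a crash, a different sum as an incorrect result. The word width, memory sizes, and all function names are invented for illustration.

```python
from __future__ import annotations

import random
from collections import Counter

WORD_BITS = 16  # hypothetical word width for the toy memory


def workload(table: list[int], keys: list[int]) -> int:
    """Toy 'application': use each key as an index into table, sum the hits."""
    return sum(table[k] for k in keys)


def inject_once(table: list[int], keys: list[int], rng: random.Random) -> tuple[str, str]:
    """Flip one random bit in one random word, rerun, and classify the outcome."""
    golden = workload(table, keys)  # error-free reference result
    table, keys = list(table), list(keys)  # corrupt copies, not the originals
    region, mem = rng.choice([("table", table), ("keys", keys)])
    i = rng.randrange(len(mem))
    mem[i] ^= 1 << rng.randrange(WORD_BITS)
    try:
        result = workload(table, keys)
    except IndexError:  # a corrupted key pointed outside the table
        return region, "crash"
    return region, "correct" if result == golden else "incorrect"


def campaign(trials: int = 1000, seed: int = 0) -> Counter:
    rng = random.Random(seed)
    table = [rng.randrange(100) for _ in range(256)]
    keys = [rng.randrange(64) for _ in range(32)]  # table[64:] is never read
    return Counter(inject_once(table, keys, rng) for _ in range(trials))


for (region, outcome), n in sorted(campaign().items()):
    print(f"{region:5} {outcome:9} {n}")
```

Even this toy shows errors in the `table` region can never crash it (at worst they corrupt the sum, and flips in unread entries are silently masked), while flips in `keys` can crash it outright.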

They studied three workloads that are common in Internet-scale data centers: WebSearch, Memcache and GraphLab.

They reached six significant conclusions.

Error tolerance varies across applications.

Error tolerance varies within an application.

Quick-to-crash behavior differs from periodically incorrect behavior.

Some memory regions are safer than others.

More severe errors mainly decrease correctness.

Data recoverability varies across memory regions.

After looking at how applications behave under varying memory conditions, the team concluded that:

. . . use of memory without error detection/correction (and the subsequent propagation of errors to persistent storage) is suitable for applications with the following two characteristics: (1) application data is mostly read-only in memory (i.e., errors have a low likelihood of propagating to persistent storage), and (2) the result from the application is transient in nature (i.e., consumed immediately by a user and then discarded).

A number of popular applications, such as search, streaming, gaming and social networking, fit that profile.
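The two characteristics in the quote can be read as a simple placement rule for a heterogeneous memory system. Here is a minimal sketch assuming each workload is described by two yes/no flags; the `Workload` type, the `memory_tier` function, the tier labels, and the example profiles are all hypothetical, and a real system would need measured vulnerability data rather than hand-set flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    mostly_read_only: bool  # criterion 1: in-memory data is rarely written
    transient_result: bool  # criterion 2: output is consumed once, then discarded


def memory_tier(w: Workload) -> str:
    """Hypothetical policy: use unprotected memory only if both criteria hold."""
    return "no-ECC" if (w.mostly_read_only and w.transient_result) else "ECC"


# Hypothetical workload profiles, for illustration only.
for w in [
    Workload("search-serving", mostly_read_only=True, transient_result=True),
    Workload("billing-ledger", mostly_read_only=False, transient_result=False),
]:
    print(w.name, "->", memory_tier(w))
```

Requiring both conditions mirrors the quote: read-only data limits propagation to persistent storage, and transient results limit how long a wrong answer matters.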

Scale makes the difference

After examining the variables, the authors conclude that heterogeneous memory can cut server costs by up to 4.7 percent while still delivering 99.9 percent server availability.

That may not seem like much, but if you're spending a billion dollars a year on servers, it starts to add up. $47M will buy a nice mix of pizza, PhDs, and data center muscle.
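The arithmetic behind those figures is simple. This sketch assumes the $1 billion annual server spend from the example above and a 365-day year; the function names are hypothetical. It also converts 99.9 percent availability into the downtime it permits per server per year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760; ignores leap years


def annual_savings(server_spend: float, savings_fraction: float = 0.047) -> float:
    """Dollars saved per year at the paper's best-case 4.7 percent."""
    return server_spend * savings_fraction


def allowed_downtime_hours(availability: float = 0.999) -> float:
    """Hours per year a server may be down at the given availability."""
    return (1 - availability) * HOURS_PER_YEAR


print(f"${annual_savings(1e9):,.0f}")           # $47,000,000
print(f"{allowed_downtime_hours():.2f} hours")  # 8.76 hours
```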

The Storage Bits take

Few enterprises have enough scale to make it worth characterizing and altering apps to save 4.7 percent on server costs. But Internet data centers do and will.

These stepwise enhancements, shaving off a couple of percent here and a couple more there, will keep driving IaaS costs down while enterprise costs keep rising. Enterprise IT managers will have to become brokers more than suppliers to their enterprise customers.

Robin Harris is Chief Analyst at TechnoQWAN LLC, a storage research and consulting firm he founded in 2005. Based in Sedona, Arizona, TechnoQWAN focuses on emerging technologies, products, companies and markets. Robin has over 35 years of experience in the IT industry and earned degrees from Yale and the University of Pennsylvania's Wharton...

Disclosure

Robin Harris is president of TechnoQWAN, a consulting and analyst firm in Sedona, Arizona. He also writes StorageMojo.com, a blog which accepts advertising from companies in the storage industry, and has a 30-year history with IT vendors. He has many industry contacts, many of whom are friends and all of whom he has opinions about.
Robin has relationships with many companies in the technology industry. Every company he writes about may have sought to influence his opinion through carefully-crafted marketing messages and self-serving white papers, gifts ranging from desk calendars, t-shirts, lunches and trips as well as analyst or consulting assignments. He also invests in some technology companies.
Robin discloses financial investments in or client relationships with companies named in Storage Bits. To help readers sort out the gold from the dross in his writings, Robin tries to communicate his reasons as clearly as he can. If you agree, you are intelligent and discerning. If you disagree, well, you disagree.
In all cases, Robin encourages readers to subject everything they read, see or hear on the internet or from politicians to some simple questions:
* What assumptions are implicit in the world view and judgments of the author?
* What, if any, is the factual basis for the opinions the author expresses?
* Is it reasonable, logical and clear?
Your critical faculties: use 'em or lose 'em!