In Memory Data Grids

19 February, 2010 10:09 am

In my professional life as well as here (is there really any difference?) one of the things I do a lot is “evangelize” the idea that applications need many different kinds of storage. You shouldn’t shoe-horn everything you have into an RDBMS. You shouldn’t shoe-horn everything you have into a filesystem. Ditto for key/value stores, column or document stores, graph DBs, etc. As I’m talking about different kinds of storage that make different CAP/performance/durability tradeoffs, somebody often mentions In Memory Data Grids (henceforth IMDGs). Occasionally references are made to tuple spaces, or to Stonebraker’s or Gray’s writings about data fitting in memory, but the message always seems to be the same: everything can run at RAM speed and so storage doesn’t need to be part of the operational equation. To that I say: bunk. I’ve set up a 6TB shared data store across a thousand nodes, bigger than many IMDG advocates have ever seen or will see for at least a few more years, but 6TB is still nothing in storage terms. It was used purely as scratch space, as a way to move intermediate results between stages of a geophysical workflow. It was a form of IPC, not storage; the actual datasets were orders of magnitude larger, and lived on a whole different storage system.

But wait, the IMDG advocates say, we can spill to disk so capacity’s not a limitation. Once you have an IMDG that spills to disk, using memory as cache, you have effectively the same thing as a parallel filesystem, only without the durability characteristics. Without a credible backup story, or ILM story, or anything else that has grown up around filesystems. How the heck is that a win? There are sites that generate 25TB of log info per day. Loading it into memory, even with spill-to-disk, is barely feasible and certainly not cost-effective. There are a very few applications that need random access to that much data; the people running those applications are the ones who keep hyper-expensive big SMP machines (like SGI’s UltraViolet) alive, and a high percentage of them work at a certain government agency. For the rest of us, the typical processing model for big data is sequential, not random. RAM is not so much a random-access cache as a buffer that constantly fills at one end and empties at the other. That’s why the big-data folks are so enchanted with Hadoop, which is really just a larger-scale version of what your video player does. VLC doesn’t load your entire video into memory. It probably can’t, unless it’s a very small video or you have a very large memory, and you don’t need random access anyway. What it does instead is buffer into memory, with one thread keeping the buffer full while the other empties it for playback. The point is that memory is used for processing, not storage. The storage for that data, be it 4.7GB of video or 25TB of logs, is still likely to be disks and filesystems.
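To make that buffering model concrete, here is a minimal sketch of the fill-one-end, drain-the-other pattern. It is an illustration only, not how VLC or Hadoop is actually written; the file name, chunk size, and processChunk placeholder are all assumptions.

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class StreamingBuffer {
    private static final int CHUNK_SIZE = 1 << 20;   // 1 MB chunks
    private static final byte[] EOF = new byte[0];   // sentinel marking end of stream

    public static void main(String[] args) throws Exception {
        // Bounded queue: memory acts as a buffer, not as storage for the whole file.
        BlockingQueue<byte[]> buffer = new ArrayBlockingQueue<>(64);

        // Producer: keeps the buffer full by reading sequentially from disk.
        Thread reader = new Thread(() -> {
            try (InputStream in = Files.newInputStream(Paths.get("huge-log-or-video.dat"))) {
                byte[] chunk = new byte[CHUNK_SIZE];
                int n;
                while ((n = in.read(chunk)) > 0) {
                    byte[] copy = new byte[n];
                    System.arraycopy(chunk, 0, copy, 0, n);
                    buffer.put(copy);   // blocks when the buffer is full
                }
                buffer.put(EOF);
            } catch (IOException | InterruptedException e) {
                throw new RuntimeException(e);
            }
        });

        // Consumer: empties the buffer, e.g. decoding frames or parsing log records.
        Thread processor = new Thread(() -> {
            try {
                for (byte[] chunk = buffer.take(); chunk != EOF; chunk = buffer.take()) {
                    processChunk(chunk);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        reader.start();
        processor.start();
        reader.join();
        processor.join();
    }

    // Placeholder for whatever per-chunk work the application does.
    private static void processChunk(byte[] chunk) {
        // decode / parse / aggregate here
    }
}

The structure is the whole point: the queue is bounded, so memory use stays constant no matter how big the file is, precisely because memory is being used for processing rather than storage.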

I’m not saying that IMDGs aren’t valuable. They can be a very valuable part of an application’s computation or communication model. When it comes to that same application’s storage model, though, IMDGs are irrelevant and shouldn’t be presented as alternatives to various kinds of storage. (Aside to the IMDG weenie who derided cloud databases and key/value stores as “hacks”: let’s talk about implementing persistence by monkey-patching the JVM you’re running in before we start talking about what’s a hack, OK?) Maybe when we make the next quantum leap in memory technology, so that individual machines can have 1TB each of non-volatile memory without breaking the bank, then IMDGs will be able to displace real storage in some cases. Or maybe not, since data needs will surely have grown too by then and there’s still no IMDG backup/ILM story worth telling. Maybe it’s better to continue treating memory as memory and storage as storage – two different things, each necessary and each involving its own unique challenges.

7 comments on “In Memory Data Grids”

Related to this, I’d come across a recent Stanford initiative (by people like John Ousterhout and Mendel Rosenblum) called “RAMClouds”. In their paper (link below), they argue that low-latency in-memory stores have a lot of significance if web services are to entirely replace traditional apps, and their estimates of dataset sizes for even large-scale online stores seem modest (< 4TB per year). Those struck me as two points that make in-memory stores more appealing.

Yes, those estimates are modest indeed. If only IMDG advocates could apply the same modesty, the same conservatism, the same caution to their own claims that they apply to their data-growth estimates. It’s not that such ideas have no merit at all, but that they approach things from such a totally back-asswards perspective. Just look at NoSQL And Elastic Caching Platforms Are Kissing Cousins, for example. The author presents NoSQL data stores as “elastic caching” systems (i.e. IMDGs) that are missing some features. The reality is that IMDGs are NoSQL data stores that are missing some features – like durability. That’s kind of a big one. You can tell the author is being disingenuous by looking at the list of supposed advantages for his preferred technology, and asking yourself whether they are truly unique. Oh look, IMDGs use “clever data replication algorithms” to provide reliability. Why didn’t NoSQL developers ever think of that? “Elastic caching” systems let you add and remove servers, and execute code on the servers. I guess those silly NoSQL folks really have a thing or two to learn, huh? Except that most of them do have that kind of replication, and that kind of dynamic reconfiguration, and in many cases (e.g. Redis or MongoDB) that kind of code-execution functionality as well. Why didn’t Mr. Elastic Caching mention that? Did he not know, or would it just have killed his point to admit it? Which is worse? Which answer would make you trust him less?

Simple fact: a real data-storage system that uses tried-and-true OS caching to serve most requests from memory will beat a system that was designed to be memory-only and then added spill-to-disk as an afterthought. It will perform just as well, and it will have better behavior when it comes to protecting data. It will handle a full data-center power outage as well as a single server failure. It will allow the full range of backup and forensics and compliance behaviors that form part of a real data management strategy. That doesn’t mean any representative of category X is better than any representative of category Y for all times and places, but all of those fancy data-lookup algorithms and such can be – and often have been – implemented in a real storage system too. It’s IMDGs that want to be real storage when they’re all grown up, not the other way around.
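To make the “tried-and-true OS caching” point concrete, here is a minimal read-path sketch; it is not any particular product’s code, and the file name and the 64-byte record layout are assumptions. The file on disk stays the durable copy, and the kernel’s page cache is what makes repeat reads run at memory speed.

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class PageCacheRead {
    public static void main(String[] args) throws IOException {
        try (FileChannel ch = FileChannel.open(Paths.get("records.dat"),
                                               StandardOpenOption.READ)) {
            // Map (up to 2 GB of) the file read-only; pages are faulted in on first
            // access and then served from the OS page cache, while the file on disk
            // remains the durable copy that backup/ILM tools already understand.
            long len = Math.min(ch.size(), Integer.MAX_VALUE);
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, len);

            // Hypothetical fixed-size records, 64 bytes each.
            int recordSize = 64;
            long recordCount = len / recordSize;

            // Random access: warm records cost a memory read, cold ones a page fault.
            long i = recordCount / 2;
            map.position((int) (i * recordSize));
            byte[] record = new byte[recordSize];
            map.get(record);
            System.out.println("read record " + i + ", first byte = " + record[0]);
        }
    }
}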

I think that IMDGs (to borrow your term) are the way of the future, possibly the now. Obviously when loading data (especially log data) one would likely convert the data to a smaller, more efficient form, and in the case of column-based stores take the opportunity to compress the data (remove redundancy). There is no good correlation between the size of data on disk and its size in a good in-memory database.
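As a toy illustration of that load-time shrinkage, here is a simple run-length encoder for a single low-cardinality column; it is a sketch of the general idea, not any particular column store’s format.

import java.util.ArrayList;
import java.util.List;

public class ColumnRle {
    // Encode a column as (value, run length) pairs, printed as "value x count".
    static List<String> encode(String[] column) {
        List<String> runs = new ArrayList<>();
        int i = 0;
        while (i < column.length) {
            int j = i;
            while (j < column.length && column[j].equals(column[i])) {
                j++;
            }
            runs.add(column[i] + " x " + (j - i));
            i = j;
        }
        return runs;
    }

    public static void main(String[] args) {
        // A sorted, low-cardinality column (log severity, say) compresses very well:
        String[] severity = {"INFO", "INFO", "INFO", "INFO", "WARN", "WARN", "ERROR"};
        System.out.println(encode(severity));   // [INFO x 4, WARN x 2, ERROR x 1]
    }
}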

I think there are many large datasets (and data producers) that could benefit from the ability to perform continual analysis, or the ability to ask a question and get an answer at the speed of RAM (and multiple processor cores). Any scenario that allows you to “browse” or “explore” or calculate a real-time answer on a large dataset comes to mind.

I do agree with the “spill-to-disk” concept being flawed. Mirrored to disk makes more sense. A system that uses a passive background process with overlapped I/O to write out dirty pages sequentially to disk from sequential memory would provide a fair bit of integrity without impacting performance. For full integrity, one could pair the in-memory DB with an on-disk transaction log that is purged once it is confirmed that the mirrored data has been flushed.
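Here is a rough sketch of the scheme described above, using record granularity rather than pages to keep it short. The class and file names are made up, and the durability policy (force the log on every write, flush the mirror once a second) is purely an assumption.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class WriteBehindStore implements AutoCloseable {
    private final Map<String, String> memory = new HashMap<>();  // the in-memory side
    private final Map<String, String> dirty = new HashMap<>();   // records not yet mirrored
    private final FileChannel log;      // on-disk transaction log, forced on every put
    private final FileChannel mirror;   // sequential mirror written by the background thread
    private final ScheduledExecutorService flusher =
            Executors.newSingleThreadScheduledExecutor();

    public WriteBehindStore(Path dir) throws IOException {
        log = FileChannel.open(dir.resolve("txn.log"), StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);
        mirror = FileChannel.open(dir.resolve("mirror.dat"), StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING);
        // Passive background flush; a real system would tune the interval or batch size.
        flusher.scheduleAtFixedRate(this::flushDirty, 1, 1, TimeUnit.SECONDS);
    }

    // Write path: update memory, append to the log so the update survives a crash,
    // and remember the record as dirty for the next background flush.
    public synchronized void put(String key, String value) throws IOException {
        memory.put(key, value);
        dirty.put(key, value);
        log.write(ByteBuffer.wrap((key + "=" + value + "\n").getBytes(StandardCharsets.UTF_8)));
        log.force(false);
    }

    // Read path: never touches the disk.
    public synchronized String get(String key) {
        return memory.get(key);
    }

    // Background mirror: write dirty records sequentially, then purge the log,
    // since everything it protected is now safely on disk.
    private synchronized void flushDirty() {
        if (dirty.isEmpty()) return;
        StringBuilder batch = new StringBuilder();
        for (Map.Entry<String, String> e : dirty.entrySet()) {
            batch.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        try {
            mirror.write(ByteBuffer.wrap(batch.toString().getBytes(StandardCharsets.UTF_8)));
            mirror.force(false);
            dirty.clear();
            log.truncate(0);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public synchronized void close() throws IOException {
        flusher.shutdown();
        flushDirty();
        log.close();
        mirror.close();
    }
}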

RAM is getting cheaper and faster, processors are getting wider (core count and memory channel count keep growing), but disks really haven’t gotten faster at the same rate. Relative to processor and RAM speed, disks have actually increased their role as bottlenecks in “Big Data” problems. Relative to the growth of data, disks have kept up on the capacity side, but they don’t read (or write) fast enough to keep any of the cores in a modern multi-core system fully taxed in a productive way. The solution is more servers with more disks; this works, but it isn’t cheap, especially if throughput is a concern (as compared to a memory-based solution, which excels at throughput). More processors and more servers also work for a distributed in-memory solution: more RAM and more processors mean more capacity and more throughput.

I don’t think RDBMSs are dead. I actually think they have a vibrant future holding really sensitive data, or fairly static data. I think IMDGs will be a huge part of the future, allowing people to continually analyze (mine) data that never ceases to grow. IMDGs are perfect for analyzing systems that produce continual streams of data-in-flight. Perhaps as that data becomes less relevant (older), it belongs in a slower, cheaper, more permanent store.

A system that uses a passive background process with overlapped I/O to write out dirty pages sequentially to disk from sequential memory would provide a fair bit of integrity without impacting performance.
Congratulations, you’ve just described (the first part of) what CouchDB, Cassandra, et al. do, except that they leverage information about dirty records instead of dirty pages to reduce the write load still further. Also, they don’t worry about sequential access to memory, since memory is by its nature good at random access (and linear scans can be bad for caches/TLBs). This is exactly the kind of “we’re way better than X even though we’re clearly just beginning to think through the problems that led to X” attitude that led to my post. It’s hubris, pure and simple. Any idiot can make stuff go fast if it’s only in memory and doesn’t have to survive any but the simplest failure scenarios. OK, not quite – lots of people still manage to screw that up – but clearing that low bar still isn’t exactly evidence of genius.

Writing out dirty pages consecutively to disk is better for the disk subsystem, that is all. Also, RAM is great at random access, but processors do prefer sequential memory access. I agree with what you say in your quote, though.

Interestingly, your blog article reminded me of Stonebraker’s VoltDB, so I decided to dig a little — they’re still in stealth but do offer a technical whitepaper.

It seems to be an IMDG, with a single-threaded process per core, and replicas. From what I gather from the technical whitepaper, there is no spill-to-disk; it’s all in-memory replicas. But there is snapshot backup. There is also spooling to another (storage-oriented) database, which I guess he hopes will be Vertica.

The claims are, of course, pretty spectacular: 232k transactions per minute on a traditional DB vs. 9.94m transactions per minute, at a fifth the hardware cost. What they neglect to mention is the storage limit – some back-of-the-envelope math would suggest 6x Dell R610s with 24 GB RAM each, so 144 GB, and that’s before replicas, so more likely 75-95 GB in practice.
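Spelling that estimate out, with the replication factor as the assumed variable (the whitepaper doesn’t state one):

public class CapacityGuess {
    public static void main(String[] args) {
        int servers = 6;
        int gbPerServer = 24;               // Dell R610s as configured for the benchmark
        int rawGb = servers * gbPerServer;  // 144 GB of aggregate RAM

        // Assumed replication factors, not numbers from the whitepaper:
        double twoCopies = rawGb / 2.0;     // 72 GB if every partition is stored twice
        double lighter = rawGb / 1.5;       // 96 GB with a lighter replication scheme

        System.out.printf("raw %d GB, usable roughly %.0f-%.0f GB%n",
                          rawGb, twoCopies, lighter);
    }
}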

Based on my experience with lots of database systems in telecom and financial IT, I’d say a large number (particularly departmental systems) are in the 100 GB range, so this kind of solution may be very widely applicable and cost-effective. Having said this, there’s a new generation of systems with massive data requirements (as you illustrate) that this won’t work for. Horses for courses, and all that…