Micro-burst: Master Node Topology

by dave on August 12, 2009

Going along with the Cloud theme that I’m fortunate enough to be a part of, I’ve decided to use the “micro-burst” moniker to section off quick n’ dirty posts on a variety of cloud subjects that I don’t have time to dive into fully. With that in mind, let’s get on with the show.

Today’s topic is based on the link here from The Register. What I find fascinating is that Google has been able to manage their growth using a single-master-node topology for their filesystem. As the article points out, a single master node presents a single point of failure, especially from a chunk processing and scheduling standpoint. Bandwidth would also be constrained, since all metadata would have to pass through and be processed by a single entity. Since I’m unaware of the underlying hardware and scalability of the processing complex behind this (though I’ve read through the articles that have attempted to explain it), these processing issues could reasonably be remedied by more powerful system hardware and/or software refinement.

It’s exciting to see that Google has thus far been able to move their GFS platform forward and embrace a horizontal scale-out mechanism for the revision 2 product. Good luck to them as they continue to grow the company!

Why a Master Node Plurality Makes Sense

When designing any sort of scale-out filesystem (what I’d consider a horizontally scalable filesystem), it makes sense to include the ability to scale the master node (or scheduler node) complex as well. The obvious driver is filesystem growth, to be sure, but as metadata processing becomes increasingly complex (i.e., more filesystem features driven by custom metadata), the ability to ingest data at the same or a higher rate than originally specified becomes critical. With a more robust front end driven by more powerful master nodes with synchronously replicated metadata indexes (or siloed masters, each with its own metadata database), you can maintain latency SLAs (time to disk or time to commit) without completely crushing your cloud’s ability to service I/O operations in general. (See the image below for a conceptual diagram.)

[Image: Multiple master node filesystem, conceptual diagram]
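To make the “siloed masters with individual meta dbs” idea concrete, here’s a minimal sketch in Python. It is not how GFS actually works — it just illustrates the concept: each master owns its own metadata silo, and a thin router hashes a file path to pick the responsible master, so no single node has to process all metadata traffic. All class and method names here are hypothetical.

```python
import hashlib


class MasterNode:
    """One master/scheduler node holding its own metadata silo."""

    def __init__(self, name):
        self.name = name
        self.meta = {}  # path -> list of chunk locations (toy metadata db)

    def record(self, path, chunks):
        self.meta[path] = chunks

    def lookup(self, path):
        return self.meta.get(path)


class MasterCluster:
    """Routes metadata operations across siloed masters by hashing the
    path, so metadata load spreads instead of funneling through one node."""

    def __init__(self, masters):
        self.masters = masters

    def _pick(self, path):
        # Deterministic: the same path always maps to the same master.
        h = int(hashlib.md5(path.encode()).hexdigest(), 16)
        return self.masters[h % len(self.masters)]

    def record(self, path, chunks):
        self._pick(path).record(path, chunks)

    def lookup(self, path):
        return self._pick(path).lookup(path)


cluster = MasterCluster([MasterNode("m%d" % i) for i in range(3)])
cluster.record("/logs/a", ["chunk-1", "chunk-2"])
print(cluster.lookup("/logs/a"))
```

The trade-off versus synchronously replicated indexes is the usual one: siloed masters scale metadata throughput linearly but make cross-silo operations (renames, global scans) harder, while replicated indexes keep a global view at the cost of replication latency on every commit.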

Hopefully my musings on this subject make sense. Let me know if you have any questions!