
Ever wonder how Google manages all of its information? Take Gmail: it has to keep track of the billions of e-mails sent each day, whether or not they are spam.

A Database?
My first thought was a database. But think about it: if every e-mail were stored in a single database table, that table would gain billions of rows each day. A single table on a single server can’t realistically absorb that, and searching it would be painfully slow. So Google can’t be storing its data in a database… at least not in the traditional MySQL sense.

After a bit of digging around, I found an interesting document written by some of Google’s main architects that describes their file system in great detail. It turns out Google uses a distributed file system spread over many machines. The largest cluster it describes offers huge storage (hundreds of terabytes) across thousands of disks on over a thousand machines.

The advantages of this type of system are redundancy and low cost. The servers are not top of the line, but clustering many of them together creates a highly cost-effective file system.
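
To make the redundancy idea concrete, here is a toy sketch in Python. It is not how Google's file system is actually implemented. The chunk size, the replica count, the round-robin placement, and every function and machine name are my own assumptions for illustration. The idea is simply that a file is cut into chunks, each chunk is copied to several cheap machines, and a read succeeds as long as one copy survives.

```python
CHUNK_SIZE = 4   # bytes; tiny on purpose (assumed, real systems use far larger chunks)
REPLICAS = 3     # copies kept of each chunk (assumed)


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut a file's bytes into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def place_chunks(num_chunks: int, machines: list[str],
                 replicas: int = REPLICAS) -> dict[int, list[str]]:
    """Assign each chunk to `replicas` distinct machines, round-robin."""
    if replicas > len(machines):
        raise ValueError("need at least as many machines as replicas")
    return {
        idx: [machines[(idx + r) % len(machines)] for r in range(replicas)]
        for idx in range(num_chunks)
    }


def pick_replica(idx: int, placement: dict[int, list[str]],
                 failed: set[str]) -> str:
    """Return the first machine holding chunk `idx` that is still alive."""
    for machine in placement[idx]:
        if machine not in failed:
            return machine
    raise IOError(f"all replicas of chunk {idx} are lost")


# Hypothetical cluster of four cheap machines.
chunks = split_into_chunks(b"hello world!")
placement = place_chunks(len(chunks), ["m0", "m1", "m2", "m3"])
print(pick_replica(0, placement, failed={"m0"}))  # another copy still answers
```

With three copies of every chunk, any single machine (here `m0`) can die and every chunk is still readable from somewhere else.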

It’s What Yahoo Does
The owner of the largest database in the world, Yahoo!, takes a similar approach: clusters of cheap computers that form a distributed file system. In fact, if a computer breaks down, it’s usually cheaper and faster to throw it away and replace it with a new one than it is to repair it.
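
Here is a rough Python sketch of why throwing a machine away is painless when every chunk of data lives on several machines. This is my own illustration, not Yahoo's or Google's actual procedure. The `plan_replacement` function, the placement table, and the machine names are all hypothetical. When a machine dies, each chunk it held is copied from a surviving machine onto the new one.

```python
def plan_replacement(placement: dict[int, list[str]], dead: str, spare: str):
    """Swap a dead machine for a spare.

    Returns the updated placement plus a list of copy jobs
    (chunk, source_machine, destination_machine) needed to
    restore the lost copies.
    """
    new_placement = {}
    copy_jobs = []
    for chunk, machines in placement.items():
        if dead not in machines:
            new_placement[chunk] = list(machines)
            continue
        survivors = [m for m in machines if m != dead]
        if not survivors:
            raise RuntimeError(f"chunk {chunk} had no other copy")
        # Re-create the lost copy on the spare from any surviving copy.
        copy_jobs.append((chunk, survivors[0], spare))
        new_placement[chunk] = survivors + [spare]
    return new_placement, copy_jobs


# Hypothetical placement table: chunk number -> machines holding a copy.
placement = {0: ["m0", "m1", "m2"], 1: ["m1", "m2", "m3"]}
new_placement, jobs = plan_replacement(placement, dead="m0", spare="m9")
print(jobs)           # which chunks need to be copied, and from where
print(new_placement)  # the dead machine no longer appears anywhere
```

Nobody has to rescue anything from the broken computer, because nothing on it was the only copy.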

So if you have a bunch of old computers sitting around at home, don’t throw them out just yet… you could create your own distributed file system!