Posted by timothy on Saturday November 03, 2012 @06:22PM
from the and-eventually-smoke-signals dept.

snydeq writes "Facebook has said that it will soon open source Prism, an internal project that supports geographically distributed Hadoop data stores, thereby removing the limits on Hadoop's capacity to crunch data. 'The problem is that Hadoop must confine data to one physical data center location. Although Hadoop is a batch processing system, it's tightly coupled, and it will not tolerate more than a few milliseconds delay among servers in a Hadoop cluster. With Prism, a logical abstraction layer is added so that a Hadoop cluster can run across multiple data centers, effectively removing limits on capacity.'"
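For context, Hadoop already has a simpler form of location awareness: rack awareness, where an admin-supplied topology script maps each node's hostname to a network path and the scheduler prefers rack-local work. A geo-distributed layer like Prism presumably needs something analogous with a data-center level in the hierarchy. Here's a minimal sketch of such a mapping script in the style of Hadoop's `net.topology.script.file.name` hook — all hostnames and the specific path hierarchy are made up, and Prism's actual mechanism hasn't been published:

```python
# Hypothetical topology mapper in the style of Hadoop's
# net.topology.script.file.name hook: given hostnames on the
# command line, print one network path per host. Adding a
# data-center component to the path is the kind of abstraction
# a geo-distributed layer would need. All names are invented.
import sys

# Static host -> (datacenter, rack) table; a real script would
# consult DNS, a CMDB, or a config file instead.
TOPOLOGY = {
    "node-a1": ("dc-east", "rack-1"),
    "node-a2": ("dc-east", "rack-2"),
    "node-b1": ("dc-west", "rack-1"),
}

def resolve(host):
    # Unknown hosts fall back to a default location, mirroring
    # Hadoop's /default-rack convention.
    dc, rack = TOPOLOGY.get(host, ("dc-default", "rack-default"))
    return "/%s/%s" % (dc, rack)

if __name__ == "__main__":
    for host in sys.argv[1:]:
        print(resolve(host))
```

With a hierarchy like this, a scheduler can prefer work that stays inside one data center and only cross the high-latency WAN link when it must — which is one plausible reading of what "a logical abstraction layer" buys you here.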

Actually, it's pretty cool. It's a solution to a problem that needed a solution, for once. Quite frankly, even though I'm not an army of PhD computer scientists, I'm sorry I couldn't have come up with it myself. It's weird little problems like this, and their solutions, that win the "cool" race. Or the "king of geeks" race, or whatever you want to call the brainiac metric.

What is the sub-problem when running a Hadoop job that creates this bottleneck and requires such low latency? Is it something that could have been avoided in the first place?

And how does a logical abstraction layer solve this problem (or, since the media reports predictably don't explain it, how *would* it) in a way that Hadoop's programmers couldn't have achieved more easily within the application's own code?