ORC (Optimized Row Columnar) storage and Snappy compression can offer higher performance for Hadoop. Sometimes, though, the Hive planner can't keep up, so your queries run slowly and your cluster ends up underutilized. This problem can occur with any type of compressed input, but here's something to consider when tuning for performance. To set…

In 2014, Apache™ Hadoop® gathered momentum as the leading platform for big data analytics. Hadoop is here to stay. It has extended its dominance from enterprise software into social media (Twitter and Facebook both use it), and it is hard to imagine a clear successor emerging any time soon. That said, while data scientists…

Daniel Eklund, Think Big's Engineering Lead and my officemate in our new Boston office, started using the term "Big Data 2.0" nearly a year ago, and I have been seeing it more and more lately. The "2.0" moniker is not just marketing hype. A lot has changed in the last year to justify the version bump: A whole new Hadoop. No,…