Observations of a Digitally Enlightened Mind


Hadoop

We have entered a new era of information technology, an era where the clouds are moist, the data is obese and incontinent, and the threats are advanced, persistent, and the biggest ever. Of course, with all the paradigm-shifting, next-generation, FUD vs. ROI marketing, it's important to remember that sometimes we need to balance innovation against misunderstood expectations, vendor double-speak, and relentless enterprise sales guys.

Contrary to the barrage of marketing, these technologies won't make you rich, teach you how to invest in real estate, help you lose weight or grow a full head of hair, or make you attractive to the opposite sex, nor will they solve all your problems. In some cases they can improve the efficiency and effectiveness of your operating environment, but that requires proper planning, expectation setting, and careful deployment… and on that note, I give you the top 10 most overhyped technology terms of the last decade.

So, recently I posted some thoughts on big data and the increasing use of Hadoop; the general theme was data management != data analysis. This caused confusion for some folks, as evidenced by the following Twitter exchange (the tweets haven't been altered, but some extraneous 'noise' has been removed to maximize your reading pleasure):

@Beaker @amrittsering I’m confused by your last blog. Is your point that people are spending $$$ on data aggregation hoping it leads to analytics?

Big data is a scorching hot topic, currently capturing a lion's share of the market's available stock of hyperbole, and for good reason: data is growing at a meteoric rate.

As we continue to innovate, as business accelerates technology adoption, as the line between corporate and personal computing bleeds, and as we interact more in digital mediums, we are creating mountains of data. Much of this data is garbage, but some of it is gold (big-data-are-you-creating-a-garbage-dump-or-mountains-of-gold).

Unfortunately, as with all overhyped technologies, there is a lot of misinformation, failed expectations, and the inevitable trough of disillusionment, but that doesn't mean you have to spend months or years curled up in the fetal position, disillusioned and wondering what went so wrong. With a thoughtful approach you can venture through the murky swamp of your big data and find the insights that give your company a significant competitive and market advantage.

You're not really sure how it happened, but sometime between last year and the summer of 2011 you were suddenly facing a big data problem, or you were being told you were facing a big data problem, or, more accurately, you were being told that you needed a big data solution.

The funny thing was that you hadn't really done anything drastic over the last couple of years that would indicate a tsunami of data was about to breach your storage floodgates, but then again, it wasn't like you watched yourself going bald either.

In these posts, Hoff posits that the mass centralization of information will benefit the industry and that monitoring tools will experience a boon, especially those that leverage a cloud-computing architecture…

This will bring about a resurgence of DLP and monitoring tools using a variety of deployment methodologies via virtualization and cloud that was at first seen as a hindrance but will now be an incredible boon.

As Big Data and the databases/datastores it lives in interact with the proliferation of PaaS and SaaS offers, we have an opportunity to explore better ways of dealing with these problems: this is the benefit of mass centralization of information.

Hoff then goes on to describe how new data warehousing and analytics technologies, such as Hadoop, would positively impact the industry…

Even when we do start to be able to integrate and correlate event, configuration, vulnerability or logging data, it’s very IT-centric. It’s very INFRASTRUCTURE-centric. It doesn’t really include much value about the actual information in use/transit or the implication of how it’s being consumed or related to.

This is where using Big Data and collective pools of sourced “puddles” as part of a larger data “lake” and then mining it using toolsets such as Hadoop come into play…