Real-World Hadoop Takes Center Stage at Strata

Alex Woodie

There’s no faster path to budgetary oblivion than implementing technology for technology’s sake. In today’s super-heated big data environment, it’s easy to get all worked up over technologies like Hadoop without carefully considering the business justifications at the same time. At the Strata + Hadoop World conference today, Cloudera co-founder Mike Olson did his best to steer the conversation to real-world Hadoop solutions.

Cloudera’s chief strategy officer “Iron” Mike Olson kicked off today’s Strata keynote extravaganza with a reminder about something he said on the same stage a year ago. “I made a prediction that Hadoop would disappear,” Olson said. (You can read all about it in Datanami’s coverage of that event.)

“What I meant by that was, applications and solutions on top of the platform would assume the real importance in the ecosystem. We wouldn’t be talking anymore about Sqoop and Oozie and YARN. We’d be talking about business problems that got solved for real in the real world.

“I’m pleased to say that, in the last year, we’ve absolutely seen that happen,” he said.

Now, it’s hard to say that Hadoop has completely dropped off the radar. If you had been locked in a cave for the past eight years and suddenly poked your head out just to Google “state of big data technology,” you would not have to scroll down very far before your eyeballs were presented with the curious word “Hadoop.” It’s still front and center in the minds of many people, who associate big data with Hadoop.

But in many ways, Olson is right. While Hadoop hasn’t drifted off into the ether quite yet, there are positive signs that people are beginning to “get it,” and the notion that Hadoop is some type of special machine that can magically turn big data into meaningful insights is (slowly) going away. (What Olson maybe should have said is that these sorts of misconceptions have to go away if Hadoop is going to survive the massive expectations placed upon it.)

As a co-founder of the oldest and biggest Hadoop distributor, Olson absolutely has a vested interest in seeing the Hadoop ecosystem continue to grow and succeed. And for Hadoop to succeed, Hadoop users must be successful.

That brings us to three real-world Hadoop success stories that Olson shared today.

Olson’s Hadoop Success Story Number 1

Olson’s first success story involves Digital Reasoning, a Nashville, Tennessee company that uses Hadoop as the platform for a fraud detection product that’s used by large banks. As Olson explained, the company’s product, called Synthesys, wouldn’t exist without Hadoop, but Hadoop is largely outside the view of Digital Reasoning’s customers.

“Banks for a long time surveilled trades,” Olson said. “Banks are held responsible for the actions of their employees. Regulators impose fines so there’s a financial risk to a bank that doesn’t police its people.”

However, that trading data alone is not enough to give a full picture of what’s going on, Olson said. That’s why Digital Reasoning built Synthesys, which sits atop Hadoop underpinnings.

“Synthesys uses natural language processing and machine learning technology against content like text messages and chats and emails and documents; examines the topics that people are discussing; and flags those that may be of some concern,” Olson continued. “That intelligence can be combined with the trade surveillance to do a much better job of finding bad behavior… to avoid as much as $64 billion in regulatory penalties imposed on financial services players–a fantastic event for that industry.”

Olson’s Hadoop Success Story Number 2

Olson kept the pedal to the metal with his second use case, which involved using Hadoop to analyze electronic medical records (EMRs) in a healthcare setting.

Olson discussed how a Connecticut company called Evariant is using Hadoop to build applications that allow healthcare professionals to use big data analytics to get better insights into EMRs, which ultimately allow them to do their jobs better.

“In healthcare we’re seeing similar advances in the state of the art and similar powerful new solutions come into existence,” he said. “There’s a standard called Health Level-7. That’s the way hospitals and insurance companies share information with one another about the patients and about the treatments they’re delivering.”

EMRs are a powerful enabler of better healthcare delivery, but in order to work with them the data has to flow smoothly among all of these organizations, he continued. “Ingesting data from hospitals – from testing facilities and elsewhere – is absolutely critical. As stuff streams in we want to capture it live,” he said.

Evariant helps by using machine learning and NLP technologies to understand what’s in those EMR reports, and to match clients. The company’s software helps clients get insights into whether procedures or patients that may appear to be different are actually the same thing.

“Managing electronic medical records is important in delivering better care. Big data platforms built on Apache Hadoop make that possible,” Olson said. “But the users aren’t focusing on Hadoop. The users are focusing on healthcare.”

Olson’s Hadoop Success Story Number 3

Olson didn’t let up with his third example, which involves cyber-security (no soft marketing or ad-tech success stories here).

“When you and I walk into our job in the morning, we don’t power on the company-issued desktop that’s sitting on our desk and get to work,” he said. “We show up and we whip out the laptop that our company gave us…We pull out our personal mobile phone and tablet that we use to do our job as well. All those devices and endpoints are a vast expansion in the attack surface on the enterprise.”

A Cloudera partner named CounterTack is helping customers to make that attack surface just a bit more manageable by leveraging the power of big data technology. Its flagship product, called Sentinel, runs atop Hadoop.

“The Sentinel product from CounterTack uses big data tech to monitor, manage and spot bad behavior in real time,” Olson said. “These forensic analyses used to take months–in some cases 300 days–to spot an intrusion. Using Sentinel, CounterTack is able to do that in just minutes.”

In all three of these use cases (financial services, healthcare, and cyber-security), Hadoop is present, but the application vendors and their products get most of the glory, Olson said. “These are solutions used by business users that have no idea that Hadoop is under the covers,” he said. “This is enabled of course by the emergence of new technology in the ecosystem, like Apache Spark.”

The pace of innovation in the big data ecosystem is staggering, and that makes implementing technology challenging at times. Security and governance are areas where Hadoop still has some growing up to do.

But if you look at where Hadoop started and the trajectory that the entire Hadoop ecosystem (including Apache Spark) is currently on, it’s hard to argue that it hasn’t delivered. Sure, there are companies that implemented Hadoop before really knowing what it is, and others that are hoarding data in lakes with no current plans to actually use that data to improve the business. The return on investment (ROI) is not always as mind-boggling as we’d like it to be. And Hadoop may not even be mainstream yet–adoption, in fact, might even be “anemic,” as Gartner recently said.

“What’s really happened in the past seven years since we started doing this conference is Hadoop has gone from the HDFS storage engine with a MapReduce analytic engine to a really powerful multi-framework, multi-storage system architecture–Spark, Impala, search, MapReduce, HBase, HDFS, many ways to store and analyze data,” Olson said. “All of it integrated with tools like YARN to handle resources management. It is a much more powerful and flexible system than ever before. And that innovation continues.”