Big Data - What Is It Good For?

This question invariably comes up during big data discussions: ‘What is big data good for?’ Those who are close to the subject can quickly identify numerous examples of how big data can be used for the greater good, including some listed in “Big Data and Hadoop for Competitive Advantage – 5 Sources of Insights and Opportunities.” Recently, in a discussion on this topic with a high-level government official, I paused to reflect before answering the question so that I could offer a fresh perspective. I ultimately characterized my point of view simply as a trio of D2D’s: Data-to-Discovery, Data-to-Decisions, and Data-to-Dollars. I summarize each of these below:

Data-to-Discovery

As a scientist, I was first drawn into data science, and into advocating for big data, by the enormous potential of massive data sets for new discoveries. Achieving those discoveries calls on the algorithms and methods of data mining and machine learning. These data science techniques enable several major categories of Data-to-Discovery, including:

Correlation discovery - finding the hidden patterns and trends in the data

Association discovery - finding unusual, improbable co-occurring features or products in the data set

Class discovery - finding new categories and classes of items, events, or behaviors in your domain
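Correlation discovery can be illustrated with a minimal sketch, assuming the data fits in memory as named numeric columns of equal length. It computes the Pearson correlation coefficient for every pair of columns and reports the pairs whose absolute correlation exceeds a threshold. The column names, the numbers, and the 0.8 threshold are hypothetical choices for illustration; Pearson correlation is only one of many measures a data scientist might use, and correlation alone says nothing about causation.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation signal
    return cov / math.sqrt(var_x * var_y)


def strong_correlations(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    found = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            found.append((a, b, round(r, 3)))
    return found


# Hypothetical weekly retail measurements (illustrative numbers only).
data = {
    "ad_spend": [10, 12, 15, 18, 20, 25],
    "site_visits": [200, 230, 290, 350, 390, 480],
    "returns": [5, 9, 4, 7, 6, 5],
}
print(strong_correlations(data))
```

Here only the ad_spend/site_visits pair clears the threshold; the returns column is weakly related to both. On real data, the pairwise scan grows quadratically with the number of columns, so large collections typically need sampling or distributed computation.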

Since these correlations and associations represent “knowledge” about our domain, we sometimes refer to this D2D as “D2K: Data-to-Knowledge”. For example, the NIH has focused a major research initiative in this area, called BD2K (Big Data to Knowledge): http://bd2k.nih.gov/. One does not need to be a scientist to appreciate the value of discovery in any domain – e.g., in retail analytics, businesses try to discover new marketing opportunities, new customers, new ways to engage existing customers, new signals that they are about to lose a customer, new categories of customer interests, or any insights that move them closer to their business goals. Discovery brings joy to business analysts, marketing pros, and scientists, especially the data scientist. The potential for greater discoveries makes big data even more of a joy to work with. We are always searching for the next example of “beer and diapers” or “hurricanes and strawberry pop tarts” in our big data collections.
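Association discovery of the “beer and diapers” kind is often framed as market-basket analysis. The following is a minimal sketch, assuming a small in-memory list of shopping baskets (hypothetical data): it counts how often each pair of items co-occurs and scores each pair by lift, defined as P(A and B) / (P(A) × P(B)). A lift above 1 means the two items appear together more often than they would if customers chose them independently. The min_support cutoff of 0.3 is an arbitrary illustrative value; production systems usually rely on algorithms such as Apriori or FP-Growth to avoid enumerating every pair.

```python
from collections import Counter
from itertools import combinations


def pair_lift(baskets, min_support=0.3):
    """Score item pairs by lift = P(A and B) / (P(A) * P(B)).

    Returns (item_a, item_b, support, lift) tuples, highest lift first.
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    results = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # too rare to treat as a reliable pattern
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))
        results.append((a, b, round(support, 2), round(lift, 2)))
    return sorted(results, key=lambda r: r[3], reverse=True)


# Hypothetical transactions, for illustration only.
baskets = [
    ["beer", "diapers", "chips"],
    ["beer", "diapers"],
    ["milk", "bread"],
    ["diapers", "beer", "milk"],
    ["bread", "chips"],
]
print(pair_lift(baskets))  # [('beer', 'diapers', 0.6, 1.67)]
```

With these five baskets, beer and diapers co-occur in 60% of transactions while each appears in 60% individually, giving a lift of about 1.67; every other pair falls below the support cutoff.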

Data-to-Decisions

Achieving actionable intelligence (that informs good decision-making) from big data can often be very difficult. This is sometimes referred to as “the last mile challenge.” Taking the bits and bytes of big data, converting those into information packets, and then connecting those information “dots” into actionable knowledge takes both data science and business instinct. Joint human-computer cooperation is essential, particularly for non-trivial decisions. Some decisions may not require human intervention, such as the decision to deliver a discount coupon to a disengaged customer, or to send a welcome back message to a returning customer, or to offer a recommendation from a recommender engine to a customer when they view their online shopping cart.
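Routine decisions like the coupon, welcome-back, and recommendation examples above can be expressed as simple business rules. The sketch below assumes hypothetical customer fields (days_since_last_visit, is_returning, viewing_cart) and a hypothetical 60-day disengagement threshold; none of these come from a specific product. Each rule fires automatically, and any case that no routine rule covers is flagged for a person to review rather than decided by the code, reflecting the human-computer cooperation described above.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical fields, chosen only for illustration.
    name: str
    days_since_last_visit: int
    is_returning: bool
    viewing_cart: bool


def decide(customer, disengaged_after_days=60):
    """Return (action, needs_human) for one customer.

    Rules are checked in priority order; the first match wins.
    """
    if customer.viewing_cart:
        return ("show_recommendation", False)
    if customer.is_returning:
        return ("send_welcome_back", False)
    if customer.days_since_last_visit > disengaged_after_days:
        return ("send_discount_coupon", False)
    # No routine rule matched: hand the case to a person instead of guessing.
    return ("escalate_to_human", True)


customers = [
    Customer("ana", 90, False, False),
    Customer("ben", 3, True, False),
    Customer("cy", 1, False, True),
    Customer("dee", 10, False, False),
]
for c in customers:
    print(c.name, decide(c))
```

Ordering the rules explicitly makes the priority visible: a customer who is both viewing a cart and returning gets the recommendation. In practice, the thresholds and rules would be learned or validated from data rather than hard-coded.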

Algorithms can make suggestions, even autonomously based upon your business rules and processes, but critical decisions need a person-in-the-loop. Data enables, empowers, and informs objective, evidence-based decisions by busy humans. As your enterprise grows, you may choose to assign more autonomy to decision agents (e.g., Syntasa’s Decision Science-as-a-Service). These “agents” are guided by machine learning algorithms that have been trained and validated by data scientists and analysts exploring your big data collections. Sometimes, even simple statistical measures of a data sample can inform important decisions – data profiling is one of the simplest tools for data-to-decisions. The 4 primary steps in data profiling are:

Data Preview and Selection

Data Cleansing and Preparation

Feature Selection

Data Typing for Normalization and Transformation

Each of these steps brings you closer to your data (i.e., “knowing your data”), thus bridging that last mile gap: from data to actionable intelligence and data-driven decisions.
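The preview and data-typing steps of profiling can be sketched with nothing more than simple per-column statistics. This minimal example assumes the data sample is a small list of dictionaries (hypothetical rows) and treats None or an empty string as missing. For each column it reports the row count, missing count, number of distinct values, an inferred type (numeric only if every present value is a number), and min/max/mean for numeric columns. Cleansing and feature selection would build on such a report rather than being performed by it.

```python
import statistics


def profile(rows):
    """Summarize each column of a list-of-dicts data sample."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in (None, "")]
        numeric = [v for v in present
                   if isinstance(v, (int, float)) and not isinstance(v, bool)]
        summary = {
            "count": len(values),
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            # A column counts as numeric only if every present value is numeric.
            "type": "numeric" if present and len(numeric) == len(present) else "text",
        }
        if summary["type"] == "numeric":
            summary.update(min=min(numeric), max=max(numeric),
                           mean=round(statistics.mean(numeric), 2))
        report[col] = summary
    return report


# Hypothetical sample rows, for illustration only.
rows = [
    {"customer_id": 1, "age": 34, "region": "east"},
    {"customer_id": 2, "age": None, "region": "west"},
    {"customer_id": 3, "age": 51, "region": "east"},
    {"customer_id": 4, "age": 28, "region": ""},
]
for col, stats in profile(rows).items():
    print(col, stats)
```

Even this small report surfaces decisions: the missing age and blank region values must be handled during cleansing, and customer_id, though numeric, is an identifier that should not be normalized as a feature.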

Data-to-Dollars

We owe the instantiation of this concept to Jaime Fitzgerald of Fitzgerald Analytics. The meaning is clear – big data is the “new oil”. It is the new source of revenue, the fuel of the new innovation economy, the driver of wealth creation in the information age, and the MVA (Most Valuable Asset) for most businesses, industries, domains, and agencies. Insights on customer behaviors, preferences, and responses to stimuli can be delivered to marketing teams in real-time, autonomously, at a person-specific level of granularity. This is pure gold to business. Both Data-to-Discovery (insights) and Data-to-Decisions are essential ingredients toward achieving business value (Data-to-Dollars) from the big data and analytics resources within your organization.

Data as an Asset

As itemized in the article “Business Leaders Need R’s not V’s: The 5 R’s of Big Data”, we see how big data delivers relevant, realistic, and reliable insights (discoveries) about your business domain (Data-to-Discovery), enables real-time decisions (Data-to-Decisions), and provides greater ROI from your data assets (Data-to-Dollars). Big Data is especially good for achieving the modern version of ROI: Return On Innovation! Therefore, after hearing a lot of discussion about the “V’s of Big Data” over the past few years, which has now been expanded to include the “R’s of Big Data”, we have inevitably arrived at the D2D’s of big data goodness. It is especially exciting to see the powerful new tools that bring these promises to fruition, including the new Apache Spark implementation within the MapR Hadoop distribution. Check it out, and you will definitely see what it is good for!


Dr. Kirk Borne is a Principal Data Scientist at Booz Allen Hamilton. Previously he was a Professor of Astrophysics and Computational Science in the George Mason University School of Physics, Astronomy, and Computational Sciences. He was at Mason from 2003 to 2015, where he taught and advised students in the graduate and undergraduate Computational Science, Informatics, and Data Science programs. Before Mason, he spent nearly 20 years in positions supporting NASA projects, including an assignment as NASA's Data Archive Project Scientist for the Hubble Space Telescope, and as Project Manager in NASA's Space Science Data Operations Office. He has extensive experience in big data and data science, including expertise in scientific data mining and data systems. He has published over 200 articles (research papers, conference papers, and book chapters), and given over 200 invited talks at conferences and universities worldwide. In these roles, he focuses on achieving big discoveries from big data through data science, and he promotes the use of information and data-centric experiences with big data in the STEM education pipeline at all levels. He believes in data literacy for all! Learn more about him at http://kirkborne.net/. You can follow him on Google+ and on Twitter at @KirkDBorne, where he has been identified as one of the social network’s top big data influencers.