Making Your Data Progressively Smarter

Data is data. It’s not inherently dumb or smart. What matters is whether your data, be it large or small, contributes to smarter decisions.

How can you leverage your big-data resource so that it drives smarter decisions? Last year, DATAVERSITY published a thought-provoking article stating that the future of Big Data depends on something called “smart data.” By that term, the article refers primarily to the use of Semantic Technologies that make data self-describing and thereby facilitate delivery of contextualized, automated, and targeted data-driven insights.

As valuable as I find Semantic Technologies to be (and here’s a 2014 Dataversity article that reports my views on this exact topic), they’re obviously not the only enabler for smarter decisions. You also need layers of rich Metadata, predictive models, interactive visualizations, decision-support applications, and other artifacts in order to extract actionable intelligence from all your data. And you need Data Scientists, Analysts, and other knowledge workers whose jobs involve drilling and distilling actionable insights from data at any scale, using any and all of those tools.

As far as I’m concerned, “smart data” refers to your ability to architect, leverage, and manage your end-to-end data resource so that it delivers actionable insights at any scale. In many ways, Big Data should be the centerpiece of your Smart Data strategy, because there are insights you can most effectively and efficiently achieve at extreme scales, as discussed in this TechTarget post of mine from 2013.

To do “smart data” effectively and realize progressively deeper insights, you should implement the following functional capabilities in your end-to-end data resource:

Scale: This is Big Data’s “volume” metric, aligning with the “exploratory data repository” criterion. The decision-support value of high-volume data is straightforward. The more historical and detailed data you retain on a given subject of analysis, such as your customers, the more insight you can extract from that data. From a Data Scientist’s point of view, more content is better than less, because it enables them to identify more of the relevant predictors, relationships, and patterns in the data at a finer granularity.

Comprehensiveness: This is Big Data’s “variety” metric, which aligns with the fabled “360-degree view” criterion. If you have more sources, types, and formats of data on a given subject, you can build a more multidimensional view and richer analytics with respect to any given subject. And if you have data in unstructured and streaming formats, you can unlock further insights through tools such as Natural Language Processing (NLP), Machine Learning, and Deep Learning.

Speed: This is Big Data’s “velocity” metric, which aligns with the “agility” criterion. The more rapidly you can ingest data updates on any given subject, the more likely you are to have the latest, greatest, most correct version of that data. Likewise, lower data latencies enable you to pose more questions more rapidly against the data you have within a given time period. Consequently, higher velocities make it more likely that you will be able to make the right decision at the right time with respect to that data.

Trustworthiness: This is Big Data’s “veracity” metric, aligning with the “single version of truth” criterion. It refers to the need for a repository where officially sanctioned systems of record are consolidated after they’ve undergone a process of profiling, matching, merging, correction and enhancement. Trustworthiness comes down to the value of maintaining consolidated, conformed, cleansed, consistent, and current data. If you have your Data Governance process ship-shape and your data maintained at top trustworthiness, you are more likely to make the right decisions based on the most accurate data.
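
To make the matching-and-merging step concrete, here is a minimal Python sketch of consolidating duplicate customer records into a single version of the truth. Everything in it is an assumption made for illustration: the field names (email, name, phone, updated), the choice of a normalized email address as the match key, and the "newest non-empty value wins" merge rule. Real Data Governance tooling applies far richer profiling and matching logic than this.

```python
from datetime import date


def normalize_email(email):
    """Hypothetical match key: trim whitespace and lowercase the address."""
    return email.strip().lower()


def consolidate(records):
    """Merge records that share a normalized email.

    Records are applied oldest first, so a newer non-empty value
    overwrites an older one, and empty values never overwrite anything.
    """
    merged = {}
    for rec in sorted(records, key=lambda r: r["updated"]):
        key = normalize_email(rec["email"])
        target = merged.setdefault(key, {})
        for field, value in rec.items():
            if value not in (None, ""):
                target[field] = value
        target["email"] = key  # store the cleansed form of the key
    return list(merged.values())


# Illustrative sample data, not taken from any real system.
records = [
    {"email": "Ann@Example.com ", "name": "Ann Lee", "phone": "",
     "updated": date(2016, 1, 5)},
    {"email": "ann@example.com", "name": "", "phone": "555-0100",
     "updated": date(2016, 3, 1)},
    {"email": "bob@example.com", "name": "Bob Roy", "phone": None,
     "updated": date(2016, 2, 2)},
]

golden = consolidate(records)
for row in golden:
    print(row)
```

The two records for Ann collapse into one that keeps her name from the older record and her phone number from the newer one, which is the "consolidated, conformed, cleansed" outcome in miniature.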

Contextual: This is the first of several “smart data” criteria that are outside the traditional scope of “Big Data.” And this is where “self-describing” Semantic Technologies (such as ontologies, taxonomies, glossaries, tags, and markups) enter the picture. Likewise, here’s where rich Metadata becomes essential. Context consists of all the historical, situational, geospatial, environmental, social, behavioral, and other variables that express the full meaning of, uses of, and constraints on your data from the point of view of downstream uses. If you can link more context persistently to your data, you can drive smarter downstream decisions that leverage that data.

Relevant: This refers to the need for data curation. That involves having subject matter experts who discover, review, analyze, organize, select, tag, and recommend data for its relevance to various downstream users, uses, and applications. Curators tend to exercise human judgment to a high degree and share out their content recommendations in collaborative and social contexts, albeit with various degrees of automated guidance.

Cognitive: This refers to the ability to detect deep statistical patterns in unstructured content—such as data, video, and images—through Machine Learning, Deep Learning, and Artificial Intelligence technologies. As exemplified by IBM Watson, Cognitive Computing enables automated systems to use Big Data to automatically acquire expertise in various subject areas. These technologies can also algorithmically accelerate human learning through interactive engagement with a Cognitive Computing resource around a constantly growing knowledge corpus.

Predictive: This refers to the ability to drive evidence-based predictions into the full range of business processes, operations and decision points. Predictive Analytics is a core function of Data Scientists who use statistical analysis tools to mine data, identify predictive variables, and build and test predictive models against fresh data. Predictive analysis is at the heart of real-world experimentation and A/B testing, which are practices that many companies now embed into their digital marketing and other operational business processes.
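
As one illustration of the evidence-based testing described above, the following Python sketch evaluates an A/B test with a pooled two-proportion z-test, a common textbook choice rather than a method this article prescribes. The function name, the parameter names, and the sample conversion counts are all hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for conversion rate B versus A.

    conv_a, conv_b: number of conversions in each variant
    n_a, n_b: number of visitors in each variant
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: variant A converts 200 of 1,000 visitors,
# variant B converts 250 of 1,000.
z, p_value = two_proportion_z_test(200, 1000, 250, 1000)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

A small p-value suggests the difference between variants is unlikely to be chance alone. The normal approximation used here assumes reasonably large samples in both variants.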

Consumable: This refers to the ability to deliver actionable data-driven intelligence into any and all points of consumption, including social, mobile, Cloud, and Internet of Things (IoT). Typically, it is delivered through your Business Intelligence tools and applications. It rides on your ability to tailor, target, personalize, mobilize, visualize, and share data-driven insights to power smart decisions.

Many organizations are already doing “smart data” on most or all of these levels, at various scales, with various data sources. For my further thoughts on this topic, be sure to tune in to my Smart Data Online webinar on Wednesday, July 13 at 11:30am (EDT).

About the author

James Kobielus, Wikibon, Lead Analyst
Jim is Wikibon's Lead Analyst for Data Science, Deep Learning, and Application Development. Previously, Jim was IBM's data science evangelist. He managed IBM's thought leadership, social and influencer marketing programs targeted at developers of big data analytics, machine learning, and cognitive computing applications. Prior to his 5-year stint at IBM, Jim was an analyst at Forrester Research, Current Analysis, and the Burton Group. He is also a prolific blogger, a popular speaker, and a familiar face from his many appearances as an expert on theCUBE and at industry events.