Purposeful Data Ingestion: What You Need to Know About the Data Value Chain

November 28, 2018

Previously, we discussed the three fundamental branches of analytics as well as how each one successively breaks data down into more manageable pieces. In today’s installment, let’s continue the data ingestion conversation by discussing the different components of the Data Value Chain.

Let’s Get Back to Basics

The Business Dictionary Online defines a value chain as “interlinked value-adding activities that convert inputs into outputs which, in turn, add to the bottom line to help create competitive advantage … using inbound distribution or logistics.” Although a great start, this definition paints with a broad brush; it describes value chains in general but says nothing about how the idea applies to data.

So, the experts at TSA decided to break the Data Value Chain concept down further into its separate components: ingestion, quality, insight, and publication.

How Data Is Culled Through Data Ingestion

Edd Dumbill of IBM explains that ingestion is the process of “bringing data into the data processing system” through real-time data streaming or external batch exports. Sounds simple enough, right? Don’t discount it though, because ingestion makes up the vast majority of the grunt work necessary for data preparation and processing. In fact, according to reporting by the New York Times, data ingestion and preparation account for an estimated 50–80% of a data scientist’s overall workload. You can’t begin the analytics process without aggregate data, so this firehose of raw information is where you’ll have to start.
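To make the two ingestion modes mentioned above concrete, here is a minimal Python sketch contrasting a batch export (records arrive all at once, e.g. a nightly file) with a real-time stream (records arrive one at a time). The record fields and function names are illustrative inventions, not the API of any particular tool:

```python
import csv
import io

def ingest_batch(csv_text):
    """Parse a batch export (CSV text) into a list of record dicts."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def ingest_stream(events):
    """Consume a real-time event stream (any iterable), yielding records as they arrive."""
    for event in events:
        yield event

# Batch mode: a whole export lands at once.
export = "user,amount\nalice,10\nbob,25\n"
batch_records = ingest_batch(export)

# Streaming mode: records are processed one by one as they show up.
stream_records = list(ingest_stream([{"user": "carol", "amount": 7}]))
```

Either way, the output is the same thing: a pile of raw records that still needs quality control before it can be analyzed.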

Thankfully, we don’t have to stand in front of that raw data stream and try to soak it up on our own. We have additional tools and vetted processes to help with real-time data assimilation and delineation of information.

Quality and Enrichment Against the Firehose of Data Ingestion

Gartner warns that 40% of business initiatives fail due to poor data quality. Not all data is useful, and some sources are not compatible with your business efforts, so a cleansing process must occur before any other analytics are applied. If this quality vetting process is not executed properly, then you are left with a conduit of raw data that confuses decision makers and takes up unnecessary space in your datacenter.

Quality Control

To accomplish this “cleansing,” data scientists use quality and enrichment tools and processes to “mop up” the stream of data efficiently and effectively. Quality control effectively allows analysts to detect and remove errors or inconsistencies from the data flow. Detailed quality checks focus mainly on data generation (validating source data), file formatting and translation (for compatibility), and application program functionalities (filling data gaps and defining parameters).
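The three kinds of checks above (validating source data, translating formats, and filtering out rows that fail) can be sketched in a few lines of Python. The field names and rules here are hypothetical stand-ins for real source-specific validation:

```python
def clean_records(records):
    """Basic quality-control pass: validate, normalize, and drop bad rows."""
    cleaned = []
    for rec in records:
        # Validate source data: required fields must be present and non-empty.
        if not rec.get("user") or rec.get("amount") in (None, ""):
            continue  # drop rows that fail validation
        # Format translation: coerce amount to a float for downstream compatibility.
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # drop rows that cannot be translated
        # Normalize identifiers to one casing so duplicates can be matched later.
        cleaned.append({"user": rec["user"].strip().lower(), "amount": amount})
    return cleaned

rows = [
    {"user": "Alice ", "amount": "10"},
    {"user": "", "amount": "5"},        # fails validation: missing user
    {"user": "bob", "amount": "oops"},  # fails translation: non-numeric amount
]
cleaned = clean_records(rows)  # only the normalized Alice row survives
```

Real pipelines apply far richer rules, but the shape is the same: every record either passes the gate in a consistent format or is rejected before it can confuse anyone downstream.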

Enrichment Function

Once you have your data cleansed of impurities, you can move on to the next link in the chain: data enrichment. This process does exactly what the name says it does—it makes your data better. Enrichment improves the value of extracted data using metadata and exposed correlations in order to extrapolate as much useful information as possible.

Enrichment focuses on matching incoming data with preexisting stored data, correcting invalid or outdated information based on the historical data, and interpolating missing values based on other aggregate data. Essentially, enrichment acts like a data “spell check” across all of your systems, flagging where information is missing, corrupted, or outdated. And much like Auto Correct, technologies like HPE Arcsight can be leveraged to replace the data in question with its most recent discoveries.
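A toy illustration of that “spell check” behavior: the sketch below matches an incoming record against stored master data, fills missing or blank fields from history, and falls back to aggregate defaults for anything still unknown. The master table, defaults, and field names are invented for illustration; this is not HPE ArcSight’s actual API:

```python
def enrich(record, master, defaults):
    """Enrich one incoming record against a stored master table."""
    enriched = dict(record)
    known = master.get(record["user"], {})
    # Match + correct: fill blank or missing fields from stored historical data.
    for field, value in known.items():
        if enriched.get(field) in (None, ""):
            enriched[field] = value
    # Interpolate remaining gaps from aggregate defaults.
    for field, value in defaults.items():
        enriched.setdefault(field, value)
    return enriched

master = {"alice": {"country": "UK", "segment": "retail"}}
defaults = {"segment": "unknown"}

# The blank country is corrected from history; segment comes from history too.
result = enrich({"user": "alice", "country": ""}, master, defaults)
```

An unknown user would instead pick up only the aggregate defaults, which is exactly the flagging behavior you want: the record is usable, but visibly marked as lacking history.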

From Data Ingestion to Insights

Gaining insight is the next step in implementing your now extracted, filtered, and enriched data points. Simply put, insight is the value and opportunity that you squeeze from your data points. These insights can then be translated into actionable items that continually improve your business.

For a better understanding of the specificity that insight brings, we’ve provided a narrative breakdown of the difference between raw data, analytics, and insights using an ecommerce application as our model:

Your data, in this instance, is the generic historical record that reports that you had 2,000 sessions in the past 30 days. From that historical record, analytics delineates how many of those sessions occurred on iPhones in the UK. And insight reveals hidden metrics like “iPhone sessions were 20% less likely to end in a purchase.”
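The ecommerce narrative above can be made concrete with a small hypothetical example: raw session records are the data, a filtered count is the analytics, and the comparison of conversion rates is the insight. The session data and field names below are invented:

```python
def conversion_rate(sessions, device=None):
    """Share of sessions that ended in a purchase, optionally filtered by device."""
    subset = [s for s in sessions if device is None or s["device"] == device]
    if not subset:
        return 0.0
    return sum(1 for s in subset if s["purchased"]) / len(subset)

# Raw data: one record per session.
sessions = [
    {"device": "iphone",  "country": "UK", "purchased": False},
    {"device": "iphone",  "country": "UK", "purchased": True},
    {"device": "desktop", "country": "US", "purchased": True},
    {"device": "desktop", "country": "UK", "purchased": True},
]

# Analytics: a segmented count (iPhone sessions in the UK).
uk_iphone = len([s for s in sessions if s["device"] == "iphone" and s["country"] == "UK"])

# Insight: iPhone sessions convert below the overall average.
overall = conversion_rate(sessions)            # 0.75
iphone  = conversion_rate(sessions, "iphone")  # 0.5
```

The number itself is just arithmetic; the insight is the comparison, which points at a specific, fixable weak spot (the iPhone checkout experience).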

Through such insights, you can formulate more detailed and targeted plans of action to address weak or dysfunctional areas within your business model.

Actionable Insights and Publication

The fourth step, publication, is when your company takes the insights gathered from the preceding steps and puts them into action. But this is the stage where you have to remember that computing and analytics can’t answer all of your questions.

In other words, you run analytics to “get answers” for your human counterparts. Regardless of the capabilities that analytics offer us, it still boils down to having an experienced in-house technician or engineer to make the final call.

As we have said before, analytics is not a panacea, or a magic bullet. You still have to have human intervention to establish context and history to give real insight into what the company should be doing next. Your IT team will have to collaborate with other internal teams to figure out what to do with those “answers” and decide where to apply them best in your business model.

Maintaining vs. Forward Motion

As briefly mentioned before, the delineation of pure data ultimately reveals actionable insights. Companies that fail to complete the Data Value Chain cycle simply use these gathered insights to support existing ideas rather than drive new and improved strategies or products.

Andrew Lang penned a surprisingly modern quote about the use of statistics, all the way back in 1910:

“[They use] statistics as a drunken man uses lampposts—for support rather than illumination.”

The resulting conclusion is a simple one: if you don’t use the data to change your actions, you lose most of the value of collecting and storing it at all. The true value of analytics comes from using the data to drive innovative changes in your business, enabling you to capitalize on trends and associations that you otherwise wouldn’t notice.