Big Data is not junk science, but some of your data is probably still junk.

The vast majority of us are data producers rather than data consumers. We don’t spend our days running complex queries in an attempt to figure out how to brew the best cup of coffee. That said, Big Data Analytics, NoSQL and Hadoop have moved to the forefront of technologies intended to change how we identify, aggregate, transform and analyze large volumes of disparate and fast-moving data. While the technology has made great strides over the past few years, many challenges remain. These challenges relate to conceptualizing Big Data strategies, properly securing Big Data systems and incorporating Big Data technologies within current enterprise governance and risk management frameworks. For most small to mid-size businesses, Big Data is at most something “small” on the periphery of concerns. For organizations that rely on large quantities of data to drive specific business outcomes, the perspective is certainly different.

You may have come across the three V’s that conceptually describe Big Data. In the simplest terms, Big Data is about volume, variety and velocity. More recently, a fourth prong to the Big Data juggernaut has emerged. The fourth “V” is veracity, and it deals with the integrity issues associated with large volumes of fast-moving data.

The process of analyzing information in the hopes of producing knowledge starts with the raw data itself. Remember the old adage “garbage in, garbage out”? It still holds true in the realm of Big Data Analytics. Junk data is the nemesis of knowledge and should be a concern when engineering your Big Data infrastructure. There are a number of data maturity models that describe the “state” of data within organizations. At a high level, organizations with low data maturity typically have nascent analytical capabilities. They usually rely on historical data that is perceived to offer little strategic value. As organizations progress through the data maturity model, tactical use of data begins to evolve into strategic use, and predictive analytics becomes a core process that supports enterprise-wide initiatives. Organizations that follow the fundamental tenets of business process maturity models will be better prepared to tackle Big Data initiatives.

Thinking about Big Data? Here are a few pointers with more to come in future articles!

Properly assess what data you are pulling in: Big Data is not about gathering every piece of information across one’s organization. Data housed in one’s Big Data ecosystem should serve purposes related to analytical ambitions. Data quality is an innate characteristic and doesn’t increase as a result of analytics. In most cases, junk data cannot be turned into gold.
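One way to make this kind of assessment concrete is to profile a sample of records before they are admitted into the ecosystem. The sketch below is only an illustration: the field names, the sample records and the 90% threshold are all assumptions, and completeness is just one of many possible quality measures.

```python
from typing import Any, Dict, Iterable, List, Mapping


def completeness_report(
    records: Iterable[Mapping[str, Any]], required_fields: List[str]
) -> Dict[str, float]:
    """Return, per required field, the fraction of records with a non-empty value."""
    rows = list(records)
    total = len(rows)
    report = {}
    for field in required_fields:
        # A value counts as "filled" unless it is missing, None or an empty string.
        filled = sum(1 for row in rows if row.get(field) not in (None, ""))
        report[field] = filled / total if total else 0.0
    return report


def fields_below_threshold(report: Mapping[str, float], threshold: float = 0.9) -> List[str]:
    """List the fields whose completeness falls below the (assumed) threshold."""
    return sorted(name for name, ratio in report.items() if ratio < threshold)


# Hypothetical sample drawn from a candidate data source.
sample = [
    {"customer_id": "c1", "email": "a@example.com", "region": "EU"},
    {"customer_id": "c2", "email": "", "region": "US"},
    {"customer_id": "c3", "email": None, "region": "US"},
    {"customer_id": "c4", "email": "d@example.com"},
]

report = completeness_report(sample, ["customer_id", "email", "region"])
weak_fields = fields_below_threshold(report, threshold=0.9)
print(report)
print(weak_fields)
```

A source whose key fields fall well below the chosen threshold is a candidate for cleanup at the origin, or for exclusion, before it is pulled in.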

Data sources should be vetted: Ensuring integrity of data sources can alleviate operational headaches associated with non-vetted data inputs. Endpoints that participate in producing data should be properly secured to mitigate the possibility of ingesting faulty, malicious or spoofed information. Many Big Data solutions can facilitate end-to-end encryption and can further provide attestation that ingested data has not been maliciously modified.
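As a rough illustration of what verifying that data has not been modified can look like, the sketch below uses a keyed hash (HMAC-SHA256) from Python’s standard library. It assumes the producing endpoint and the ingest pipeline share a secret key, which is a simplification; the key, payload and function names are hypothetical, and this is not the mechanism of any particular Big Data product.

```python
import hashlib
import hmac


def sign_payload(payload: bytes, key: bytes) -> str:
    """Compute an HMAC-SHA256 tag that the producing endpoint sends with the data."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def is_authentic(payload: bytes, signature: str, key: bytes) -> bool:
    """Recompute the tag at ingest time and compare it in constant time."""
    expected = sign_payload(payload, key)
    return hmac.compare_digest(expected, signature)


# Hypothetical shared secret; a real deployment would load it from a secrets store.
key = b"demo-key-not-for-production"

payload = b'{"sensor": "s-17", "reading": 21.4}'
signature = sign_payload(payload, key)

# Simulate a record altered in transit.
tampered = b'{"sensor": "s-17", "reading": 99.9}'

print(is_authentic(payload, signature, key))
print(is_authentic(tampered, signature, key))
```

A record whose tag does not verify can be rejected or quarantined before it ever reaches the analytics layer.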

Consider privacy implications of coalesced data prior to ingestion: There may be instances where centralized storage of specific datasets that contain personally identifiable information (PII) is against organizational policies.

Get all stakeholders involved: Few things kill a new project faster than failing to involve the right stakeholders. Want to include specific datasets in your Big Data initiative? Make sure to consult with the data owners!

Consult with Big Data subject matter experts: Most Big Data SMEs will be able to help your organization conceptualize its use cases.

Don’t expect a jump from “no data analysis” to “predictive analytics”: Expectations should be in line with your organization’s present capabilities and resources.