Is it “Yes” or “Mostly Yes”? One way to tell is whether you hesitate and re-validate all of your information before taking a leap, or fail to act on the data presented. At best, this slows down decision-making; at worst, it completely negates the value of collecting, storing, and organizing data in the first place.

The importance of trusting your data and why “mostly yes” is dangerous

In the Navy, I was trained to trust key indications to ensure safe operation of a nuclear reactor. Specific indicators triggered alarms, and those alarms required immediate action. Other decisions afforded more time to respond, but still relied upon trusting and validating the indications that were reported. The same principles apply to the data stored in your core datasets that is used to drive business decisions.

For core datasets, “mostly yes” falls far short of what is necessary to run a data-driven business. “Trust me” is a phrase you hear when you’re being asked to go against your intuition or gut. It often involves taking a risk and exposing yourself to potential harm. Sometimes emerging trends (and threats) will present themselves in data before they’re generally well known or accepted. An industry maxim may start to bend, and your more attuned competitors will gain an advantage if you fail to detect a leading signal.

Why should you trust your data?

I trusted my reactor plant indicators because we performed regular checks on our sensors to ensure they were working properly. Without these checks, we would have been operating blind. The same applies to your data systems. Ensuring that data is consistently created, stored, and maintained falls under the realm of data stewardship. Checking that information matches what is expected should be part of your common business practices.

Who is responsible for data stewardship in your organization? The right answer is that everyone in the organization should share some part of the ownership. Specific roles will play a more active part depending upon your organization’s needs, structure, and tools. However, one common answer is almost always wrong: “that’s simply an IT problem.”

How do you build the ecosystem? It is a full team sport that requires: 1) Designing Systems; 2) Providing Data Stewardship; 3) Making Decisions.

Designing systems starts with understanding the needs of key stakeholders. Often these stakeholders will be key leaders of front-line businesses or functional roles such as finance. Determining their information requirements helps define the core datasets you’ll need to govern. From there, those key stakeholders will be critical in driving alignment and compliance across the business.

Data Stewardship is responsible for governance and influencing those who deal with data to ensure that it is handled in a consistent manner. Good data stewards will need deep domain knowledge, a background in data, and the ability to influence both leadership and those across the organization to ensure program compliance.

Leaders need to demonstrate that they are making decisions based on available data. Yes, the data informing these decisions will likely shape strategic or operational choices that further the business. From an ecosystem perspective, these decisions also have the advantage of highlighting the purpose and importance of all of the data stewardship efforts. This should encourage and motivate those in the organization to remain dedicated to enhancing and protecting a valuable corporate resource: a firm’s data.

It’s not uncommon for firms to face several core challenges recruiting, developing, and retaining data scientists and technical leaders. Below are several reasons why:

Hiring teams don’t really understand what they’re looking for. Data science is marketing genius but remains a foggy term when put into practice. For instance, a ‘data scientist’ could be an expert at developing applications that ingest real-time data streams and develop decision recommendations for business managers. Alternatively, a ‘data scientist’ could be a forensic expert doing research in a laboratory, poring through massive troves of real-world evidence to identify relationships between various treatment protocols and patient outcomes.

Not enough data scientists exist and competition for them is intense. Most organizations today recognize that acquiring this skill set is mission critical for innovation. Unfortunately, the demand for data scientists has grown faster than they can be trained. Everyone needs this skill set, as technology is expanding across all industries and transforming analog processes into digital ones. Firms that fail to invest know they will eventually lose any competitive advantage they may have. However, data science is still an emerging field. Few mid-career professionals have the skill sets that companies require. New graduates are few and still require industry experience before they realize their full potential.

A viable career path doesn’t exist at your firm. Firms that aren’t digitally native often struggle with this. Creating a role for a skilled data scientist in your organization may seem straightforward, but how do you envision that person’s career evolving? Can you easily describe how a data scientist will mature and grow in the organization? If not, that’s likely because this is a new path that needs to be paved inside your company. More broadly, this is about creating an ecosystem where data scientists can thrive. Junior data scientists should have more senior data science leaders above them to provide apprenticeship. You’ll also need to be able to point to senior data scientists who have succeeded inside your firm.

Data science needs big data. Data scientists thrive when working with vast amounts of proprietary data, public data, or both. Often these types of data sets are found either inside a large firm (e.g. Amazon) or when building a platform for larger institutions to use (think startups). Strangely, those that may struggle most in finding top talent are mid-sized firms hoping to build upon their own unique proprietary knowledge. Data challenges at these firms may not be as daunting, and are therefore less appealing to someone looking to make a mark and advance their career.

How do you mitigate these challenges and compete effectively against the digitally native tech firms? (note: each topic below is worthy of a deeper exploration and could be a subject of future blog posts :) )

Don’t they know how hard it is? Maybe, but no one said big data was clean and easy. In fact, most things described as ‘big’ are hard and daunting…except perhaps for Clifford, my daughter’s favorite fictional Big Red Dog.

Some describe big data as simply a data set too large to be analyzed (or even opened) in a standard spreadsheet. It requires tools such as R, SQL, or Python (that name alone seems scary), as well as the somewhat rare (though increasingly common) skill set of a ‘data scientist’. Compounding the problem, once data is aggregated and analyzed, it will almost always be unstructured, which translates to messy and incomplete.

Your data will always be imperfect and be found somewhere along the spectrum between insightful, perfect data and unstructured, random white noise. Where your data falls depends on: 1) a clear articulation of a specific measurement you wish to capture, 2) how carefully you design your data collection systems, 3) how rigorously you enforce data validation rules, and 4) the availability of data to begin with. Each of these topics is worthy of a separate discussion that I’ll address in future postings.
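To make point 3 a little more concrete, here is a minimal sketch of what enforcing validation rules could look like in Python. The field names (`order_id`, `amount`, `region`), the rules themselves, and the `validate` helper are all hypothetical, invented purely for illustration; your own rules would come from the measurements you decided to capture.

```python
def validate(records, rules):
    """Return (row_index, field) pairs for every rule violation found."""
    failures = []
    for index, record in enumerate(records):
        for field, check in rules.items():
            value = record.get(field)
            # A missing value counts as a failure, same as a bad value.
            if value is None or not check(value):
                failures.append((index, field))
    return failures


# Hypothetical rules: each field maps to a function that returns True when valid.
rules = {
    "order_id": lambda v: isinstance(v, int) and v > 0,
    "amount": lambda v: isinstance(v, (int, float)) and v >= 0,
    "region": lambda v: v in {"NA", "EMEA", "APAC"},
}

# Made-up sample records, including two deliberately bad rows.
records = [
    {"order_id": 1, "amount": 120.0, "region": "NA"},
    {"order_id": 2, "amount": -5.0, "region": "EMEA"},  # negative amount
    {"order_id": 3, "amount": 40.0},                    # region missing
]

failures = validate(records, rules)
bad_rows = {index for index, _ in failures}
clean_share = 1 - len(bad_rows) / len(records)

print(failures)
print(f"Share of fully valid rows: {clean_share:.0%}")
```

Tracking the share of fully valid rows over time is one simple way to see where a data set sits on that spectrum, though the right measure will depend on your data.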

Often, simple is best. For example, joining a new data set might add a marginally useful insight, but could risk generating a significant number of duplicates. Evaluate whether this makes sense and proceed with caution…and save a backup for reversion just in case.
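As a rough illustration of the duplicate risk, the sketch below checks how many rows a join would add before committing to it, and keeps a backup copy for reversion. The tables, the `customer_id` key, and the helper names (`fanout_keys`, `left_join`) are hypothetical and use plain Python lists of dictionaries; a real project would more likely do this in SQL or a data frame library.

```python
import copy
from collections import Counter


def fanout_keys(left_rows, right_rows, key):
    """Map each left-side key to its match count when it matches more than once."""
    right_counts = Counter(row[key] for row in right_rows)
    return {
        row[key]: right_counts[row[key]]
        for row in left_rows
        if right_counts[row[key]] > 1
    }


def left_join(left_rows, right_rows, key):
    """Join two lists of dicts on `key`, keeping unmatched left rows as they are."""
    matches_by_key = {}
    for row in right_rows:
        matches_by_key.setdefault(row[key], []).append(row)
    joined = []
    for row in left_rows:
        # An unmatched left row is joined against one empty dict.
        for match in matches_by_key.get(row[key], [{}]):
            joined.append({**row, **match})
    return joined


# Made-up example data: customer 1 has two contact rows.
customers = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
]
contacts = [
    {"customer_id": 1, "email": "ops@acme.example"},
    {"customer_id": 1, "email": "billing@acme.example"},
    {"customer_id": 2, "email": "info@globex.example"},
]

backup = copy.deepcopy(customers)  # saved for reversion, just in case
fanout = fanout_keys(customers, contacts, "customer_id")
joined = left_join(customers, contacts, "customer_id")

print(f"Rows before: {len(customers)}, rows after: {len(joined)}")
print(f"Keys that multiply rows: {fanout}")
```

If the row count grows after the join, that is the cue to decide whether the extra insight is worth the duplicates, or whether the new data set should be deduplicated or aggregated first.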

Lastly, big data analysis is only as good as the trust and decisions that flow from it. Imperfections need to be known and identified so that the data set can be completely trusted within its stated limitations. Failing to disclose these limitations will make any big data project a potentially interesting, but useless, endeavor.