The State of Data Management: Challenges, Predictions, and Solutions

The level of sophistication around data has increased quite a bit in the 20 years Ron Agresta has been working with SAS. “Back in the day, it would be no governance, minimal quality, and those capabilities were not well managed. Now the whole segment has shifted,” he said, and it’s much rarer to find companies of any size that don’t grasp the importance of data. “The level of awareness about data within companies is also on a spectrum,” he remarked in a recent DATAVERSITY® interview. There are digital natives working with people who are very limited in their data use and understanding.

He described a customer that has been in business for a long time
and has processes that run calculations for insurance rates, perform
billing operations, and handle other basic functions. The same company has another
part of its business that is much more focused on aggregating data, doing
analytics, and trying to figure out what businesses it should be in and how it
can make more revenue. “So we’re seeing both the spectrum outside, across
companies, and also within companies.”

Current Challenges

The biggest hurdle that some of Agresta’s customers are facing is in the area of data integration. Because SAS has been doing data integration for such a long time, the company assumed at first that the integration problem was solved. “Well, that is true, but not with things like moving data to the cloud,” he said. He now has customers moving some, or all, of their data and related data processes to the cloud. “We have these hybrid environments where it’s a little bit more challenging to do traditional data integration processes,” he added.

And when Data Governance starts to come into play, it brings a
new level of complexity to the integration piece. “It’s almost like ‘what’s old
is new again.’ It’s still data integration, but the flavor of it has changed,”
because there are more types of data coming from more sources, and that data
resides in multiple locations. “And so we’ve got to go refight
and re-win that battle on the integration side,” he said.

Pain Points

Agresta said that Data Quality is a major source of frustration for his clients. In 2019 AI Predictions from Forrester, 61 percent of survey respondents said that Data Quality was the biggest barrier to successfully implementing an AI project, and Agresta agrees. “It’s no big surprise that if you don’t have the basics like Data Quality under control, then whatever magic you’re expecting on the analytic side is not going to be what you anticipated.”

Once the data is blended correctly and is standardized,
accurate, and up to date, it’s really possible to see the results, he said.
Moving beyond the hype of machine learning and AI, he sees clients deciding if
and where technologies like natural language processing fit into broader
enterprise Data Management and analytics projects that will extract the most
value. It’s important for organizations to get the correct balance of
“offensive” (being agile and exploratory with data) and “defensive” (Data Governance
and control of data) approaches to solving data-centric problems.

Introspective Analytics

More companies are looking at automation, and Agresta sees a
need to scale Data Management processes without adding substantially more
staff. Simply adding more people won’t allow organizations to keep up with the
continued flow of data. Using advanced analytics paired with automation can
ease the burden on overworked data engineers and data stewards. Analytics can now
suggest actions that can be taken on data to improve it without someone having
to manually analyze loads and loads of data.

As an increasing number of users adopt the suggestions that are
being made to them, the system starts to learn from those actions, and it can
start to automate. “And when it sees something familiar, like a data set that
looks like the data set you spent hours last week transforming, then it knows
the kinds of things that it can do to improve that data.”

This ability to use analytics for improvements in internal
processes can play an important part in self-service enablement, or
self-service data preparation, where users may not have a robust skill set. “They
have the data, they want to use the data, but they don’t have the technical
expertise to do a lot of sophisticated things. But if the system can help them,
then it gets them to doing the real work much quicker,” he said, which empowers
users to do things like report building and advanced analytics.

This introspective process isn’t limited to Data Management, but can be used for other areas, such as Data Governance or quality improvement. Using analytics this way isn’t new, but Agresta said it’s starting to snowball as customers are learning about what is possible. And although SAS isn’t the only vendor in this area, he believes they have the advantage of decades in Data Management analytics. Their solutions portfolio arises from a desire to use the best from an advanced analytics perspective, whether that’s artificial intelligence, machine learning, advanced scoring, “or whatever it might be, and to automate those solutions, making them less onerous to run and use.”

Data Protection

Extra scrutiny on data collection and usage has put many businesses on defense. Many companies rely almost exclusively on monetizing data relinquished by users, but regulatory attention is increasing in this area. “Seventy-three percent of consumers in the U.S. are very interested in knowing what companies are doing with their data and being able to have some control over what companies are doing with that data.”

As organizations are coming to understand how important that is to their customers, it is becoming a major concern for companies:

“If we are doing an analytic process to audit what the inputs are, what the output is, how old the model is, who worked on it, and any other influential factors, it’s no longer unusual for regulations to require an explanation of those answers. It doesn’t have to be down to the if/then/else sort of thing, but it needs to be transparent enough.”

In 2019 and beyond, expect more consumer data protection laws, with the
technology changes needed to cope with them not far
behind, he said. He also predicts an increased desire for transparency
regarding how data is being collected, aggregated, and shared. This will call
for enhanced technology that can deliver detailed reports to organizations and
their customers about data usage.

Meaningful Results from New Technologies

Agresta predicts that more organizations will attempt to use AI and machine learning techniques to improve Data Quality and Data Management processes, but they will struggle to see meaningful results. Some companies wanting to adopt AI and machine learning are taking the “throw something at it and see if it works” approach rather than a more deliberate approach to solving key problems. “Anything that we can do to help our end-users make the right decisions about what algorithms to use and how to interpret the results is important.”

It can be easy to inadvertently combine data that shouldn’t be combined before even getting to the analytics, he said, and it’s essential that the company has the right level of data analytic capability before a decision is made on how to move forward with a report, or some other analytic-driven result.

“It’s easy to pick up any old analytic model, or process, or algorithm and apply it incorrectly, so we can’t belittle the fact that our tools are sophisticated, and we have to help our end-users use them in the right way.”

New technologies should be built on a solid foundation in order
to get meaningful results. Understanding and using correct and approved
statistical models to deal with outliers, for example, will directly impact the
outcome. “In the simplest case, if you have outliers that you’re not dealing
with in an appropriate way, whatever comes out at the other end is not
legitimate.”
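
To make the point about outliers concrete, here is a minimal sketch in Python. The premium figures are invented for illustration, and the interquartile-range (IQR) rule used here is only one common convention, not a method attributed to Agresta or SAS. Whether a flagged value should be dropped, capped, or investigated is a judgment that depends on the data.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical monthly premiums; 9800 stands in for a data-entry error.
premiums = [102, 98, 105, 99, 101, 97, 103, 9800]

flagged = iqr_outliers(premiums)
cleaned = [v for v in premiums if v not in flagged]

# One unhandled outlier drags the mean far away from the typical value,
# while the median barely moves.
raw_mean = statistics.mean(premiums)
raw_median = statistics.median(premiums)
cleaned_mean = statistics.mean(cleaned)
```

With this sample, the raw mean lands above 1,300 even though every legitimate premium is close to 100, which is the kind of distortion that would flow into any downstream model or report.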

The Challenge of Data Governance

Data Governance is a growing challenge as more data moves from
on-premises to cloud locations and as governmental and industry regulations increase,
particularly regarding the use of personal data. Hybrid cloud or hybrid Data Management
systems must be able to communicate with each other about where data resides,
what it contains, and who can access it.

SAS has historically developed solutions for metadata integration issues, and they are now also involved with an open source project called Egeria, part of an ODPi initiative working to solve that problem.

“We’re working with other technology companies like IBM to come up with an open and bidirectional way to share metadata across independent technologies. I think that will go a long way.”

He sees Egeria as a great way to start solving some of the
problems companies face about what data they have, how it works with other data
they have, who is allowed to see it, where it came from, how old it is, and any
number of other attributes that might be associated with that data.
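
The attributes listed above can be pictured as a single metadata record per data set. The sketch below is purely illustrative: the class, field names, and sample values are hypothetical and do not reflect Egeria's actual data model or API.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class DatasetMetadata:
    """Hypothetical catalog entry; all field names are illustrative."""
    name: str
    location: str       # where the data resides
    description: str    # what it contains
    source: str         # where it came from
    created: date       # used to work out how old it is
    allowed_roles: frozenset = field(default_factory=frozenset)

    def can_access(self, role: str) -> bool:
        """Answer 'who is allowed to see it' for a single role."""
        return role in self.allowed_roles

    def age_in_days(self, today: date) -> int:
        """Answer 'how old is it' relative to a given date."""
        return (today - self.created).days


# Invented example record.
claims = DatasetMetadata(
    name="claims_2018",
    location="cloud://example-bucket/claims",
    description="Insurance claims with policyholder identifiers",
    source="on-premises billing system",
    created=date(2019, 1, 1),
    allowed_roles=frozenset({"data_steward", "claims_analyst"}),
)
```

If independent tools agreed on a shared record of this kind, each could answer the same questions about a data set without a custom integration for every pair of systems.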

Volume of Data vs. Data Sources

The challenge of data volume seems to be less troublesome for
the majority of Agresta’s customers. He says that only around 10 percent are
struggling with volume, and their problems could easily be handled by modest investments
in technology. For the remaining 90 percent, the variety of data types is becoming
more of a challenge.

“If you think of structured databases, we’ve got that covered,
and that’s a known, well-worn track. When you start to get to semi-structured,
unstructured data, data streams, all kinds of different data sources, how do
those worlds combine?” So it’s not necessarily data volumes that pose the
biggest challenges, but what’s hidden in the data (good or bad) that can be
difficult to deal with.

Agresta predicts we will continue to see an increasing use of more advanced analytics capabilities to solve complex problems that in years past might have taken large teams and years of research. Advanced analytics paired with good Data Management technology can help detect threats and uncover untapped opportunities.

Agresta sees making the end-user’s life easier as a priority for SAS going forward, whether that means helping with the safe integration of cloud or hybrid stores, ensuring AI and other technologies are able to use quality data, or helping their customers take advantage of advanced analytics: “Pushing the boundaries in the places that make sense. Seeing what’s successful, but not losing sight of the impact across the organization.”
