The Data Scientist of the Future: What Will They Be Doing?

As data continues to grow in volume and variety, and businesses become more data-centric, the decision-making power of advanced algorithms is now openly welcomed in enterprises of all sizes. The average business user is no longer held back when a knowledgeable Data Scientist is not around: technology innovations are increasingly equipping ordinary staff with tools to conduct analytics on the fly and extract insights. With Artificial Intelligence (AI) and Machine Learning (ML) gaining prime attention in the Analytics and BI markets, the traditional roles of Data Scientists are about to change, as discussed in the DATAVERSITY® article Will Data Scientists Automate Themselves Out of Jobs?

In popular industry literature, there has been much discussion about Data Scientists soon becoming obsolete. However, some fundamentally wrong assumptions have led industry reviewers to jump to such conclusions. Data Scientists bring in a bundle of skills – computer science, programming, mathematics, statistics, and domain knowledge, and it is not easy to replicate these skills through automated tools. Moreover, real-life Data Science projects require “collaboration,” which cannot happen without human intervention.

Recent KDnuggets poll results, shared in the post titled Data Scientists Automated and Unemployed by 2025?, reveal that about 51 percent of respondents believe that the full automation of Data Science will happen within the next 10 years. However, about 25 percent of the respondents think this change will either take 50 years or never happen.

Sebastian Raschka, researcher of applied Machine Learning and Deep Learning at Michigan State University, thinks that the future of Data Science does not involve machines taking over from humans, but rather human data professionals embracing open-source technologies.

It is commonly understood that future Data Science projects, thanks to advanced tools, will scale to new heights, and that more human experts will be required to handle highly complex tasks efficiently. However, according to McKinsey Global Institute (MGI), the next decade will witness a shortage of around 250,000 Data Scientists in the U.S. alone. The question is whether machines can ever enable seamless collaboration between technologies, tools, processes, and end users. Automated tools and assistants can help the human mind accomplish tasks more quickly and accurately, but machines cannot be expected to substitute for human thinking. The core of problem-solving is intellectual thinking, which no machine, however sophisticated, can replicate.

Widespread ML Automation Is Inevitable in the Near Future

What the current generation of Data Scientists cannot escape is the all-pervasive automation of ML-powered business systems, where many laborious human tasks will be routinely conducted by tools or bots. As far as Data Scientists are concerned, that is good news, because human minds will be left free to pursue complex problem-solving.

Forrester’s report Massive Machine-Learning Automation is the Future of Data Science implies that although modern organizations are overjoyed by Machine Learning systems that reveal actionable insights, predict customer behavior, and aid better decision-making, too often these systems are hard to crack. The general understanding is that as ML starts delivering automated models, the learning curve for Analytics users will shorten substantially.

Many businesses either cannot afford to keep a sufficient number of Data Scientists or simply cannot find experts with the right balance of skills. In such scenarios, automated Analytics and BI platforms will empower “skilled information analysts” to conduct daily Data Science tasks. This will mean broader access to data sources, data types, and Analytics capabilities.

Widespread automation of business Analytics and BI systems will encourage more business users to pursue data-technology tasks on their own. Gartner Says the Age of the Citizen Data Scientist Is Dawning states that automation brings huge financial relief to businesses. Data Scientists typically cost a lot, so getting the work done with fewer “unicorns” and more automated tools will be a welcome change.

Alexander Linden, the Research Vice President at Gartner, says:

“Making Data Science products easier for citizen Data Scientists to use will increase vendors’ reach across the enterprise as well as help overcome the skills gap. The key to simplicity is the automation of tasks that are repetitive, manual intensive and don’t require deep data science expertise.”

In keeping with this belief, Gartner’s review of Self-Service Analytics makes a firm prediction: that Citizen Data Scientists will produce more Analytics than the real experts by 2019. However, this press report also warns that the success of “self-service” will depend heavily on the robustness of Data and Analytics Governance.

Self-service, by definition, implies free-form exploration of data, which only a highly flexible yet effective governance framework is capable of handling. This is where the Data Scientists of the future will come in. They will initiate ordinary business users into self-service through formal “on-boarding” programs.

The article Are Data Scientists Needed in the Self-Service Analytics World? suggests that only a qualified Data Scientist can unravel the mystery hidden behind the flashing dashboards. The average business user, while capable of simple filtering or grouping of data, will not be able to conduct advanced data visualization.

Automated Systems Cannot Replace Data Scientists

Why Automation Won’t Replace Data Scientists Yet from Cloud Computing News confirms that Data Science is set to be the primary differentiator for business success, and thus all major Data Analytics vendors are now focusing on simplifying their systems for broader and faster adoption.

What does that mean for Data Scientists? If systems perform the rote tasks of data cleaning, data integration, and basic data modeling, then the Data Scientists will have plenty of time to concentrate on complex algorithms that machines cannot deliver.

Which Data Science Tasks Cannot Be Automated?

There is an impressive list of tasks, such as Data Cleaning, Data Integration, and routine Data Modeling, that ML-enabled systems are handling well. Yet there is a lot more to Data Science. Take the example of Data Wrangling, which involves converting raw data into a machine-readable form. This process requires keen human judgment, which machines cannot be trusted with. The VentureBeat post 4 Reasons Bots Won’t Replace Data Scientists Anytime Soon discusses why Data Wrangling is not a machine-driven task.
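To make the wrangling point concrete, here is a minimal Python sketch of where the rote part ends and the judgment call begins. It is only an illustration under stated assumptions: the records, field names, and the helper functions parse_date, parse_spend, and wrangle are all hypothetical and do not come from any of the cited articles. The script cleans what it can decide mechanically and sets aside anything ambiguous for a person to review.

```python
from datetime import date, datetime

# Hypothetical raw records; field names and values are invented for illustration.
RAW_ROWS = [
    {"customer": " Alice ", "signup": "2018-03-04", "spend": "1,200.50"},
    {"customer": "BOB", "signup": "03/04/2018", "spend": "300"},
    {"customer": "carol", "signup": "25/12/2017", "spend": "n/a"},
]


def parse_date(text):
    """Return (parsed_date, needs_review) for an ISO or slash-separated date."""
    try:
        return datetime.strptime(text, "%Y-%m-%d").date(), False
    except ValueError:
        pass
    first, second, year = (int(part) for part in text.split("/"))
    if first > 12:
        # Only day/month/year can be valid here.
        return date(year, second, first), False
    if second > 12 or first == second:
        # Only month/day/year can be valid, or both readings agree.
        return date(year, first, second), False
    # Both readings are valid calendar dates: a person has to decide.
    return None, True


def parse_spend(text):
    """Return the amount as a float, or None when the value is not numeric."""
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return None


def wrangle(rows):
    """Split rows into machine-cleaned records and rows needing human review."""
    clean, review = [], []
    for row in rows:
        signup, ambiguous = parse_date(row["signup"])
        spend = parse_spend(row["spend"])
        if ambiguous:
            review.append((row, "ambiguous date format"))
        elif spend is None:
            review.append((row, "non-numeric spend"))
        else:
            clean.append({
                "customer": row["customer"].strip().title(),
                "signup": signup,
                "spend": spend,
            })
    return clean, review


clean_rows, review_rows = wrangle(RAW_ROWS)
print(len(clean_rows), "cleaned;", len(review_rows), "flagged for review")
```

In this sketch, a date such as 03/04/2018 could mean March 4 or April 3, and a spend of "n/a" could mean zero, unknown, or not applicable. The code can detect such cases, but choosing the right reading depends on knowing where the data came from, which is the kind of judgment the paragraph above attributes to the Data Scientist.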

Another good example is Data Visualization, where a data expert guides C-Suite executives or other business users through personal interpretations to arrive at good decisions. Data Interpretation and Visualization are still very much the domain of Data Scientists.

A blog post from Data Science predicts that the Data Science projects implemented in global enterprises this year will be more complex, but more collaborative in nature. The inherent nature of these projects will necessitate these changes in the enterprise:

The Chief Data Officer (CDO), who is a seasoned Data Scientist, will feature in every organization to design, develop, and manage the enterprise-wide data strategy. The CDO will report directly to the CEO of the organization.

AI projects will turn attention to crowd-sourced data for enabling useful solutions like road-accident prevention and flood warnings, which means Data Scientists will be required in the teams.

Data Security regulations such as GDPR will be at the forefront of enterprise operations. Data Scientists will have to be trained in GDPR requirements for companies to stay in business.

Forrester states that by 2020, data-driven businesses will be “collectively worth $1.2 trillion, up from $333 billion in 2015.” Governing and managing these huge data troves will require the participation of seasoned Data Science professionals. Thus, Data Scientists are here to stay and take on new challenges.

About the author

Paramita Ghosh has over two and a half decades of business writing experience, much of which has been writing for technology and business domains. She has written extensively for a broad range of industries, including but not limited to data management and data technologies. Paramita has also contributed to blended learning projects. She received her M.A. degree in English Literature in 1984 from Jadavpur University in India, and embarked on her career in the United States in 1989 after completing professional coursework. Having ghostwritten and authored hundreds of articles, blog posts, white papers, case studies, marketing content, and learning modules, Paramita has included authorship of one or two books on the business of business writing as part of her post-retirement projects. She thinks her professional strength is “lifelong learning.”