Open for business

Navistar discusses the coexistence of open source and enterprise software in a data scientist’s world

By Kelly LeVoyer, Editorial Director, SAS

While the debates and discussions about data science and data scientists rage on, many companies are opting for action, recruiting and hiring so many data scientists that demand is exceeding the supply of viable candidates.

Because of that, the notion of effectively attracting and retaining data scientists has also become a topic of conversation among the IT and business communities. With such a bullish market, how do you make your company attractive to this highly coveted hybrid of statistician and business analyst?

... a team of data scientists skilled in SAS and Python can now work on different pieces of a project on their respective platform of choice, and easily share their work with each other.

Gyasi Dapaa, Director of Data Science
Navistar

For Gyasi Dapaa, Director of Data Science at manufacturer Navistar International Corporation, the key to attracting and retaining data scientists is support and creativity. Providing them with tools that they not only feel are best for the analysis at hand, but that also offer a creative palette from which the team can draw, has a powerful motivational effect.

“When you bring together twenty people with a variety of technical backgrounds and skill levels, you have to provide a landscape that offers something that appeals to everyone,” Dapaa says. “Some of our staff came in being able to write sophisticated programs from scratch; some are good at taking existing programs or models and tweaking them for their respective projects; some can’t program at all and so you need tools with a user-friendly GUI for them.”

Navistar is a leading manufacturer of commercial trucks, buses, defense vehicles and engines. Navistar data scientists analyze sensor data coming from customers’ trucks, including engine and fault codes, to prioritize maintenance needs and prevent breakdowns. In addition, they develop marketing and price elasticity models, predict equipment repairs, and more.

The right software for the right purpose

Dapaa says that his team, as the center of excellence for all analytics projects at Navistar, “heavily leverages the power of SAS” to help drive the analytics engine of the company; he also offers his team access to a variety of open source software. Why? “Open source is free, so you don’t hurt yourself by using it,” Dapaa says, “and in fact it’s important to provide data scientists with a variety of tools, like a painter has a variety of paint colors and artist’s tools to draw from. Providing some level of freedom and flexibility in the choice of tools is a powerful way to motivate the team and keep them engaged,” he adds.

However, Dapaa sees practical limitations to an open source-only approach. “I don’t believe some of the open source solutions can handle large data sets as well as SAS can,” he says. “I like being able to use SAS as a one-stop shop to query, integrate, clean, profile and analyze the data using a GLM (Generalized Linear Model) technique or a clustering algorithm. Without SAS, we’d need two or three different tools to accomplish all that.” He also cites speed and accuracy as concerns about basing an operation as large as Navistar’s on open source programs that have very little quality control baked into their processes. “Our analyses are just faster and more reliable in SAS,” he adds.

“And SAS Viya even amplifies SAS’ appeal because it enables analysts with different skillsets in software to collaborate better and faster. For instance, a team of data scientists skilled in SAS and Python can now work on different pieces of a project on their respective platform of choice, and easily share their work with each other. With SAS Viya, the SAS analysts can take the pieces of work done in Python and easily incorporate with SAS — a convenience that most data scientists will deeply appreciate.”

For Navistar, Dapaa says, the investment in SAS pays off exponentially in the highly reliable insights and business value they can get from deploying the results. In the end, though, it’s less about the “how” and more about the “why.” “Most data scientists are not motivated by the elegance of the math or the statistics, but how the solutions are helping drive business strategy. You have to help them understand how their work makes a difference to the business.”