Python Gains Traction Among Data Scientists

George Leopold


Data scientists have a growing range of options when choosing analytical tools, and a new survey reveals a roughly even split in preferences among the three leading tools: SAS, R and Python.

In its annual survey of leading analytics tools, executive recruiting firm Burtch Works reported this week that nearly 1,200 data scientists and analysts were almost evenly divided in their preferences for SAS (34 percent), R and Python (both 33 percent). Nevertheless, the survey released Tuesday (July 17) confirms the steady rise of the Python programming language, mostly at the expense of the R language.

“Open source tools like R and Python are overwhelmingly favored by professionals with five or less years’ experience,” the survey found. “While SAS continues to see strong support among professionals with 16 or more years’ experience, Python made noticeable gains here as well.”

Burtch said the growing preference for Python reflects an influx of new data scientists with five or fewer years of experience, who show a stronger preference for open source analytics tools. Indeed, support for Python among this “junior” group has doubled to 48 percent since 2016.

The survey also breaks down tool preferences by industry. SAS was the top preference in sectors such as healthcare and pharmaceuticals (43 percent) along with financial services (42 percent). Meanwhile, data scientists at technology and telecom companies preferred Python. R was the top preference in the retail sector.

Source: Burtch Works

Burtch said its survey separates data scientists from those engaged in traditional predictive analytics. The main reason is that data scientists work primarily with unstructured and streaming data, while predictive analysts tend toward structured data. Those requirements are reflected in tool preferences, with fully 69 percent of data scientists using Python while predictive analysts prefer SAS by a narrower margin.

Given that “big data” is increasingly driven by the onslaught of unstructured video and other social media data, the growing preference for Python confirms earlier surveys. As we’ve reported, the switch to Python is due in part to a growing number of tools and libraries available to data scientists to parse huge data sets.

Other surveys, including one from IEEE Spectrum, have also ranked Python as the top data science programming language.

Meanwhile, R remains popular among mathematicians, statisticians and scientists. The SAS environment from the company of the same name remains popular among business analysts, while MathWorks’ MATLAB is also widely used in the discovery phase of big data.