Data Science Survey: The Results Are In!

Last week we ran a Data Science survey asking our community four simple questions. In this post, I’ll show you the results and share a Jupyter notebook, in case you want to play with the data yourself.

Disclaimer

2,233 people participated in the survey. That is a statistically significant sample of our students, but not of the Data Science community in general. Among other factors, Cognitive Class’s catalog of courses influences who we attract to our site and, ultimately, who responded to the survey.

Data Science Survey Q1: What’s your level of interest for the following technologies?

We presented respondents with eight data-related technologies and asked them to express their level of interest in each of them. The chart below shows the results.

As expected, there is a high degree of interest (green bars) in Data Science, Big Data, and AI. Virtually everyone showed some degree of interest in these three categories.
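A statement like "virtually everyone showed some degree of interest" boils down to the share of respondents who picked any level other than the lowest one. Here is a minimal sketch using only the Python standard library, assuming each response is a dict that maps a technology name to one of three hypothetical labels ("Very interested", "Somewhat interested", "Not interested"). The real labels and data layout in our notebook may differ, and the sample data is made up.

```python
from collections import Counter

# Hypothetical answer labels; the actual survey wording may differ.
LEVELS = ("Very interested", "Somewhat interested", "Not interested")


def interest_breakdown(responses, technology):
    """Return the percentage of respondents at each interest level for one technology.

    `responses` is a list of dicts such as {"Data Science": "Very interested", ...}.
    Respondents who skipped the technology are left out of the denominator.
    """
    answers = [r[technology] for r in responses if technology in r]
    counts = Counter(answers)
    total = len(answers)
    if total == 0:
        return {level: 0.0 for level in LEVELS}
    return {level: 100.0 * counts[level] / total for level in LEVELS}


def any_interest(responses, technology):
    """Percentage of respondents who picked any level other than "Not interested"."""
    shares = interest_breakdown(responses, technology)
    return sum(pct for level, pct in shares.items() if level != "Not interested")


# Toy data for illustration only, not survey results.
sample = [
    {"Data Science": "Very interested", "Blockchain": "Not interested"},
    {"Data Science": "Somewhat interested", "Blockchain": "Very interested"},
    {"Data Science": "Very interested", "Blockchain": "Not interested"},
    {"Data Science": "Very interested"},
]
print(interest_breakdown(sample, "Data Science"))
print(any_interest(sample, "Blockchain"))
```

Skipped answers are excluded rather than counted as "Not interested", which is a design choice you may want to revisit depending on how the survey recorded blanks.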

Participants showed relatively low interest in hot technologies such as Blockchain, Virtual Reality, and Chatbots. I was somewhat surprised by this result. Then again, as the author of our first Chatbot course and an enthusiast of cutting-edge technology, I might be biased. 😉

Perhaps our learners are primarily professionals who don’t yet have a concrete business application for these emerging, but still immature, technologies. But this is just speculation, of course.

Data Science Survey Q2: What’s your level of interest for the following areas of Data Science?

Our second question drilled down into the Data Science field, asking about the level of interest in specific areas of Data Science.

The data shows strong interest in all areas of Data Science, with the exception of Data Journalism, which received a lukewarm response. If you are interested in this topic, I highly recommend taking our Data Journalism course. Storytelling is underrated, and I think it will benefit your Data Science career even if you aren’t a journalist.

Data Science Survey Q3: Which programming language for Data Science are you most interested in?

Our third question narrowed the scope further to the programming language of choice for Data Science.

Almost half of the respondents use or have an interest in Python for Data Science. R and SQL hold strong positions at 20.96% and 12.4%, respectively. No huge surprises here, but I was expecting Scala to take fourth place. Instead, Java appears to be ahead of it, with JavaScript in sixth place, beating Julia by a wide margin.
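Percentages and rankings like these come from counting single-choice answers and dividing by the number of people who answered. A minimal sketch, assuming the Q3 answers are available as a plain list of language names; the function name and the toy answers below are hypothetical, not our actual results.

```python
from collections import Counter


def ranked_shares(answers):
    """Return (choice, percentage) pairs sorted from most to least popular.

    Blank answers are treated as skipped and excluded from the denominator.
    Ties keep the order in which the choices first appeared.
    """
    cleaned = [a.strip() for a in answers if a and a.strip()]
    total = len(cleaned)
    if total == 0:
        return []
    counts = Counter(cleaned)
    return [(choice, round(100.0 * n / total, 2)) for choice, n in counts.most_common()]


# Toy answers for illustration only.
answers = ["Python", "R", "Python", "SQL", "Python", "", "Java", "R"]
for rank, (language, pct) in enumerate(ranked_shares(answers), start=1):
    print(f"{rank}. {language}: {pct}%")
```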

Julia is actually a fantastic language for Data Science and I’d love to see it grow in popularity. Its performance characteristics alone are noteworthy. Unfortunately, it’s still somewhat niche in the Data Science community in general, and clearly among our students. (If you’d like to change this by authoring a course on the subject, feel free to get in touch with us.)

What’s interesting about this question is that we allowed an open-ended Other option. As a result, we got a real sense of the variety of languages people use for Data Science. Our respondents also mentioned C#, Clojure, Perl, C, and a few other programming languages.
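Free-text answers need some cleanup before they can be counted, since people type the same language in different ways. A minimal sketch, assuming the Other answers are a list of strings; the alias table and sample inputs are hypothetical and only illustrate normalizing case, whitespace, and known spelling variants.

```python
from collections import Counter

# Hypothetical spelling variants mapped to a canonical name; extend as needed.
ALIASES = {
    "c#": "C#",
    "c sharp": "C#",
    "clojure": "Clojure",
    "perl": "Perl",
    "c": "C",
}


def tally_other(free_text_answers):
    """Count free-text 'Other' answers after normalizing case, whitespace, and aliases."""
    counts = Counter()
    for raw in free_text_answers:
        key = " ".join(raw.split()).lower()  # collapse runs of whitespace, ignore case
        if not key:
            continue  # skip empty answers
        # Known variants map to a canonical name; anything else is kept as typed (trimmed).
        counts[ALIASES.get(key, raw.strip())] += 1
    return counts


# Toy answers for illustration only.
print(tally_other(["C#", "c sharp", " Perl", "clojure ", "C", "Octave", ""]))
```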

Data Science Survey Q4: Which Data Science tool are you most interested in?

Finally, we asked about the primary tool or IDE of choice.

Respondents could pick only their most-used tool. Given the clear inclination for Big Data they showed in the first question, it’s not surprising to see Hadoop and Spark do so well.

RStudio is also fairly popular at 15.99%, a figure somewhat in line with R’s 20.96% share in the previous question. In fact, this primary R tool is more popular among our respondents than any single Python tool.

Please note that there is no contradiction here. Python users simply had more choices available, splitting the vote among IBM Data Science Experience (IBM DSX for short), Anaconda, and Jupyter. Combined, over 35% of respondents selected a Python tool as their primary tool for Data Science, confirming that Python is at least twice as popular as R among our users.
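The comparison above works by summing per-tool shares into per-language totals before comparing them. A minimal sketch of that grouping, assuming a hypothetical mapping that follows this post’s classification (RStudio as the R tool; IBM DSX, Anaconda, and Jupyter as Python tools). The per-tool numbers below are toy values, not the survey’s actual breakdown.

```python
# Hypothetical tool-to-language mapping, following the grouping used in this post.
TOOL_LANGUAGE = {
    "RStudio": "R",
    "IBM DSX": "Python",
    "Anaconda": "Python",
    "Jupyter": "Python",
}


def share_by_language(tool_shares):
    """Sum per-tool percentages into per-language totals; unmapped tools are skipped."""
    totals = {}
    for tool, pct in tool_shares.items():
        language = TOOL_LANGUAGE.get(tool)
        if language is not None:
            totals[language] = totals.get(language, 0.0) + pct
    return totals


# Toy percentages for illustration only.
toy_shares = {"RStudio": 10.0, "Anaconda": 12.0, "Jupyter": 9.0, "IBM DSX": 4.0, "Spark": 30.0}
totals = share_by_language(toy_shares)
print(totals, totals["Python"] / totals["R"])
```

With the figures reported above, a combined Python share of over 35% against 15.99% for RStudio gives a ratio above 35 / 15.99 ≈ 2.19, which is where the "at least twice as popular" comparison comes from.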

There you have it. It will be interesting to see how these results change over time. In the meantime, feel free to play with the data yourself by using the Jupyter notebook created by my colleague Alex Aklson, author of the excellent Data Visualization with Python course.

If you enroll in his course, you’ll have access to our Labs environment to run the Data Science Survey notebook in the cloud, without having to install anything on your machine. Alternatively, you can sign up for a professional Data Science tool like IBM Data Science Experience.

Where to learn more

Since most of our respondents showed a great deal of interest in Data Science with Python and Big Data, allow me to recommend a couple of resources for learning more about these topics:
