Data Science in Software Development

a course for engineers

At least 50% of development time is typically spent figuring out the system in order to decide what to do next. In other words, software engineering is primarily a decision-making business. Add to that the fact that systems often contain millions of lines of code and even more data, and you get an environment in which decisions have to be made quickly about lots of ever-moving data. How do you approach this challenge effectively?

Description

Developers are data scientists. Or at least, they should be.

Yet, too often, developers drill into the sea of data manually with only rudimentary tool support. Yes, rudimentary. Syntax highlighting and basic code navigation are nice, but they only help when looking into fine details. This approach does not scale to understanding larger pieces, and it should not be perpetuated.

This might sound as if it is not for everyone, but consider this: when a developer sets out to figure out something in a database with a million rows, she will write a query first; yet, when the same developer sets out to figure out something in a system with a million lines of code, she will start reading. Why are these similar problems approached so differently: once with tools and once through manual inspection? And if reading is such a great tool, why do we even consider queries at all? The root problem does not come from the basic skills. They exist already. The main problem is the perception of what software engineering is, and of what engineering tools should be made of.
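To make the contrast concrete, here is a minimal sketch of asking a question of code with a query instead of by reading. It is not Moose: it uses Python's standard `ast` module as a stand-in, and the sample source, the class names, and the helper `classes_with_min_methods` are all made up for illustration.

```python
import ast

# Hypothetical sample system; in practice this would be read from source files.
SOURCE = '''
class OrderService:
    def place(self): ...
    def cancel(self): ...
    def refund(self): ...

class OrderView:
    def render(self): ...
'''


def classes_with_min_methods(source: str, minimum: int) -> list[str]:
    """Return the names of classes that define at least `minimum` methods."""
    tree = ast.parse(source)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, ast.ClassDef)
        # Count only functions defined directly in the class body.
        and sum(isinstance(child, ast.FunctionDef) for child in node.body) >= minimum
    ]


# The "query": which classes define three or more methods?
print(classes_with_min_methods(SOURCE, 3))
```

The question is answered by a few lines that treat the code as data, and the same lines would work unchanged on a much larger code base.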

We go through live examples of how software engineering decisions can be made quickly and accurately by building custom analysis tools that enable browsing, visualizing or measuring code and data. Once this door is open you will notice how software development changes. Dramatically.

In this course, you will get to create such custom analyses hands-on using Moose - a uniform and compact platform for creating new analyses. First, you will see how cool assessing systems can be. Second, seeing how diverse use cases can be supported by a small set of tools will challenge the default reflex of relying only on code reading.

Examples

Moose is a cool open-source platform for software and data analysis. Why cool? Because it lets you build all sorts of custom analyses very fast. Often minutes fast. Think of it as R with a highly interactive environment that is also specialized for software systems.

Let’s pick a couple of examples. Here is how you find all classes annotated with @Service that are being called from classes that have ‘ui’ in the qualified name: