Rethinking software assessment

Traditionally, when we say software engineering we primarily refer to the process through which we build software systems. It's the construction part that captures all the spotlight. However, several studies report software engineers spending as much as 50% of the development effort on assessing the current state of the system from various points of view.

In other words, assessment accounts for half of the budget allocated for building the software system. These are just the direct costs. The indirect costs of this activity can be observed in the quality of decisions made during development.

All of this qualifies assessment as a critical activity. However, in practice, assessment is regarded as secondary. While for the process of building the system we have a large body of strong engineering know-how (e.g., patterns, practices, technologies), when it comes to assessment we find mostly ad-hoc solutions. This does not serve us.

Assessment must be recognized explicitly and approached as a distinct discipline. It is too important to do otherwise. Only by making it explicit can it be optimized.

Software systems are complex and present many contextual problems. To be effective, assessment must be tailored to deal with the context of the system and of the problem at hand.

The ability to assess a situation is a skill. Like any skill, it needs to be trained, and it can be.

Mastering the analysis technology is an issue, but it is not the key one. We do need a dedicated technology, such as Moose, that provides the ability to craft tools fast. But a tool that checks software is software as well, and software engineers already know how to produce software. The main challenge is to shift the focus of engineers from what to develop to how to check what is developed.

The challenge is significant because it requires a paradigm shift. But it comes with a promise: costs decrease when assessment goes from ad-hoc to structured.

And, the best news is that the budget is already allocated. Implicitly, but allocated. And developers already spend it. It just takes the energy and will to do it differently.

When it comes to understanding what, where and how to develop a software system, engineers mostly rely on code reading. However, software systems are large and complex, and thus relying on reading does not scale when we want to understand them as a whole. For example, a person who reads one line in two seconds would require approximately one month of work to read a quarter of a million lines of code. And this is just for reading the code, not for understanding it.
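The estimate above can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope calculation; the eight-hour work day and five-day work week are assumptions that the text does not state.

```python
# Back-of-the-envelope check of the reading-time estimate.
LINES_OF_CODE = 250_000      # "a quarter of a million lines of code"
SECONDS_PER_LINE = 2         # "one line in two seconds"
HOURS_PER_WORK_DAY = 8       # assumption, not stated in the text
WORK_DAYS_PER_WEEK = 5       # assumption, not stated in the text


def reading_effort(lines, seconds_per_line, hours_per_day):
    """Return (total hours, work days) needed to read every line once."""
    total_seconds = lines * seconds_per_line
    total_hours = total_seconds / 3600
    work_days = total_hours / hours_per_day
    return total_hours, work_days


hours, days = reading_effort(LINES_OF_CODE, SECONDS_PER_LINE, HOURS_PER_WORK_DAY)
weeks = days / WORK_DAYS_PER_WEEK
print(f"{hours:.0f} hours, {days:.1f} work days, {weeks:.1f} work weeks")
# prints "139 hours, 17.4 work days, 3.5 work weeks"
```

Under these assumptions, three and a half weeks of uninterrupted reading is indeed close to one month of work, and it leaves no time for anything other than reading.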

Nobody has this amount of time at their disposal. Thus, the reading is typically limited to a part of the system, while the overview is left for the drawing board in the architect's office. Following this strategy, most decisions tend to be local, while the strategic ones are mostly based on inaccurate or dated information.

Standard reporting tools are automatic reverse engineering tools that solve the scalability problem. They can be used out-of-the-box by simply pointing them to the input system, and they produce the report automatically. This makes them easy to use. There are multiple tools around that deal with various aspects of data and software analysis, but most of them take the oracle way: they offer some predefined analyses that provide answers to standard questions. That is great when you have a standard question. However, it turns out that most of the time our questions are not quite standard. In these situations, regardless of how smart the analysis is, it is of little use to the analyst.

Let us consider, for example, a tool that detects violations of the best practices of the programming language, and a system that deliberately relaxes some of these constraints to provide a more fluent interface. In this case, the generic detection will provide results that are not useful in the context of the system.
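To make this scenario concrete, here is a minimal sketch. Everything in it is hypothetical: the rule (flagging long chains of calls), the threshold, the `CallChain` record and the type names are invented for illustration and do not come from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallChain:
    """A chain of calls found in the code (hypothetical model)."""
    location: str        # where the chain occurs
    receiver_type: str   # type the chain starts from
    length: int          # number of chained calls


MAX_CHAIN_LENGTH = 2  # hypothetical threshold of the generic rule


def generic_violations(chains):
    """The out-of-the-box rule: every long chain is a problem."""
    return [c for c in chains if c.length > MAX_CHAIN_LENGTH]


def contextual_violations(chains, fluent_types):
    """The tailored rule: long chains are accepted on types that are
    designed as fluent interfaces in this particular system."""
    return [c for c in generic_violations(chains)
            if c.receiver_type not in fluent_types]


chains = [
    CallChain("ReportJob:12", "QueryBuilder", 5),  # intended fluent usage
    CallChain("Invoice:80", "Customer", 4),        # genuine problem
    CallChain("Invoice:95", "Customer", 1),        # fine under both rules
]
FLUENT_TYPES = {"QueryBuilder"}  # knowledge only the project team has

generic = generic_violations(chains)
contextual = contextual_violations(chains, FLUENT_TYPES)
print([c.location for c in generic])     # ['ReportJob:12', 'Invoice:80']
print([c.location for c in contextual])  # ['Invoice:80']
```

The generic rule reports the intended fluent usage alongside the genuine problem, so the engineer has to filter the report by hand. The tailored rule encodes the one piece of context that makes the report actionable.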

In another example, let us consider a system written in a well-known language that also has an engine working with information stored in custom configuration files. A generic tool cannot know about the custom configuration files, and thus this piece of information will be missing from its detection.
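A small sketch can illustrate what goes missing. It assumes a hypothetical configuration format of `event = ClassName` lines through which the engine wires up handler classes, and a hypothetical "unused class" analysis; neither is taken from a real tool.

```python
def parse_custom_config(text):
    """Collect class names from a hypothetical 'key = ClassName' format.
    Anything after '#' on a line is treated as a comment."""
    referenced = set()
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()
        if "=" not in line:
            continue  # blank line, comment, or something we do not understand
        _, value = line.split("=", 1)
        referenced.add(value.strip())
    return referenced


def unused_classes(defined, referenced_in_code, referenced_in_config=frozenset()):
    """Classes that are defined but never referenced by any known source."""
    return sorted(set(defined) - set(referenced_in_code) - set(referenced_in_config))


# Hypothetical facts extracted from the source code.
DEFINED = {"Engine", "OrderHandler", "RefundHandler", "LegacyExporter"}
REFERENCED_IN_CODE = {"Engine"}

# Hypothetical custom configuration file read by the engine at runtime.
CONFIG = """
# engine wiring
order.created = OrderHandler
order.refunded = RefundHandler
"""

generic_report = unused_classes(DEFINED, REFERENCED_IN_CODE)
tailored_report = unused_classes(
    DEFINED, REFERENCED_IN_CODE, parse_custom_config(CONFIG)
)
print(generic_report)   # ['LegacyExporter', 'OrderHandler', 'RefundHandler']
print(tailored_report)  # ['LegacyExporter']
```

The generic analysis sees only the code, so it reports the two handlers as dead even though the engine depends on them. A dozen lines of project-specific parsing are enough to correct the picture.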

An effective assessment must take the particularities of the system into account. Engineers fall back to code reading exactly because this is where the details are, and because they would rather have access to these details than profit from generic scalability. Granted, modern IDEs do provide some help. For example, given a code location, modern IDEs offer strong management of references, which helps greatly when investigating the impact of a change or when identifying the root of a problem. However, as soon as the problem becomes slightly more complicated, such as one concerning multiple locations and multiple types of relationships, the engineer is mostly left to find the answer manually.