As-Is Software Architecture

Software for automatically constructing architecture diagrams and metrics from source code and .NET assemblies is presented. Examples for Visual C++, Python, the Linux kernel, Boost and .NET are discussed.

Introduction

Ever wondered how that architecture diagram related to the source code? Ever looked at the source and really wondered? Have you looked at a file dependency graph and wondered what the point was given all the spaghetti? Then maybe you need an Architecture Development Environment (ADE). There are numerous ones on the market (Structure101, Lattix, NDepend, etc.) however this article describes another one; but why write yet another solution? The answer is dissatisfaction with the diagrams being drawn by the existing solutions.

What makes an ADE different from just another diagram of dependencies is that it associates a hierarchy which isn't necessarily that of the file system path, namespace or other artefact of the source code. Initially the software used the filter file associated with a Visual C++ project (for which there is an example that is subsequently refactored); subsequently the software has been generalised to just scan a directory, and then to read the output from a make command - in both cases the user must manually specify a set of filters. The first of these generalisations can be used to show the structure of the software itself, the latter was implemented to process the source of the Linux kernel.

The software implementation, written in Python, uses plug-ins to provide functionality. These plug-ins are loaded and executed by processing a JSON file. For output the options are:

doxygen - create input for Doxygen/DoxyPress

track - append the metrics to a file

dependencies - save a JSON file containing the state from one of the other input options

Different programming languages are supported, C/C++, C# and Python, as is additional processing. The source for various examples is provided in the repository.

Background theory

A graph is made up of nodes and the edges that connect them, in a directed graph the edges have a direction. Rather than being represented as a diagram, a graph can be represented by a matrix called a structure matrix. If there is a lack of cycles, then the order of the rows can be chosen so that the non-zero entries lie on one side of the diagonal.

Given that a graph has N nodes, E edges and P parts (the different parts of a graph have no edges connecting them to other parts) then cyclomatic complexity may be defined as E + P - N. For example, in a strictly layered architecture E + P - N = 0 and for complete mud the number of edges is twice the triangle number of N. i.e. E + P - N = N * N. Thus 0 <= (E + P - N) / N <= N. Note that this formula is invariant under code transforms that split a node into two connected by only one edge.

Visualising with Doxygen

Structure matrix

Firstly, the script calculates the structure matrix for the filters that contain files and attempts to order the rows so that the matrix is lower triangular. Note that the columns are in the same order as the rows and that the diagonal elements are denoted by a slash.

In this case the presence of a cycle in the FEA group stops the matrix from being ordered into a lower triangular form.

Table of metrics

The script then combines the filters into a tree structure. For example, FEA\Core is created with 3 children given as Elements, Fields and SetOfElements.

Then it calculates the values of N, E + P - N and (E + P - N) / N for each of the tree nodes and creates a table sorted in (E + P - N) / N.

Graphs

Each of these branches of the tree creates a new section in the HTML document. For example, at the top level, the script creates an input file for GraphViz to create the first diagram. Each of the nodes is hyperlinked to the appropriate section in the document. The script also calculates E + P - N as being 1 in this case.

The second diagram shows the FEA subsection and a problem in the architecture is evident. The dependence of FEA\Surface on FEA\Core creates a cycle in the graph, i.e., A circular dependency exists. The associated section of the HTML document also displays the value of E + P - N as being 8 and additionally states that FEA itself is dependent on the Linear Algebra library.

Finally, in the third diagram, we show a leaf node, i.e., one which consists of the inter-dependencies of files. Note this is not the usual spaghetti that one would expect to be displayed as we are only displaying a small number of files.

Tracking the metrics

As has been noted the value of E + P - N equals 0 for simple graphs, thus it is useful to track the sum of this metric over all the graphs generated off the source code. The script tracks this along with the maximum value and both values normalised by N.

The script plot.py plots the tracked values. For the code which the example is based on the output is given in complexity.txt. It is to be noted that the comments do mention the removal of unnecessary include files and the changing of filters.

As can be seen from the above diagrams that the effect of refactoring the code is not a monotonic reduction in the metric. Their utility lies in monitoring the long term trend.

Discussion

The Visual C++ example

Significant refactoring took place for the FEA::Core::Elements (original, refactored) filter of the example.

ElementShape was split off from IntegrationRule which was then grouped separately without creating a circular dependency at the next level up. i.e. the single responsibility principle of SOLID was applied.

There was also a factory implemented in ElementHandler that was separated out into ElementFactory and again moved into its own group without creating a circular dependency at the next level up. The effect of this was to bring out an uncluttered inheritance tree for ElementHandler. i.e. The diagram is now consistent with the design intent. It should be noted that this particular mistake may already be classified as an anti-pattern.

The applications own source

Within the Python source code unnecessary (in the context of dynamic typing) import statements are used so that correct dependency structure can be determined.

The Linux kernel

As a stress test the software was run on the Linux kernel 4.6 source code. Obviously there were no pre-existing filters, rather a large number of files and the classical split of an include directory to separate the general header files from the source. This latter point is significant as the whole contents of the include directory need to be redistributed so that the structure of the kernel source code can be determined.

The Boost 1.61 library

This C++ library was chosen for illustration because it used to be a header only library that caused a lot of pain due to the huge number of dependencies that were required when wanting to use just one little header file. In its modern form it uses some macros to define include files and thus when reading the source code these macros had to be substituted appropriately.

The Mono.Cecil library

This .NET library was chosen as an example simply because the tool that reads its assemblies uses it. As the tool doesn't read the source files it creates a tree with leaves corresponding to interfaces and classes and branches corresponding to the namespaces.

Conclusions

A tool for generating interlinked hierarchical architecture diagrams and associated metrics has been developed and shown to produce a clearer understanding of the state of a particular Visual C++ project. To the extent that that code base was refactored to a more flexible state. The tool is highly extensible due to its plug-in architecture and has been generalised to work on Python source, the Linux kernel and general input.

For the main example given, sustainability is enhanced by utilising Visual Studio as the GUI for grouping files together rather than having to separately specify a set of filters.

History

2016/05/10: First release

2016/05/24: Updated the script to hyperlink into the source code, change the complexity metric and produce the table of metrics. Added a link to demonstration on YouTube.

2016/05/29: Added scripts for tracking and for processing the Linux kernel. Modified the main code so that files and groups of files can exist at the same level. Removed the HTML output from the zip and referred the reader online. Subsequently enhanced the discussion given the new data.

2016/06/16: Refactored the tool, uploaded it to GitHub and rewrote the beginning and end of the article.

2016/07/07: Added discussion of Boost and .NET

License

Share

About the Author

Mathematician turned freelance Software Engineer with over 20 years experience, mainly in the development of CAD/CAM/CAE software. Generally worked for smaller businesses - although have been at Lloyd's Register, Siemens and Rolls-Royce. Enjoy living on the edge of the Peak District where I go cycling and hiking.