Posted
by
kdawsonon Friday May 16, 2008 @11:53AM
from the tale-of-four-kernels dept.

Diomidis Spinellis writes "Earlier today I presented at the 30th International Conference on Software Engineering a research paper comparing the code quality of Linux, Windows (its research kernel distribution), OpenSolaris, and FreeBSD. For the comparison I parsed multiple configurations of these systems (more than ten million lines) and stored the results in four databases, where I could run SQL queries on them. This amounted to 8GB of data, 160 million records. (I've made the databases and the SQL queries available online.) The areas I examined were file organization, code structure, code style, preprocessing, and data organization. To my surprise there was no clear winner or loser, but there were interesting differences in specific areas. As the summary concludes: '..the structure and internal quality attributes of a working, non-trivial software artifact will represent first and foremost the engineering requirements of its construction, with the influence of process being marginal, if any.'"

So while looking at the data collected, I had to wonder if some of the conclusions reached were not something of a matter of weighting - I saw some things pretty troubling about the WRK. Among the top of my list was a 99.8% global function count!!!

I guess Microsoft uses a non-C linker-specific mechanism to isolate their functions, for instance by linking their code into modules. But yes, this is a troubling number.

This would explain some things like lower LOC count - after all, if you just have a bunch of global functions there's no need for a lot of API wrapping, you just call away.

The lower LOC comes from the fact that WRK is s subset of Windows. It does not include device drivers, and the plug-and-play, power management, and virtual DOS subsystems.

Also, on a side note I would say another conclusion you could reach is that open source would tend to be more readable, with the WRK having a 33.30% adherence to code style and the others being 77-83%. That meshes with my experience working on corporate code, where over time coding styles change on more of a whim whereas in an open source project, it's more important to keep a common look to the code for maintainability. (That's important for corporate code too - it's just that there's usually no-one assigned to care about that).

About 15 years ago I chanced upon code in a device driver that Microsoft distributed with something like a DDK that had comments written in Spanish. The situation in WRK is markedly better, but keep in mind that Microsoft distributes WRK for research and teaching.

As I take a break from VS to enjoy some time on slashdot, I have no understanding of why people think VS is so amazing.

Is it because of intellisense? That's kind of nice, especially when your code is so disorganized that you don't remember where stuff is defined. Or if you can't open stuff up in two different windows to see where it is defined, like VS prevents you from doing (yeah, you sort of can, but it's stuck in the main VS window).

Is it because of the debugger? Sure, the debugger is nice, and I like it, but it only helps get rid of the easy bugs. The bugs that really eat your development time are the ones that only manifest themselves after the program has been running for a few hours/days, and usually a debugger doesn't help much with that. Besides, every other IDE comes with a debugger, even GDB works fine if you can handle arcane keystroke combinations.

And on top of it, Solutions and projects in VS are horrible. Why does VS try to save the solution every time I quit? Makefiles have some awful syntax, but at least when they change, I know it's because of what I've done, and I know how to fix it.

That said, I don't consider VS to be a bad IDE, it is reasonably decent. I just don't understand the logic of these guys who think that VS is the greatest IDE ever. It's a question, not a flame.

I've put the data and the SQL queries on the web [spinellis.gr]. It is therefore easy for you to do what you suggest, because the filenames are stored in the database. Just perform a cascade delete for the files you think that don't belong to each system's core and rerun the queries. I'd be interested to know the results.

I can not extrapolate the agreeable portions of your thought to the seemingly obvious short comings of the Windows operating system. On any facet, whether it is security, stability, functionality or reliability. Windows is, far behind on all fronts.... aside from secrecy from a Microsoft point of view.

I'm not claiming anything regarding these external quality attributes of Windows, the metrics I collected just show that there are no vast differences in the code's quality.

Or, perhaps, the WRK has been a meticulous focus at Microsoft before it's release... this is likely possible, as it's WIDELY known, from nearly ALL examples of closed source proprietary software being released to the Open Source, that it takes years just to clean up and prepare for the ultra high standards of the OS community.

This is entirely possible. In fact, a README file in the distribution states:

The primary modifications to WRK
from the released kernel are related to cleanup and removal of server
support, such as code related to the Intel IA64.