The ravings of a SANS/GIAC GSE (Compliance & Malware)
For more information on my role as a presenter and commentator on IT Security, Digital Forensics, Statistics and Data Mining, e-mail me: "craigswright @ acm.org".

Dr. Craig S Wright GSE


Books

I have a few books out and another is on the way for 2012. Firstly, I have to plug the first in the Syngress series of books on IT audit. This is a comprehensive compliance and governance handbook with EVERYTHING (from the high level to the hands-on for the expert) to get you started in IT compliance and systems security. The main book is "IT REGULATORY AND STANDARDS COMPLIANCE HANDBOOK". This is the first in a series I have planned, and more will follow in time. There will be electronic updates to keep the book current.

I will be working on co-authoring a book on CIP (Critical Infrastructure Protection) - but more on this later.

On top of this, I recycle computers. I take 1.5-to-2-year-old corporate lease computers and refurbish them so that they can run the most current programs.

The question is - what do you do to help?

If you do not have the time, have you thought about a donation?

This blog has been monetised. This is where the money goes. By clicking and purchasing on this site, you help Burnside and Hackers for Charity. All monies earned here are split 50/50 between these two charities.


Tuesday, 5 January 2010

As stated, I plan to publish some of the preliminary results of the quantitative risk research I am conducting. At present, I have been analysing project costs for 277 software projects.

It is clear from the results that there is an optimal level of software testing. The cost of finding each additional bug rises as the software is tested more thoroughly to remove the remaining bugs.

We can demonstrate that the costs of testing follow the Cobb-Douglas function. From the sample of 277 coding projects, I have fitted a function of:

[Fitted Cobb-Douglas equation; the estimated exponents, where c and C are constant values, are not reproduced here.]
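As a sketch of how such a fit is made (the figures below are made up for illustration and are not the study's data), a Cobb-Douglas-style cost function is linear in logarithms, so ordinary least squares on the log-transformed data recovers the exponent and the constant:

```python
import numpy as np

# Hypothetical sample: project size in SLOC and total testing cost.
# The real study used 277 projects; these numbers are illustrative only.
sloc = np.array([5_000, 20_000, 60_000, 150_000, 300_000], dtype=float)
cost = np.array([12_000, 70_000, 310_000, 1_200_000, 3_400_000], dtype=float)

# A cost function of the form Cost = c * SLOC**b becomes
#   log(Cost) = log(c) + b * log(SLOC),
# so a straight-line fit in log-log space gives b (slope) and log(c).
b, log_c = np.polyfit(np.log(sloc), np.log(cost), 1)
c = np.exp(log_c)

print(f"exponent b = {b:.2f}, constant c = {c:.4f}")
```

A superlinear exponent (b greater than 1) is what produces the rising marginal cost of removing each further bug.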

I have not prettied up the plots, nor have I done much of the refinement that will go into the published paper, but even from a few simple metrics we see that complexity adds exponentially to the costs.

The previous plot has an exponent of around 1.5 (this is the exponent of the fitted equation; the relationship is not linear). In this sample, the largest program was approximately 300,000 SLOC (source lines of code).

So we can see that longer programs cost more (nothing new to anyone who has programmed).
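To put an exponent of roughly 1.5 in concrete terms (a quick back-of-the-envelope calculation, not a figure from the dataset):

```python
# With a cost exponent of about 1.5, doubling a program's size does not
# merely double its cost: the cost scales by 2 raised to the exponent.
exponent = 1.5
cost_multiplier = 2 ** exponent

print(round(cost_multiplier, 2))  # twice the SLOC, roughly 2.83x the cost
```

So a program twice as long costs nearly three times as much to produce and test, which is the "exponentially" rising cost the plots show.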

The number of bugs found does vary greatly. These figures cover bugs found within a year of the code being released, so the true numbers may be higher. However, it stands to reason that if a bug has not been found within a 12-month period, it will be expensive to find.

What was interesting is the distribution of bugs as a percentage of code.

We can see that there is no correlation between the level of bugs in code and the length of the code. I did find this surprising. There are more bugs in large code bases, but the number of bugs per line does not increase greatly. The Pearson correlation coefficient is close to zero (actually slightly negative, but not at a statistically significant level).
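The check described above can be sketched as follows (with made-up figures; the real 277-project data is not reproduced here):

```python
import numpy as np

# Hypothetical data: project size (SLOC) and defect density (bugs per line).
sloc    = np.array([2_000, 15_000, 40_000, 90_000, 200_000, 300_000], float)
density = np.array([0.055, 0.050, 0.060, 0.048, 0.058, 0.052])

# Pearson correlation coefficient between code size and bugs-per-line.
r = np.corrcoef(sloc, density)[0, 1]
print(f"r = {r:.3f}")  # close to zero for this made-up sample

# Total bug counts still grow with size even when the density does not,
# which is why large programs have more bugs overall.
total_bugs = sloc * density
```

A near-zero r says the per-line defect rate is roughly flat across project sizes, even though the absolute bug count climbs with SLOC.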

The axis on the histogram shows the ratio, not the percentage (as I stated, I have not prettied up the plots). These are the totals of the bugs in the software projects analysed, not only those discovered post-release. I have not yet classified the stages of development where the errors occurred most frequently. I also need to split some of the results by coding language.

I did find that the numbers of bugs were far too high.

Many of these are functional errors and did not pose security threats, but the economics of repairing them remains. The mean across the tests was 5.529% (about 55 errors per 1,000 lines of source code).
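The conversion between the two figures is simple arithmetic on the quoted mean (no other data assumed):

```python
# The mean defect ratio quoted above: 5.529% of source lines contain an error.
mean_ratio = 0.05529

# Expressed per 1,000 lines of source code (errors per KLOC).
errors_per_kloc = mean_ratio * 1_000

print(errors_per_kloc)  # about 55 errors per 1,000 SLOC
```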

If anyone has any data from a really large code project (such as Windows, Linux etc.) I would be interested in hearing about it and incorporating the data into my research.

Software is by nature a complex system and is only becoming more so.

We need to move from "Lines of Code per day" as a productivity measure to a measure that takes debugging and documentation into account. This could be something such as "Lines of clean, simple, correct, well-documented code per day". This also has problems, but it does go a long way towards creating a measure that incorporates the true costs of coding.

The primary issue comes from an argument from parsimony. The coder who can create a small, fast and effective piece of code in 200 lines where another programmer would require 2,000 may have created the more productive function. The smaller number of lines requires less upkeep and can be verified far more easily than its larger counterpart.