To be… Determined!

Monthly Archives: February 2015

It’s been a while since I’ve done an analysis post but! I recently found some inspiring code. To be clear, I’m not trying to call out any developer here. I’m aiming only to use my findings to teach and the following is the gist of the matter at hand:

From inference, I think the noble intent was to use the SortedList to be able to quickly find out if the list contained some key. With the data sorted and as used above, a binary search can be performed to make finding out if the list contains some value a $$O(log_2(n))$$ operation. That part is fine. It is in populating the SortedList that the problem lies.

Written out mathematically, the foreach and the Add inside it (because per Microsoft documentation that Add will run in $$O(n)$$ time if the data isn’t sorted as is the case here) amount to this equation:

$$\sum\limits_{i=1}^n\sum\limits_{k=1}^i k$$

For those who might be a little rusty with your mathematical notation here’s an example of what it would do if $$n$$ was 3:

$$(1)+(1+2)+(1+2+3)$$

The numbers I’ve put in the parens are the summation from the inner $$\sum\limits_{k=1}^i k$$ and is just summing from $$1$$ to whatever $$k$$ is currently. The outer$$ \sum\limits_{i=1}^n$$ is adding between the parens, or summing the results of the inner sums. As it happens, this particular equation generates the number series called the Tetrahedral Numbers. For the purpose of this analysis, we’re most interested in the equation to find the $$n$$th tetrahedral which is:

$${n(n+1)(n+2)\over 6}$$

Cause that factors out to:

$${n^3+3n^2+2n\over 6}$$

And tells us that this particular implementation unnecessarily runs in $$O(n^3)$$ time because the $$n^3$$ dominates the equation. I say unnecessarily because with just a little reworking:

In this case the same goals are accomplished but in $$O(n$$ $$log(n))$$ time due to the the OrderBy() call. Graphically that’s

A huge, very noticeable difference for even small values of $$n$$. To put the impact another way, imagine if that red line was how fast some code was going to drain your smartphone battery vs. the blue line for the same value of $$n$$. That’s why this stuff matters.

At this point in my career I’ve come to expect that these simple inefficiencies will be written, make it through code reviews and out into production because “the code works” and as per the Agile Manifesto “Working software is the primary measure of progress.”

Said manifesto also has, “Simplicity–the art of maximizing the amount of work not done–is essential.” The name of the analysis I’ve done above is called asymptotic analysis. It takes time and knowledge of Computer Science fundamentals to do and that I care to do this kind of analysis has gotten me told that I need to go be a professor instead of work in industry.

Given the demand for software today, I don’t think it’s realistic to expect all members of a development team to be up on their Computer Science fundamentals. I’ve said myself if you can tell a story, you can be a software engineer and I stand by that claim too. I also think it’s reasonable to expect these team members to be teachable. Otherwise business employing these teams will always be chasing what they could be doing instead of actually doing what they aim to do then moving on to the next thing, and/or throwing money away on “simply” written software that “works” but is inefficient and then unmaintainable from developers piling “simple” and inefficient code on top of “simple” and inefficient code. Which leaves me at: