Performance Problems Survey

Gentle Readers, see if you can comment on the following Hypothetical Situation:

Your project is "In Trouble." The Big Boss has just walked into your office and told you that he needs you to save the day. Again?

Which of the following is more likely the culprit of this crisis that has your boss in a panic?

1) A developer failed to code a good algorithm in some module and now a very smart person must come along and replace that module with something much better.

2) The project has far too many layers of abstraction and all that nice readable code turns out to be worthless crap that never had any hope of meeting the goals much less being worth maintaining over time.

Are both of these really crises?

Which of them should we work hardest to prevent as performance professionals?

Has either or both of these happened to "Some Other Team" that you once heard about? 🙂

The main problems I see day-to-day often involve people coding in an object-oriented language while completely failing to grasp its benefits – resulting largely in code that, I’d imagine, looks like the description for #1.

I’d say that even though learning the syntax of a language is fairly easy, by the time you really *understand* how to use it – how to architect your code and what’s really going on under the hood – you’ve already gone past the point where performance engineers can do anything about it proactively. Mostly we wait for the crunch and fix it afterwards…

As you’ve covered many times, I agree with the concept of building in tools support – letting coders of all skill levels get feedback on what they’re doing as they’re doing it, and why it might be a stupid decision. This will help with both of the examples 🙂

#1 is by far the most irritating, as it usually ends with having to go in and fix tons of spaghetti-code rubbish… which never quite makes it out of the application.

Granted, I’ve never seen *too* many levels of abstraction in the stuff I work with, but a decently thought-out system will often be easier to sort out – if only because it usually *has* been thought out 😉

I don’t think 2) can cause the big boss to panic… at least not the details of abstraction. Sure, if the final module doesn’t do what it’s supposed to do, then I’m sure he’s in a panic. I am currently dealing with some code that was initially built with the intent of abstraction and the MVC pattern but is now just a bunch of crap… the kind you read and wonder "why would someone ever do that??" … I mean, can you imagine custom VB classes whose sole purpose is to set the cursor to WaitCursor or Default?

But 1) is usually the real culprit: use of the wrong data structures, inefficient looping techniques, ignoring platform/framework benefits, and of course… the bad algorithm. All of these lead to code that goes round and round to do something that could be done in a straightforward manner. And then you hear complaints that the web service access is pretty slow… is it the server or is it the SQL?? Now that’s such a small range you’re dealing with… yeah, right 🙂
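The "wrong data structures" complaint above can be sketched concretely. This is a hypothetical illustration (the function names are my own invention): checking membership in a list rescans the whole list each time, so the loop is quadratic overall, while a set lookup is constant time on average.

```python
def dedupe_slow(items):
    seen = []                    # list: each `in` check scans the whole list
    out = []
    for item in items:
        if item not in seen:     # O(n) scan per element -> O(n^2) total
            seen.append(item)
            out.append(item)
    return out

def dedupe_fast(items):
    seen = set()                 # set: hash lookup, O(1) average per check
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out

print(dedupe_fast([3, 1, 3, 2, 1]))  # [3, 1, 2]
```

Both versions produce the same answer; only the inner data structure differs, which is exactly why this kind of #1 bug is "easily and quickly introduced" and just as easily fixed.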

I beg to differ. The only projects I have seen literally fail due to performance were those in scenario 2. The ones in scenario 1 were always workable.

Scenario 1 is annoying, but you can rip out and replace the whole module if you need to. That’s right, folks: if you own the source code, then you automatically get the ability to change implementations without extra abstractions.

Now consider scenario 2. If you get the abstraction wrong, not only do you have to rip out the abstraction, you still have to rip out the implementation.

Basic math says that the more layers you add, the more likely you are to mess one up: if each layer has a 90% chance of being designed right, ten layers leave you barely a one-in-three chance of getting them all right.

#2 is so much more prevalent in the companies I’ve worked for that it isn’t funny. I’d love to have problem #1: it’s a relatively easy fix. Trying to get developers to stop using every freakin’ pattern in the book for every piece of code they write is by far the worse issue both in quantity and quality of problems created.

But the crises are never understood by the boss to be #2. When features get specced in months instead of days or weeks, and bug counts refuse to go down near the release date, there’s a crisis mentality. But telling them the problem is that every element in the UI has its own convoluted model/view/controller architecture spread out across 3 different processes is like speaking Greek to them, so it’s never really perceived as a "crisis" in itself.

I’ve been in both situations. The direct boss usually doesn’t recognize that one of the developers sucks.

In my case, for #2, it has been my boss’s boss’s boss who sent me and a bunch of other ("trusted") people from other divisions to rescue projects that were either complete disasters or needed help to actually get done.

Which one should WE work hardest to prevent? It doesn’t matter which one WE work on (assuming we are actually the guys writing good code); there’s always someone who can write bad code, and full teams that can mess things up for so many reasons.

There are many things programmers can do to improve quality, but programmers are lazy, and not necessarily the good kind of lazy.

In case #1 the original developer and the "smart person" can be the same person.

You don’t need to improve the performance if it isn’t slow. You don’t know if it is slow until you build it and measure it. (Except for obvious problems that a beginning programmer or a programmer new to a language/framework may not be aware of.)
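The "build it and measure it" advice above can be sketched as a minimal timing helper (the helper name and workload here are my own invention, not from the post):

```python
import time

def measure(fn, *args, repeat=5):
    """Return the best wall-clock time over `repeat` runs of fn(*args).

    Taking the minimum of several runs filters out scheduler noise;
    measure first, and only then decide whether anything is "slow".
    """
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

# Example workload: only optimize what measurement shows is actually slow.
data = list(range(10_000))
print(f"sum loop: {measure(lambda: sum(data)):.6f}s")
```

For anything beyond a quick check, the standard `timeit` module does the same job with more statistical care.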

Case #1 and Case #2 can occur together in the same section of code. As long as modularity and loose coupling have been maintained overall, it need not be a crisis.

It is nearly always #2 where many people have to cooperate across continents, or cooperate over long time scales (software maintenance). Part of the problem is a spec problem: creating generic code that is prepared for everything, and that will be completely rewritten anyway because the next requirement forces it. That leads to over-engineered solutions.

When you do not understand a problem completely, you try "separation of concerns," which leads to these nice architectural layers that are helpful in the concept phase but not necessarily in the coding phase, where you have to obey additional requirements such as performance and maintainability, or simply team boundaries (+1 layer).

I’d vote for #2 – #1 does happen, but typically poor algorithms tend to be fairly localised, and fixing / replacing them is not normally a huge problem. Over-engineering of the whole solution, however, can give big perf / maintenance problems – and the effort in fixing it may well approach that required for a complete re-write.

I vote for #2, at least in my area (large business applications) performance is often lost because of many layers involved. The problem with these complex applications is that people often don’t know or cannot predict easily how costly a certain function call to an underlying layer is. It’s also often complicated to get a certain functionality into the right layer, because different departments might be responsible for different layers.

I don’t think 1) is really a crisis. It happens to my team about as often as "My daughter didn’t eat her veggies today."

On the other hand, we really can’t tell that case 2) is a problem until late in the game, and fixing it is a lot harder.

I think the better comparison, instead of "a developer failed to code a good algorithm," would be "a developer failed to design and code (using good algorithms) a low-level module used by many other developers on the team."

From my experience, when the boss is in a panic it’s because of something that was just recently introduced into the project and made a dramatic impact on the system’s performance. Generally this is because of #1, since bad algorithms can be easily and quickly introduced at any point in the project’s history.

But an over-engineered, over-abstracted design is something that takes time to implement and introduce, and it has usually been in place since the beginning of the project. And because it’s been there from the beginning, the performance impact is rarely noticed, since it’s always been slow.

Are both of these crises? Yes, but I would say #2 is much more so. From a cost perspective it’s usually fairly easy and quick to optimize a bad algorithm. But redesigning a bad, bloated, over-engineered architecture takes much more time and resources than management usually wants to give, especially since the customers most likely won’t see any new features for the dev effort (and we all know features drive sales).

Which one should we work hardest to prevent? Unequivocally #2, because, as I stated above, it’s easy for management to buy off on letting you rip out a bad algorithm. But when you approach them and tell them you need to rip out the core architecture, they usually balk, because it means you’ll most likely rewrite a major portion of the app. That means several devs working for a long time on maintenance (i.e., not new functionality), and the testing impact is huge, since it will require a full pass on the entire app(s).

Some author once said something that really stuck with me. It goes something like: "Developers should never design for the intellectual pleasure of design; they should design based on the needs of the application." If you are designing for the pleasure of it, you’ll most likely over-engineer it to the point of being "worthless crap."

Definitely #2. At least in case #1 there is a well-isolated module with a performance problem. It requires a smart person to replace it with a better algorithm, but at least you can replace it.

A project that uses too many abstraction layers is much harder to salvage even for a smart person. I personally know of a project that suffers heavily from #2. It is a huge class hierarchy that uses a dataset as the backend. By trying to abstract the dataset away they have all kinds of performance problems. To alleviate that, they have some local cache in the class hierarchy. But that causes all kinds of problems when somebody manipulates the dataset. It is an absolute nightmare, but since many different companies are involved there is no way to redesign it to something more reasonable.

There is something to be said even for very inelegant, low-abstraction code. For example, in many cases a big switch statement might be preferable to a big inheritance hierarchy, since all the logic is in one place and it is easy to understand – even though OO purists would immediately want to replace it with an inheritance hierarchy.
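The big-switch point above, sketched with made-up names: every case is visible in one function you can read top to bottom, where the inheritance version would scatter the same three formulas across a `Circle`, `Rect`, and `Triangle` class.

```python
def area(shape, **dims):
    # "Big switch": every case in one place, easy to scan.
    if shape == "circle":
        return 3.14159 * dims["r"] ** 2
    elif shape == "rect":
        return dims["w"] * dims["h"]
    elif shape == "triangle":
        return dims["base"] * dims["height"] / 2
    raise ValueError(f"unknown shape: {shape}")

# The OO-purist version trades this readability for extensibility:
# adding a shape means a new subclass instead of a new branch here.
print(area("rect", w=3, h=4))  # 12
```

Neither form is wrong; the switch wins when the set of cases is stable and you value seeing all the logic at once, the hierarchy wins when third parties need to add cases without touching this function.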

By the way: what about the struct inlining problems? I am really getting jealous of the Java guys. They now have escape analysis, so one of the main performance benefits of structs just went away.

Why is it that Microsoft only excels when it is under pressure? Back when Java on the desktop was still alive, the MS Java VM was consistently faster than the Sun VM. But now that you have managed to kill off Java on the desktop, the .NET CLR has massive performance problems and you guys do nothing about it.

Poor algorithms can definitely be fixed, often quickly – I’ve gotten four orders of magnitude speedups just by keeping file handles open between file accesses and other trivial tricks. But the important thing is that the initial code is correct; it’s just slow.
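The file-handle trick described above looks roughly like this (a hypothetical sketch – the record size and function names are invented): reopening the same file on every access pays the open/close cost each time, while keeping one handle open pays it once.

```python
def read_records_reopening(path, n):
    """Slow pattern: open + close per record is pure overhead."""
    records = []
    for i in range(n):
        with open(path, "rb") as f:   # reopened on every iteration
            f.seek(i * 16)
            records.append(f.read(16))
    return records

def read_records_one_handle(path, n):
    """Fast pattern: open once, reuse the handle for every record."""
    records = []
    with open(path, "rb") as f:       # single open for the whole loop
        for i in range(n):
            f.seek(i * 16)
            records.append(f.read(16))
    return records
```

Both return identical data; the only change is hoisting the `open` out of the loop, which is why the commenter calls it a trivial trick – the initial code was correct, just slow.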

With case 2, knowing whether the code is correct can be impossible, and maintaining behaviour while refactoring can be nightmarish. It’s always easy to add complexity; genius lies in making things simple and elegant. As simple as possible (and no simpler) should be the goal.

Due to experience of #2 I’m a big fan of the YAGNI approach – code for the requirement you have today, think about but don’t code for the requirement that might turn up tomorrow.

I don’t really believe there’s any answer to the original problem, but the answers from other commenters make me think of how lucky they are. Even the ones who are crying seem to be pretty lucky.

There’s a ton of stuff that doesn’t work, some of which gets posted to thedailywtf.com, some of which is too mundane even for that forum. I think #1 usually plays a role but #2 also gets its share in thedailywtf.com (virtudyne etc.).

The inefficiencies of multilayering are going to happen anyway, whether overengineered (#2) or engineered properly (not #2, but destined to become #2 in the future). Think of all the emulations of emulations involved in .NET + Win32 + NT native APIs, or in firmware in hard disk controllers and display adapters, or in hardware in CPUs, or back to software in wireless LAN drivers. Then be amazed when some of these monstrosities still perform well enough to be usable.

If #2 occurs without #1 then I think it is fixable. If #1 occurs then it doesn’t matter if it’s combined with #2 or if it’s standalone, it will require a lot of rewriting from scratch.

In my experience as a programmer in a corporate IT dept., #1 is the usual culprit, and it is almost always due to poor database design and inefficient querying. The dominant performance factor is usually poor O/R mapping when constructing large collections of objects from a database – for example, issuing one query per object rather than constructing the collection from a single query, or building large temporary in-memory structures rather than streaming through a properly ordered DataReader.

The first place I look for performance improvements is I/O statements inside loops.
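The one-query-per-object pattern described above is often called the N+1 query problem. A minimal sketch using an in-memory SQLite database (the table and column names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "ann"), (2, "bob"), (3, "cho")])

ids = [r[0] for r in conn.execute("SELECT id FROM orders ORDER BY id")]

# Slow: one round trip per object -- N queries inside the loop.
slow = [conn.execute("SELECT customer FROM orders WHERE id = ?",
                     (i,)).fetchone()[0]
        for i in ids]

# Fast: build the whole collection from a single query.
fast = [r[0] for r in conn.execute(
    "SELECT customer FROM orders ORDER BY id")]

print(slow == fast)  # True
```

With an in-memory table the difference is invisible, but against a real database server each of those per-object queries is a network round trip, which is exactly the "I/O inside a loop" the commenter looks for first.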

I do, however, disagree with commenters who suggest that #2 cannot be fixed with anything short of a rewrite. On my relatively small (35 KLOC) single-developer project, I have in a number of cases introduced abstractions that didn’t end up carrying their weight. The right thing to do, in an unpublished API, is to rip them out by doing the opposite refactoring of the one that created them. If making the abstraction was a testable, incremental change, then unmaking it should be too.

Unfortunately, I think it takes more discipline to say "I spent the day removing an unhelpful abstraction" than "I spent the day making the system more extensible through abstraction." Whenever a programming chore calls for discipline, a certain fraction of programmers will call for a rewrite.

I agree with Joel Spolsky – rewriting code that works, or even kind of works, is the atomic bomb of software engineering: you use it as a threat every now and then, knowing full well that you’re never really going to do it.

Over abstraction is just another bad smell — there are refactorings for that too.

#2 – again and again. Over the past 10-15 years, I’ve encountered this, probably hundreds of times.

The first hint comes when I ask the system designer "What does the system do?" and the answer is "there are a set of classes which are related to another set of classes." I try from another angle – "How does the system operate?" – and the answer: "an instance of an abstract class gets derived from a"…

At that point I know they have never considered the dynamic characteristics of the system, and the design describes only the static organization of the code. It is frequently the case, even on large projects, that performance, scaling, and other dynamic characteristics have been totally ignored in preference to building class libraries. The justification: "avoiding premature optimization."

Incidentally, this same phenomenon is also a huge reason why projects run over budget and time. I have seen countless class libraries which have been designed for ‘reuse’ but cannot be used at all.