Posted by Soulskill on Monday May 24, 2010 @05:20PM from the if-it-ain't-broke-it-will-be-soon-enough dept.

snydeq writes "Deep End's Paul Venezia takes up a topic many IT pros face: 'When you've attached enough Band-Aids to the corpus that it's more bandage than not, isn't it time to start over?' The constant need to apply temporary fixes that end up becoming permanent is fast pushing many IT infrastructures beyond repair. Much of the blame falls on the products IT has to deal with. 'As processors have become faster and RAM cheaper, the software vendors have opted to dress up new versions in eye candy and limited-use features rather than concentrate on the foundation of the application. To their credit, code that was written to run on a Pentium-II 300MHz CPU will fly on modern hardware, but that code was also written to interact with a completely different set of OS dependencies, problems, and libraries. Yes, it might function on modern hardware, but not without more than a few Band-Aids to attach it to modern operating systems,' Venezia writes. And yet breaking this 'vicious cycle of bad ideas and worse implementations' by wiping the slate clean is no easy task. Especially when the need for kludges isn't apparent until the software is in the process of being implemented. 'Generally it's too late to change course at that point.'"

In most organizations, the IT department is treated as pure cost instead of something that provides strategic value. These IT departments have no chance of getting a budget approved that will allow them to "start over" on any part of their implementation; hence the constant onslaught of temporary fixes and patches.

Budget, and a lack of foresight on the part of the decision makers.

Far too often, decision makers are not the people who also have to suffer, I mean work with, the tools they bought. They are often easily swayed by a nifty presentation from a guy who doesn't know too much either but promises everything, and of course the ability to cut cost in half, if not more, so they buy. Only then do they find out that the solution they bought is not suitable for the problem at hand. And then the Band-Aids start to pop up.

They are often easily swayed by a nifty presentation from a guy who doesn't know too much either but promises everything, and of course the ability to cut cost in half, if not more, so they buy.

If you've worked in a huge shop, you know that the big software vendors send reps out to IT managers for golf outings and the like. Screw it if the software works or not, just fluff up the guy with the budget rubber stamp.

Reminds me of when I heard that the manager of a customer service organization I used to be attached to had signed off on a new phone system costing hundreds of thousands of dollars, along with his offhand remark that he and a couple of other managers were attending "training meetings" in Hawaii for a few days.

Odd, ain't it? Those sales, I mean, training meetings are always at a holiday resort. When your boss is at a "business meeting" somewhere near the sea or high up in the mountains (summer and winter, respectively), you'd better clear some room in your schedule for the next few weeks: you're gonna get some new hardware or software.

Yeah, I saw that line and immediately thought about some of the "temporary solutions" people have proposed over the years. The statement is an oxymoron. It's either not a solution to the problem, or it's not temporary.

We've got fewer of those being made now, because I've taken to listing the previous "temporary solutions" every time someone proposes a new one.

The correct strategy is to get someone whose job it is to take bribes. You pay them a small salary, which they can add to with any extra gifts that they receive. You tell all of the sales reps that they are the ones with the final purchasing authority. Then you let someone competent actually make the decision.

Alternatively, you can use a downwards delegation strategy, where the people at the top have to justify purchasing decisions to the tier below them (recursively). You're free to take as many kic

The problem is not with kludges themselves, but with the fact that IT management does not stress documentation and proper change control procedures enough. If a kludge works, is documented, was implemented with proper change controls, and can be repeated, is it really a kludge anymore? IT has to screw around with stuff to make it work, that's what they (we) get paid for. If all we ever had to do was click on an install button and have everything work perfectly from there, what would be the purpose of an IT department at all? Off-the-shelf software and hardware can never be made to work perfectly for everyone's requirements. IT folks are paid to get non-unique components to work for unique requirements.

The problem is not with these fixes, it's that nobody ever documents what they did, and documentation is not readily available when needed. So, these kludges become tribal knowledge, and people only know about them because they were around when they were implemented or they've heard stories. When this happens, these wacky fixes can come back and bite you in the ass later when something mysteriously crashes and no one can get it to work like it did because nobody remembers what was done to make it work before. As people come and go, and institutional knowledge of older systems slowly erodes, we end up in a situation where everyone thinks the current system is crap, nobody knows why it was built that way, and everyone figures the only way out is to nuke the site from orbit and start over. The trick is keeping it from getting to that point.

Of course, nobody likes jumping through all these hoops like filing change control requests or writing (and especially maintaining!) documentation, so it gets dropped. IT management is more worried about getting things done quickly than documenting things properly, so there's no incentive for anyone to do any of it. Before long, you get a mass of crap that some people know parts of, but nobody knows all of, and nobody knows how or where to get information about any of it except by knowing that John Geek is the "network guru" and Jane Nerd is the "linux guru".

We will never get hardware and software that works together exactly the way we want them to. We will always have to tweak things to get them to work right for us. Citing lack of budgets or bug-ridden software may be perfectly valid, but those problems are never really going to be solved. Having our own house in order does not mean fixing all the bugs or being able to refresh our technology every 6 months. Having our own house in order means we know exactly what we did to make each system work right, we can repeat what we did, and everyone knows how to find information on what we did and why.
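One way to make "we can repeat what we did" concrete is to record each tweak as a small script that is safe to run more than once. The Python below is a minimal sketch of such an idempotent change, assuming the fix amounts to making sure a particular line exists in a config file; `ensure_line` is a hypothetical helper, not part of any real tool.

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure `line` appears as a whole line in the file at `path`.

    Returns True if the file was changed and False if the line was already
    present, so re-running the documented fix is harmless.
    """
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False  # already applied; nothing to do
    if text and not text.endswith("\n"):
        text += "\n"  # start the new entry on its own line
    path.write_text(text + line + "\n")
    return True
```

Calling it twice with the same (hypothetical) arguments, such as `ensure_line(Path("app.conf"), "MaxClients 200")`, returns True and then False, and the file ends up containing the line exactly once. The script itself then doubles as the documentation of what was done.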

Isn't this taught to death in the ITIL 101 course every MBA must go through to get their certificate from an accredited college? It's sort of sad that the concepts taught there never hit the real world in a lot of organizations. Not all of them, though. I've seen some companies actually be proactive, but it is easy for firms to fall into the "we'll cross that bridge when we come to it" trap.

I've seen some companies actually be proactive, but it is easy for firms to fall into the "we'll cross that bridge when we come to it" trap.

To be honest, in all of my years as a programmer, eventually becoming a full software engineer (meaning I design, implement, and maintain software solutions), doing it "The Right Way" has always led to bankruptcy. Always. Of course correlation is not causation, but whenever I've compared companies "following the process" with those doing "Release early and often", the latter were the ones that stayed in business.

You can do "release early, release often" within an ITIL framework. Just because most places implement it poorly doesn't mean it can't be done well.

If a kludge works, is documented, was implemented with proper change controls, and can be repeated, is it really a kludge anymore?

Yes.

You either don't know what a kludge is, or don't have enough ability to see how fixing things or implementing something the wrong way can really be a horrible mistake that feeds on itself and creates other mistakes. Kludges aren't something you can simply document around. The rest of your post isn't really worth responding to, since it makes the false assumption that kludges are simply poorly documented behavior. If that's the worst you've seen, you're lucky.

I wish I hadn't blown all my mod points today, or I would have modded you up. You're dead right. Implementing something the wrong way is like building a mansion on sand. Understanding the sheer destructive force of a kludge is what separates the senior engineers (and sysadmins in particular) from the rest. This is why you get a crusty veteran shooting down all your bright ideas without ever really explaining why. It's a game of chess, and kludges lead to checkmates... but only the pros can s

The problem with the whole idea of "if we only had enough documentation and change control" is that actually reading through the documentation becomes a non-trivial task. Take an imaginary system that's been in production for 5 years, and assume every last drib and drab of change has been documented: now you've got a 2000-page document and several hundred change records that tell you *everything*. Except that, when it comes right down to it, mastering those 2000 pages of documentation and all the changes made afterwards is a project lasting months, if not years, which is hardly effective for dealing with production problems that need to be solved in minutes or hours.

The illusion being perpetrated here is that people are interchangeable, and if you just have enough documentation, you can replace Mr. Jones with 20 years of hands on experience with the system with Mr. Vishnu living in Bangalore (or even Mr. Smith in the next cube, for that matter), with a net cost savings.

Now, I'm not saying documentation is a bad thing; Lord knows it helps to have a knowledge base you can search. But knowing what to search for is knowledge you only get from real-world experience maintaining a production system. This is not digging ditches, boys and girls; this is skilled, if not essentially artistic, labor.

I agree completely with this comment. I have yet to see an in-house documentation database that is valuable at all. Writing good documentation is not an easy task, and the guy fixing problems usually just doesn't know how to write a simple, plain, non-technical letter. Now imagine him having to write technical material for the next guy, who knows nothing about the system, or knows too little about it to make sense of a few short notes.

It's not so much that no one wants to do them (although this is true to a degree), but that most people, especially managers, never consider that making quality documentation and obeying procedures takes time and effort, sometimes almost as much as developing the software itself. IT professionals are almost always under unrealistic deadlines to get the project finished and working and so these "non-essentials" are among the first things to fall by the wayside when the project meets a looming deadline. Thi

Speaking from experience, one place this can actually happen is NASA. It may seem like a HORRENDOUS amount of paperwork to do for a flight project, but you had better believe it makes a difference. For example, the space shuttle has so many retrofits, modifications and just plain screw-up fixes that it's a miracle the thing flies at all, but they have the procedures so well controlled that every band-aid gets applied every time, like clockwork.

To be blunt, most IT departments act like cost centers and don't provide any strategic value. Business units help by shorting the budgets and whining about band-aid technology instead of seeing how IT can build the business. It takes an exceptional move by IT or amazing insight from a business unit to raise IT above the slog and allow it to provide a competitive advantage to the business units. Projects that do this get firehosed with funding.

The problem is a vicious circle. IT doesn't want to make any grand proposals because they fear management will tell them to just squeeze it in when they're not busy but will expect an actual delivery date as if it were a properly funded project with dedicated staff.

This is because IT is managed by managers, not engineers. If all managers had at least some coalface IT background (even if only on the helpdesk), the problem would not be there. As usual, strategic and policy decisions are being made by people who don't understand the nuts and bolts. Would you design a car by having a committee of non-engineers approve every major decision? No. But that is how IT infrastructure seems to be built...

I don't think budget is a problem at all here. The problem as described by the article is with vendor-provided software being crufty and having all kinds of problems. The author even mentions that normal free-market mechanisms don't seem to work, because there's little or no competition: these are applications used by specific industries in very narrow applications, and frequently have no competition. In a case like this, it doesn't matter what your budget is; the business requirement is that you must use application X, and that's it. So IT has a mandate to support this app, but doing so is a problem because the app was apparently written for DOS or Windows 95 and has had very little updating since then.

The author's proposed solution is for Microsoft to jettison all the backwards-compatibility crap. We Linux fans have been saying this for years, but everyone says we're unrealistic and that backwards compatibility is necessary for apps like this. Well, it looks like it's starting to bite people (IT departments in particular) in the ass.

You should strive for SOME backwards compatibility, but honestly, supporting more than a generation or two back is going too far, especially when you have a major OS revision (Win2k and Vista come to mind) to act as a breaking point.

To be honest, though, Linux is generally very good at backwards compatibility if you statically link everything when you compile (as is frequently the case with commercial software). The Linux system calls never change, except to add new ones once in a while, so it should be very rare that something doesn't run.

Of course, if something is dynamically linked, this isn't the case, as many of the shared libraries it depends on will change over the years, but that's why static linking is available: to avoid this problem.
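To find out whether a vendor binary actually depends on the dynamic loader (and therefore on whatever shared libraries the current distribution ships), the usual tools are `file`, which reports "statically linked" or "dynamically linked", and `ldd`, which says "not a dynamic executable" for static binaries. The Python sketch below shows the idea underneath, assuming a 64-bit little-endian ELF file: a dynamically linked executable carries a `PT_INTERP` program header naming its loader, while a fully static one has none. The function name is hypothetical.

```python
import struct

PT_INTERP = 3  # program-header type that names the dynamic loader (e.g. ld-linux)


def needs_dynamic_loader(elf: bytes) -> bool:
    """Return True if a 64-bit little-endian ELF image requests an interpreter."""
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if elf[4] != 2 or elf[5] != 1:  # EI_CLASS == ELFCLASS64, EI_DATA == ELFDATA2LSB
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    (phoff,) = struct.unpack_from("<Q", elf, 32)           # e_phoff
    phentsize, phnum = struct.unpack_from("<HH", elf, 54)  # e_phentsize, e_phnum
    for i in range(phnum):
        (p_type,) = struct.unpack_from("<I", elf, phoff + i * phentsize)
        if p_type == PT_INTERP:
            return True
    return False
```

Usage would be along the lines of `needs_dynamic_loader(open("/opt/vendor/app", "rb").read())`, where the path is a placeholder. The check only covers link-time dependencies; it says nothing about libraries a program may open at run time.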

That is very true. Unless you are working for a highly visible technology company or high-profile corporation, most companies simply want you to keep the mess you've got going, even if it means Band-Aids and soldering irons. Over the course of my 20-year career at four different companies, and from talking with colleagues, it is much the same story: the steering committee says, "We initially invested hundreds of thousands of dollars, but we sure as hell aren't overhauling and starting over."
The best

the IT department is treated as pure cost instead of something that provides strategic value.

I can't count the times I've gone in somewhere and seen major deficiencies in their IT infrastructure. I mean really bad, O-M-G-size problems. And when you point them out, they act like you're trying to pad your billing: just fix whatever isn't working that day. One of them was a doctor's office.

Imagine if their patients acted that way. I don't care if I have cancer, just remove that lump in my underarm.

As a dev, what's the problem with a 24 port gigabit switch as the "core" on a medium sized office? Aside from the fact that 10Gb is becoming popular (has become popular?) in the datacenter? Most desktops are only at the 1Gb level (and most users at below 100Mb), and most inbound internet pipes are much smaller. I don't understand the downfall here.

Redundancy? In such a small setting, if the thing dies you either go out and buy a single, easily available replacement or, more likely, you dust off the old dumb 100 Mb/s switch it replaced and pop it in the rack. Layer 3 switching is nice but has only very recently become cheap enough to justify in many settings, and it really isn't needed much. In a small office, if somebody brings in something that acts as a DHCP server, the sysadmin simply wanders around with a blunt instrument until the culprit is found, and unused po

The network it was running was not a small network. Not at all. It was a travesty that this poor switch was running the network. Well over 200 devices plugged into other 2548s all bridged back to the poor "core" switch.

Now that's the sort of thing you should have written first, instead of the stupid blanket statement, isn't it? It's just like in engineering: it's very easy to stop things corroding, just coat everything in gold. In practice, you determine where to put the expensive bits. Even in the example you give here, we can't tell whether using the device as a core switch was a bad idea or not, for instance in cases where there isn't much activity, since two hundred things doing almost nothing still adds up to almost nothing,
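The "almost nothing adds up to almost nothing" point is easy to put numbers on. This Python sketch compares the average load a group of hosts offers to an uplink against the uplink's capacity; the host count, link speeds and utilization figures are made-up illustrations, not measurements from the network described above.

```python
def uplink_load(hosts: int, port_gbps: float, avg_utilization: float, uplink_gbps: float):
    """Return (offered load in Gb/s, offered load as a multiple of the uplink).

    avg_utilization is the average fraction of each host's port speed in use.
    """
    offered = hosts * port_gbps * avg_utilization
    return offered, offered / uplink_gbps


# Hypothetical access switch: 48 gigabit hosts behind a single 1 Gb/s uplink.
worst_case = 48 * 1.0 / 1.0                    # 48:1 if every host transmits at once
quiet_office = uplink_load(48, 1.0, 0.01, 1.0)  # hosts averaging ~1% utilization
busy_office = uplink_load(48, 1.0, 0.10, 1.0)   # hosts averaging ~10% utilization
```

With these assumptions the quiet office offers about 0.48 Gb/s (roughly half the uplink), while the busy one offers about 4.8 Gb/s (nearly five times it). Averages hide bursts, so this is a lower bound on what the uplink must carry, not a sizing rule.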

Ditto this - plus, in a medium-sized office, you're probably not getting 10x24Gb/sec out of your server infrastructure anyway. Your network is only as fast as the slowest component you rely upon; at 10Gb/sec, you're starting to bump into the limits of your hard drives, especially if you have more than a handful of people hitting the same RAID enclosure simultaneously.

As a dev, what's the problem with a 24 port gigabit switch as the "core" on a medium sized office?

If all you've got is 24 hosts (well, 23 and an uplink), then it's fine. I suspect that the reality he's alluding to is something more along the lines of multiple switches chained together off of the "core" switch. The problem is that lower-end switches don't have the fabric (interconnects between ports) to handle all those frames without introducing latency at best and dropped packets at worst. For giggles, try hooking up a $50 8-port "gigabit" switch to 8 gigabit NICs and try to run them all full tilt. Antics will ensue... The cheap switches have a shared fabric which doesn't have the bandwidth to handle traffic between all the ports simultaneously. True core switches are expensive because they have dedicated connections between all the ports (logically, if not physically... I'm no switch designer), so there's no fabric contention.
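The fabric argument can also be expressed as arithmetic: for every port to send and receive at line rate simultaneously, the switch needs a switching capacity of ports × port speed × 2 (full duplex), which is also how datasheets usually quote it. The sketch below computes that and the resulting blocking ratio against a given fabric capacity; the 8 Gb/s shared fabric is a hypothetical figure, not the spec of any particular product.

```python
def nonblocking_capacity_gbps(ports: int, port_gbps: float) -> float:
    """Switching capacity needed for all ports to run at line rate in both directions."""
    return ports * port_gbps * 2


def blocking_ratio(ports: int, port_gbps: float, fabric_gbps: float) -> float:
    """How many times more traffic the ports can offer than the fabric can move."""
    return nonblocking_capacity_gbps(ports, port_gbps) / fabric_gbps


# The "8 gigabit NICs at full tilt" experiment from the comment above:
needed = nonblocking_capacity_gbps(8, 1.0)       # 16.0 Gb/s
ratio = blocking_ratio(8, 1.0, fabric_gbps=8.0)  # hypothetical 8 Gb/s fabric -> 2.0
```

A ratio above 1.0 means that at full load some frames must wait in buffers or be dropped, which is the latency-then-packet-loss behavior described above.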

Absolutely nothing. A 24 port gigabit switch makes a great foundation for a small to medium-sized network with typical business use. It's a stretch to call it a 'core', but anybody who tells you that you need some kind of crossbar fabric chassis switch at the center of your average branch office is just trying to sell you hardware and service contracts.

It's wonderful to ride the developer ship. But once the brilliant code is down on silicon and runs into reality, it must be patched. Patching is so humdrum, so tedious. No admiring fans, just plodding day after day, finding that routine that always seems to call the variable from nowhere: "Object reference not set to an instance of an object".
Frank Lloyd Wright was a great architect. People marvel at his designs. Few know the name of the roofer who has to repair the design flaw that makes every Wright roof leak...

Everyone wants to work on the latest and greatest stuff, no one wants to maintain or even release patches.

is very, very true. We (Apple) have a hard time getting applicants who want to do anything other than work on the next iPhone/iPad/whatever. Mainline kernel people are difficult to hire, even though the same kernel is being used on the iDevices as is being used on the regular Macs. Everyone wants to work on the new sexy. For some positions, that works, but for most of them, you have to prove yourself elsewhere before you get your shot.

I think that, for the most part, we see the same thing in marketing for higher education (with the exception of one track, one of the universities I went to has become a diploma mill for Flash game programmers; sadly, I would not hire recent graduates from there unless they have an experience track record). There are video game classes at most universities, but while it might be sexy, you are most likely not going to be getting a job doing video games, 3D modeling for video games, or anything video game related, really, unless you get together with some friends and start your own company, and even then it's a 1 out of 100 chance of staying in business.

I don't really know how to address this, except by the people who think they are going to be the next great video game designer remaining unemployed.

Everyone wants to work on the latest and greatest stuff, no one wants to maintain or even release patches.

I don't really know how to address this, except by the people who think they are going to be the next great video game designer remaining unemployed.

Here's how you address it: you hire one of those 9 out of 10 CS graduates who "just got into it for the money". If you had offices in the Midwest, you'd have no problem finding programmers whose only ambition is to crank out brain-dead code until they can move into management. Trust me, I work with these people, and they're even worse than the prima donnas interested only in the "cool" things. Naturally, not everyone can be the next game programmer or work on cool things, but you probably don't want to hire those whose only ambition is to do the grunt work.

Typically, the prima donna has to have his ego coaxed into doing the grunt work. But you can usually count on him to do it fast, and not to make a total mess of things. Granted, some people have a higher estimation of their abilities than their peers do. But at least someone passionate about coding can be inspired to improve their code; they'll actually accept coding standards once they're reasonably explained. But here's a short list of problems with the typical "career type":

Because they don't have the intelligence or the initiative to do things right, they'll happily plod along, even when the given design can't possibly work, or can't be delivered on time. And when it does fail, rather than trying to understand *why* it failed, or *what* they could do differently next time, they blame their coworkers/subordinates, etc.

They are more sensitive to the political implications than the technical ramifications of their decisions. Consequently, they'll often run with an inefficient, or sometimes even incorrect design so as to placate their superiors. And once again, the blame always lies with *someone else*.

And speaking of blame, they'll frequently blame others when things go wrong, and sometimes even when they don't. There are *certain people* at the office around whom I can't have a technical discussion with coworkers, because they understand neither what we are talking about nor that such conversations are a normal part of the job. I've actually been reprimanded for discussing architectural decisions, because "we've already decided on the architecture..." Which is great, but the fact that you've decided doesn't help me understand it better. Supposedly, we're all mind readers here, and no discussion is necessary.

The career types usually promise unrealistic deadlines, and write horribly unmaintainable code. After all, writing code is just a stepping stone into management, and maintaining that code will soon become *someone else's* problem, not theirs...

And perhaps the worst part is that they have a corrosive effect on teamwork and morale. With a politician in the office, *no one* wants to do the grunt work out of fear that it will adversely affect their career.

It's easier to convince a rock-star programmer that documentation is necessary than it is to convince the career-track political programmer that a race condition is a problem, that architecture matters, that maintainability and scalability are important. Just the other day, I had a department manager question the value of writing reusable code - in fact, he was so hostile as to suggest that it wasn't worth our time to make code reusable... (And not only that, but reported to my boss that my suggestion otherwise was "distracting to what we're trying to accomplish here"...)

I know the starry-eyed programmers can be a handful at times, but those indifferent to technical issues will lay a minefield in your company. Suddenly, years after they've moved on, you'll find your new hires telling you the projects they built aren't worth salvaging, that you'll have to start over, etc... I've seen these types move into management and turn an otherwise fun profession into a death march. You don't want the stupid, or the political, types of people writing code. They'll set your company up for failure every time.

Funny thing, I'm graduating next year and *like* doing kernel work... My actual track at school is Programming Languages & Compilers, but I've known how to do low-level stuff since high school. Would you happen to have a contact email address you can send me?

Some of the concurrency stuff needs a complete rewrite - acquiring synchronization primitives is painful, the new 'amazingly fast' locking that they use for GCD is marginally better than a FreeBSD mutex, and between one and three orders of magnitude (depending on load) faster than a Darwin mutex. Part of this is a userspace problem (not optimising for the uncontended case, which is the most common in good code), but a lot of it comes from the route down through the myriad kernel layers when sleeping a thre
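"Optimising for the uncontended case" refers to a common lock design: try to take the lock cheaply first, and only fall back to the expensive path (on Linux, a futex system call that puts the thread to sleep) when someone else already holds it. Python can't show the atomic compare-and-swap that real implementations use, so the sketch below only illustrates the shape of the idea with `threading.Lock`: a non-blocking acquire serves as the fast path, and a counter records how often the slow path was needed. The class name is hypothetical.

```python
import threading


class FastPathLock:
    """Illustrative lock that tries an uncontended fast path before blocking."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.slow_acquires = 0  # how many acquisitions had to wait

    def acquire(self) -> None:
        if self._lock.acquire(blocking=False):
            return  # fast path: nobody held the lock
        self._lock.acquire()     # slow path: wait for the current holder
        self.slow_acquires += 1  # safe to update: we now hold the lock

    def release(self) -> None:
        self._lock.release()

    def __enter__(self) -> "FastPathLock":
        self.acquire()
        return self

    def __exit__(self, *exc) -> None:
        self.release()
```

In single-threaded or well-partitioned code, `slow_acquires` stays at zero, which is the case the comment argues should be made as cheap as possible.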

The constant need to apply temporary fixes that end up becoming permanent is fast pushing many IT infrastructures beyond repair. Much of the blame falls on the products IT has to deal with.

Well, sure, IT departments place the blame there. The problem, though, is not so much with the products that IT "has to deal with" as with the fact that IT departments either actively choose the penny-wise-but-pound-foolish course of applying band-aids rather than dealing with problems properly in the first place, or, when the decision is not theirs, simply fail to properly advise the units that are making decisions of the cost and consequence of such a short-sighted approach.

When IT units don't take responsibility for assuring the quality of the IT infrastructure, surprisingly enough, the IT infrastructure, over time, becomes an unstable house of cards, with the IT unit pointing fingers everywhere else.

And yet breaking this 'vicious cycle of bad ideas and worse implementations' by wiping the slate clean is no easy task. Especially when the need for kludges isn't apparent until the software is in the process of being implemented. 'Generally it's too late to change course at that point.'

If your process -- whether it's for development or procurement -- doesn't discover holes before it is too late to do anything but apply "temporary" workarounds, then your process is broken, and you need to fix it so you catch problems when you can address them more effectively.

If your process leaves those interim workarounds in place once they are established, without initiating and following through on a permanent resolution, then, again, your process is broken and needs fixing.
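One way to keep a "temporary" workaround from silently becoming permanent is to record each one with a review date and a link to the ticket for the permanent fix, then audit the list regularly. A minimal Python sketch, assuming such a register exists; the `Workaround` fields and the ticket IDs in the usage note are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Workaround:
    system: str
    description: str
    applied: date
    review_by: date
    permanent_fix_ticket: Optional[str] = None  # None = no follow-up was ever opened


def needs_attention(items: List[Workaround], today: date) -> List[Workaround]:
    """Workarounds with no permanent-fix ticket, or past their review date."""
    return [w for w in items if w.permanent_fix_ticket is None or w.review_by < today]
```

For example, a nightly mail-spooler restart with ticket "CHG-101" but a review date that has already passed is flagged, as is a patched DLL with no ticket at all, while a static route with a ticket and a future review date is not.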

You don't fix the problems with your infrastructure that have resulted from your broken processes by "wiping the slate clean" on your infrastructure and starting over. You fix the problems by, first, improving your processes so your attempts to address the holes you've built into your infrastructure don't create two more holes for every one you fix, then by attacking the holes themselves.

If you try to throw the whole thing out because it's junk -- blaming the situation on the environment and the infrastructure without addressing your process -- then:

(a) you'll waste time redoing work that has already been done, and
(b) you'll probably make just as many mistakes rebuilding the infrastructure from scratch as you made building it the first time, whether they are the same or different mistakes.

Magical thinking like "wipe the slate clean" doesn't fix problems. Problems are fixed by identifying them and attacking them directly.

I'm going to tackle some of the conceptual problems hinted at above, which is usually where the difficulties lie: typically in trying to use the wrong software and expecting to somehow "make everything better" if you just make it work "my way". That is the true "Magical Thinking".

I tend to agree with your conclusions, "wipe the slate clean" is a drastic action. I disagree with some of the approach you use to arrive at them:

a.) Problems are solved by people being invested in solving them, not process. This requires the antithesis of "Units" - Ownership; Ownership in the company, Ownership of the mission, and a direct heartfelt connection to the success of the company. Until you have staff, from the CEO down, that own problems, from the mess in the coffee room to server down time, you will have a "business house of cards" no matter how good the process. In fact, most of the time, fixing things involves re-writing and/or reconsidering process - usually starting with asking the question - "Do we really need that?"

b.) Sometimes you really do have a train wreck on your hands. If you have mastered a.), b.) follows almost effortlessly, because now you can *talk* about this behemoth that is eating your company, and everybody sees the discussion for what it is, not empire building or managerial fingerprinting.

When you run into a train wreck, assess your tech problem. Is the fix easily found? Are your processes using the software at cross purposes? If so, which is cheaper to fix? No amount of bug fixing will repair using the wrong software. It won't even fix using the right software in the wrong way.

In the end, reassess often and be frugal, not cheap: if it truly is a requirement to run your business, buy the most appropriate option. If you've made the mistake of buying a Kenworth long hauler when you needed 3 old UPS trucks, admit it, sell it back, take your loss and get what you really need.

Problems are solved by people being invested in solving them, not process.

Both are, IMO, essential, which is why, while I pointed at particular areas of process, my big-picture message was about IT shops taking "responsibility for assuring the quality of the IT infrastructure."

Neglect of process is a symptom of people not being invested in solving problems that leads to bad results on its own, but even a good (nominal) process isn't going to work well if people aren't invested.

I prefer "responsibility"; "ownership" is, IMO, misapplied here. (Though, arguably, one of the reasons people do not take responsibility is because they don't, in fact, have ownership -- but ownership is a material relationship, and responsibility is the relevant attitude.)

But I think in substance we generally agree.

We do, I think the difference is that my experience has been fixing projects starting from a technical complaint to an outside organization and helping those in an IT/Technology organization dri

The problem, though, is not so much with the products that IT "has to deal with" as with the fact that IT departments either actively choose the penny-wise-but-pound-foolish course of applying band-aids rather than dealing with problems properly in the first place, or, when the decision is not theirs, simply fail to properly advise the units that are making decisions of the cost and consequences of such a short-sighted approach.

I've found the problem to almost always be the last thing listed. It's the contractor syndrome. "If you give me $1,000,000 now, I'll save you $500,000 a year for the rest of the time you'd have used that." Well, they think you are lying. They think that you wouldn't actually save the $500,000 a year, but would take the $1,000,000 this year and add it to your budget as a permanent line item, costing them $1,500,000 a year, rather than saving $500,000.

You can blame the IT director/manager/CIO/whatever for not being convincing enough, but there seems to be a pattern where people bid low, then run into massive overruns, so that the highest bidder would have been cheaper. As a result, the people the IT person is talking to are often so jaded they don't trust anyone's price estimates.

When IT units don't take responsibility for assuring the quality of the IT infrastructure, surprisingly enough, the IT infrastructure, over time, becomes an unstable house of cards, with the IT unit pointing fingers everywhere else.

And when the IT units have the responsibility, but not the authority to fix things, what then? Almost all places tie the hands of IT, then complain when the solution isn't perfect.

I couldn't agree more, but that's very expensive and very, very dangerous. Why? Two factors:
1. Rewriting means rethinking; most legacy code is functional and is usually rebuilt in OOP. Whenever you rethink how something works, it tends to change the entire behavior, to say nothing of all the new bugs you'll have to hunt down. Your customers will definitely notice this.

2. Scope creep!! Rebuilding it? Why not throw in all that cool functionality we've been talking about for the past 10 years but couldn't implement because the architecture couldn't handle it? You get the idea.

This is the essence of technical debt [wikipedia.org]. Whether you're programming or deploying IT infrastructure, it's inescapable that sometimes you're going to have to include kludges to work around edge conditions, a vocal 1% of your users, or whatever. These kludges are eyesores, and fragile, but they're also as far as you could go with the time and budget you had.

Sometimes, accruing debt like this enhances your liquidity and ability to respond to change, so avoiding all kludges introduces other more obvious costs that slow you down and make you seem unresponsive to users or customers. But you can't just go on letting your debt grow all the time and not eventually come up technically bankrupt. Let it grow when you have to, but just as importantly make time to pay it down. A lot of this stuff can be paid down a little at a time, as you come across it a few months later. The pay-off if you're vigilant is that the next ridiculously urgent fix to that system can often be handled much more easily, without dipping down further... with patience and attention to maintaining this balance, you can reduce your technical debt and make the whole system hum.

The downside is that there isn't a quick fix when you find yourself deep in technical debt. You can't just spend all your time reducing it; your highest aspiration at that point should be maintaining the level of technical debt, rather than letting it grow, but it's generally been my experience that altering the curve of debt growth even a little can set you on the right path.

Long ago, at one customer, the desktops grew weak. So they hit upon the idea of using a Windows terminal server to prop up their aging desktops. By gum, that worked so well that they replaced their desktops with thin clients throughout over the next year. Now, a few years after that, their terminal server handles everything from Solidworks viewers to web browsers and it's sagging under the load... and out of warranty to boot. Time to get a new server... only the economy has hurt their business terribly and

Ran into a similar situation with an old client of mine, only with more "hilarity": basically, they had a consultant tell them that they could have a dirt-cheap server, a bunch of thin clients, and save money. Cue one Pentium 4-powered server with software-mirrored PATA(!!!) drives serving as a terminal server for over 30 people. Cue one Pentium 4-powered server quickly succumbing to the heat and wear generated by such a load. Cue one confused customer that was wondering why the company I

Hey, I never claimed it made sense, only that's what the client is doing. We've quoted them the most barebones terminal server addition we're willing to support, but they're still holding out for... I dunno, a miracle? Positive cashflow? An ounce of sense?

From the original message we read that the "code was also written to interact with a completely different set of OS dependencies, problems, and libraries." This seems to imply that the IT organizations are allowing outside interests to dictate the rules of the game. If there were a stable set of operating system calls and libraries to rely on, then the software vendors would have an easier time maintaining software. I recognize that Linux changes, but the operating system calls work well and API is quite stable. I have used UNIX for a long time and I have compiled programs from 25 years ago under Linux. There have been some additions since then, but the basics of Linux work like the basics of UNIX from 25 years ago.

At present there are some applications available only on Windows and some only on Windows/Mac OSX. This might be difficult to change, but going along with someone's plan for computing which is based on continued obsolescence seems inappropriate. At least those who are more or less forced by software availability to use Windows should investigate Linux and negotiate with their vendors to supply Linux solutions.

Computers are hard to manage and hard to program. It is not helpful to undergo regular major overhauls in operating systems.

I've been saying exactly the same thing since about 1994, when I got into the Linux thing. Every program I've written since just "works" without changes (granted, I don't write many GUI apps; mostly data management stuff). My Windows counterparts (same corp, doing semi-related apps) have to release a "new version" every time .NET is patched, or something along those lines. Your environment shouldn't make your things break or not work right.

The article seems to be about vendor-provided apps that were written for creaky old versions of Windows. Windows, as you probably know, has a whole slew of APIs, and some work better than others. Of course, MS keeps them all around (though not all in good working order) because of their famed backwards compatibility that supposedly lets people run all their old software from 1995 (even though in practice it doesn't work well).

However, "negotiating" with vendors to supply Linux solutions (as great as that

I recognize that Linux changes, but the operating system calls work well and API is quite stable. I have used UNIX for a long time and I have compiled programs from 25 years ago under Linux. There have been some additions since then, but the basics of Linux work like the basics of UNIX from 25 years ago.

I agree that open systems (and the *nix toolkit approach) can make a big difference when it comes to living with legacy code. But, as others have said elsewhere, process matters more than anything.

If you start over, you may be the only one who knows how it works, and then some day the PHB will want to mess with it / replace you with a minimum-wage guy who may not know much about how things are set up and can turn it into a big mess.

This happens in commercial software development, too. There's this belief (often held all the way up the management chain to the top) that software, even bad software, represents some kind of massive, utterly permanent investment that must never be thrown away and re-written.

I've worked with managers who would think nothing of throwing away a million dollar manufacturing machine to replace it because it's old, yet cling with all their might to ancient software code that represents a similar level of investment.

There's this belief (often held all the way up the management chain to the top) that software, even bad software, represents some kind of massive, utterly permanent investment that must never be thrown away and re-written.

I'm a consultant that gets to see into the IT world of over a hundred organisations, and I see one major failing that companies make that cost them later:

They fail to keep up with the times.

Nowhere is this more evident than the leviathan government organisations still running Novell and Lotus Notes, on barely patched versions of Windows, with the browser restricted to IE6. Every time I see that combination, the word "expensive" comes to mind.

Companies that over-promise and assume linear growth, observe startup companies with great features and expect retooling their product to compete to be a 20-minute job, and have a culture of worthless slags content to wallow in old code as a means to avoid having to learn anything new. Reacting in IT to parabolic growth is difficult, and patches are cheaper than extended downtime or than instituting the methodology to prevent it in a live system.

More than any other type, businesses are run by salesmen. These are people whose strongest attributes are the ability to build relationships, to communicate value, and a strong inclination to increase their personal wealth.

Increasingly, the stuff salesmen sell is based on complex technologies that, really, are beyond the reach of their comprehension. They kind of understand the products they sell, but really, they don't. If the world only had salesmen, there wouldn't be any sophisticated products.

Say hello to the engineer...a person who builds products. His strongest attributes are a desire to solve problems, a willingness to absorb the tedious but essential details needed to build a complex system, and a personality that derives gratification from doing so.

We now begin the business cycle. The salesman says, "Build me something I can sell."

The engineer says, "I will build you something that works well."

And therein begins a lifetime of the two, symbiotically, talking past each other. The engineer serves the salesman, and the salesman serves himself. But make no mistake about it: the salesman is in control.

For a salesman, QUALITY means it works well enough for him to sell more, and most importantly, to make more money for himself. For an engineer, QUALITY means it works reliably and efficiently. To be sure, QUALITY is an abstract and moving target that varies according to the eyes of the beholder. But to understand why we have the predicament described in this article, we need only understand the SIGNIFICANCE OF QUALITY TO A SALESMAN.

I would continue to expound, but then, most readers here need only reflect on their already frustrated pasts to understand the mechanics of this convenient but often vacuous relationship.

The current patchwork of duct tape and glue that works today is much better than the pie-in-the-sky "let's build it from scratch" architecture that IT is pitching, which will be late, over budget, and eventually have its feature set scaled down until, when it is finally delivered, it's less functional than what you have today.

There is constant pressure to re-implement existing architecture. Most of the time, the people who want to do this do not have a clear understanding of the business process involved and don't realize that the existing frameworks represent years of bug fixes and are at least stable for that reason. They only think "Wow this sucks, a new one would HAVE to be better."

I'm not saying that you should never rebuild something from the ground up, but the scope of the project should be limited and the entire endeavor should be well documented and well understood from the beginning. And if the guy who's pushing for a rewrite can't demonstrate a deep and fundamental understanding of the business flows being automated, he should be taken out and shot (Or at least pummeled soundly.)

If you've got even a small or medium-sized enterprise application (whatever that buzzword means) at a larger company (Boeing, for example), it might have its hooks into a dozen or more peer systems/hosts/databases/whatever. They are all 'owned' by different departments, installed and upgraded over many years, each on its own schedule and budget. When one group gets the funds together to address their legacy ball of duct tape and rubber bands, they roll the shiny new hardware in and install the spiffy new app. But everyone else is a few years away from affording new systems. And so the inter-system duct tape is simply re-wrapped.

The IT department tried selling everyone on architecture standardization. But due to the gradual pace of system upgrading, the plan was out of date before everyone got caught up to the old one. And today's 'standard' architecture wouldn't play nicely with what was state of the art a few years ago (thanks Microsoft). The whole architecture standard ploy is a salesman's pitch to get management locked into their system. Unless you've got a small enough shop that you can change out everyone's desktop and the entire contents of the server room over a holiday weekend (another salesman's wet dream), it ain't gonna work.

The solution is to bite the bullet and admit that your systems are always going to be duct-taped together. And then make a plan for maintaining the bits of duct tape. There's nothing wrong with some inter-system glue as long as you treat it with the same sort of respect and attention to detail that one would use for the individual applications.

Design your user interface as a standards-compliant web interface. Why is there any need to code a native client to a system at all? A web interface can easily and rapidly be tweaked as browsers evolve; tweaking or rewriting binary applications is much more involved.

While the technology environment can degrade over time due to poor processes, just as important is getting it right in the first place. In my experience, the IT team often lacks any architectural capability when the department is set up, typically because it's "too small" or their needs are "simple". As a result, nobody with the necessary qualifications/experience/mindset really thinks about the business' strategic requirements, they just go for "best practice". Unsurprisingly, "best practice" by itself sim

Seems to be the syndrome. Everybody used Intel, MS and other devices to save cash and swap out over the next few years.
Why buy a huge, expensive box that would rot in place when you could buy a cheap unit and swap it out later?
Nobody understood really where tech was going. They went cheap so when they had better cash flow and market direction they could slot the future in at a fair price.
The good news is someone made the right calls or got consultants who did. They might have got big boxes and it scales

It doesn't look like "doing it right the first time" is an option here. RTFA. They're talking about vendor applications being crappy and crufty, and IT departments being required to support them. The IT department didn't pick the app, and isn't allowed to not support it. They can't switch to another app (usually apps like this have little or no competition, and they're probably locked-in anyway).

So there's really nothing they can do but complain as long as they're required to support some shitty application on the latest version of Windows, as these are the requirements set down by upper management.