Posted by Soulskill on Wednesday August 08, 2012 @03:14PM
from the and-runs-it-badly dept.

whitroth tips a story at The Atlantic by James Kwak, who bemoans the poor quality of software underpinning so many important industries. He points out that while user-facing software is often well-polished, the code running supply chains, production lines, and financial markets is rarely so refined. From the article:
"The underlying problem here is that most software is not very good. Writing good software is hard. There are thousands of opportunities to make mistakes. More importantly, it's difficult if not impossible to anticipate all the situations that a software program will be faced with, especially when — as was the case for both UBS and Knight — it is interacting with other software programs that are not under your control. It's difficult to test software properly if you don't know all the use cases that it's going to have to support. There are solutions to these problems, but they are neither easy nor cheap. You need to start with very good, very motivated developers. You need to have development processes that are oriented toward quality, not some arbitrary measure of output."

I think a better question would be "Why does it NEED to be better?" and the answer in most cases is "it doesn't." Remember, perfect is the enemy of good, and the difference between "good enough" and great can often mean truly insane levels of expense.

At the last shop I worked at before striking out on my own, I had to rush home one day and dig my very first two gamer PCs, a Pentium 60MHz and a P100MHz, out of the shed. Why? Because a guy came in the shop about to have a breakdown: the PC they used to control their custom lathe had gone tits up, and they had a $60k order due by the end of the week. No columns? No contract. While I was cloning the drive I asked him, "If the thing will only run on ISA, and the company is OOB, why not replace it?" and found out a unit today that will do what that one does would cost upwards of $150k. That one was paid for, it works, and it's easy to use.

Now I'm sure that if any of the programmers here saw that software they'd either laugh or bawl like babies. We're talking a primitive GUI running on top of DOS 3 that lets you build the design from templates or build your own from various shapes. VERY rudimentary, and frankly it made Win 3.x look like Win 7. Ancient stuff. But ya know what? It works and works well. It does that one task, day after day after day, very WYSIWYG, and you could get someone up to speed on running the unit in maybe an hour.

So while I can see why many programmers might look at something like that and think "half-assed," I can also see why it was made the way it was. I'm sure those who designed the software they are bitching about in TFA did it that way because they designed it for a task, weren't thinking about someone plugging into it down the line, and just concentrated on that one task. I'm not saying one should design like that today, but usually when I hear programmers screaming about bad software it's because they've found something old and creaky and are bitching because they are gonna have to support it. Welcome to the real world, Chuck, where jobs often suck.

And just to prove how fussy programmers can be, I'll end this with something that always gets programmers gnashing their teeth... I LIKE VB6, okay? If all you need is to put a GUI on a local DB, frankly there is NO tool that can compare to VB6. It's fast, uses practically zip in resources, and it's easy to whip up prototypes right there if you need to, so the customer can see what fields he/she needs. For that one task and one task only, putting a GUI on a local DB, even now you just can't beat VB6. That is why to this very day it's something like the third most popular business language: businesses need that job done often, and it does it VERY well.

The bottom line is that there is a tool best fitted to every task; there is no magic bullet. Even quick'n'dirty code is the perfect choice for some things, things that do just one thing, like a database backup which needs a bit more than a bare mysqldump cronjob (i.e., send an e-mail notification if the backup failed).
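For what it's worth, here's a minimal sketch of that kind of one-job script in Python: dump the database, and send mail only when it fails. The database name, paths, and addresses are invented, and it assumes a local MTA is listening; adjust for your own setup.

    #!/usr/bin/env python3
    # Quick'n'dirty nightly backup: dump the DB, mail someone only on failure.
    # DB name, paths, and addresses are made up -- change them for your setup.
    import datetime
    import smtplib
    import subprocess
    from email.message import EmailMessage

    DB = "shopfloor"  # hypothetical database name
    DEST = "/var/backups/%s-%s.sql" % (DB, datetime.date.today())

    def notify(body):
        msg = EmailMessage()
        msg["Subject"] = "BACKUP FAILED: " + DB
        msg["From"] = "backup@example.com"
        msg["To"] = "admin@example.com"
        msg.set_content(body)
        with smtplib.SMTP("localhost") as smtp:  # assumes a local MTA
            smtp.send_message(msg)

    try:
        with open(DEST, "w") as out:
            subprocess.run(["mysqldump", DB], stdout=out, check=True)
    except Exception as exc:  # any failure -> one e-mail; silence on success
        notify("mysqldump of %s failed: %s" % (DB, exc))

Run it from cron and you get exactly the behaviour described: silence when it works, one e-mail when it doesn't.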

Too often programmers fail by trying to make things too fancy, forgetting the golden rule of code: Keep It Simple, Stupid!

For example, Zend Framework used to be quite great a couple years back, but it already suffers from creeping feature bloat.

Awww... what the hell, grasshopper, old Hairy will show you the way. What you wanna do is go start talking to all your local mom & pop PC shops, let them know what kind of work you do, and tell them you'll give them a referral fee for every one they send your way.

Ya see, we old PC shop guys end up getting this kind of work because we are all little pack rats at heart and HATE throwing away working gear, and once you've done a few of these "miracles" it really don't take long for word of mouth to spread. Now since it don't sound like you want to get into the wonderful world of Windows PC repair, which is actually a damned nice way to meet girls BTW, what you are gonna have to do is get to be buds with the PC shop guys.

While there are some like me that have old engineer buds who can do the chip-changing and TTL stuff I can't, there are just as many straight 'fixit guys' who can't do the soldering, replacing chips, or fixing burnt boards. And like I said, we hate throwing stuff away and telling somebody "we can't do it." Hell, that was one of the reasons I got to be buds with an engineer: I can't do the solder stuff anymore as my hands aren't steady enough, and he hates working on computers, so we just swap it out.

So go talk to the little shops, be prepared to BS awhile, have yourself some cards made up to hand out, and before long you'll be ass deep in busted gear. Remember that most places won't work on this stuff, so a little word of mouth goes a long way, and a little ad or two can't hurt either. Trust old Hairy: there is plenty of old gear that needs fixing, and as long as you charge a reasonable price (remember, starting out you need the business, so don't go nuts; once you have the buzz the price can go up) it won't take long for that phone to start ringing.

That depends on your definition of average; mathematically speaking, that's not true. What percent of numbers are below average in this set: {1, 1, 1, 1, 1000}?
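To spell out the arithmetic: \( \bar{x} = \frac{1+1+1+1+1000}{5} = \frac{1004}{5} = 200.8 \), so four of the five values (80%) are below the mean, while the median is 1 and no value is below it. Which "average" you pick completely changes the answer.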

This isn't pedantry; this is a meaningful distinction: I expect good software is vastly outnumbered by the bad, and even good software developers can be forced into kludges by time pressures, bad team culture, etc. I don't see any reason to think that code quality globally resembles a normal curve.

Indeed. I would actually think that global software quality resembles an exponential curve with x representing how craptacularly bad the software is. Very few systems are near the origin (and thus are well written.)

It depends on your definition of "average" AND on the underlying distribution. If the distribution of software quality is symmetric and unimodal, then 50% of values lie below the mean, median, and mode (the three most common measures of central tendency, or "average").

Yes, actually. You can measure it in terms of raw numbers of defects found. You can also determine the number of defects produced per some measure of effort (e.g. 1000 man-hours.)
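A toy version of that calculation in Python, with the defect count, code size, and effort all invented purely for illustration:

    # Defect density per KLOC and per 1000 man-hours; numbers are made up.
    defects_found = 42
    kloc = 110         # thousand lines of code in the release
    man_hours = 5600   # total development effort

    per_kloc = defects_found / kloc                    # ~0.38 defects/KLOC
    per_1k_hours = defects_found / (man_hours / 1000)  # ~7.5 defects/1000 man-hours

    print(f"{per_kloc:.2f} defects/KLOC, {per_1k_hours:.1f} defects/1000 man-hours")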

You need quality control, which is about ensuring good practices that are documented, repeatable, and measurable, and quality assurance, in which the results of your process are analyzed for their overall quality.

Good QC should make QA's job a lot easier. (Or harder, depending on how you look at it.)

Raw defect counts don't indicate quality. A defect by which the system occasionally has to stop and replay some data write-out because of some hokey disk driver is not a great problem: the disk driver is buggy, and is using a shitty hack to work around it. By contrast, a much better written driver with a very obscure corner-case race condition that, 1/100 as often, simply destroys a ton of data has a huge problem.

Linux is like that. If a hard disk drive starts to not respond, it'll send it a reset command and continue. It'll mount the filesystem read-only without special options; in some conditions that's important, because the OS view of the FS might be completely different due to undetected write failures. In any case, it's still up and you can get information out of the kernel. I've had the system hose itself so bad I couldn't actually read the logs or run dmesg, but if your boot process copies a few utilities into a ramdisk and sets tty1-5=login and tty6='chroot /recovery login', you should be able to switch to that tty and run. Bonus points for statically linking chroot on boot (i.e. the boot process copies everything in from the installed fs, then uses ld to statically link chroot to all its dependencies), so in a barely-functional active ssh session you can run '/recovery/bin/chroot /recovery /bin/sh'.

A high-quality system that fails 1/10000 of the time and destroys everything is worse than a low-quality system that fails 1/100 of the time without cause for concern. Yet the low-quality system is clearly shitty.

Not only that, but you've identified another problem with judging quality: software usually does not stand on its own; it's part of a larger system. What if a piece of software is well-written, but the libraries it links to are shit? A programmer may not have much choice if he's required to use system libraries, or some special vendor-provided libraries. He can add in workarounds for some of the bugs in the libraries, but that's it.

All defects are not created equal. A known defect that hits in a one-in-a-billion UI offchance and just throws up an error, nothing more, is a defect, but 99.9% of people would not fix it if it requires significant work. Why would they? It's a waste of time. A defect that drops all keyboard input from a secure system onto the internet every time is still 1 defect; it's just a bigger one.
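One way to make "not created equal" concrete is to weight each defect by how often it bites and what it costs when it does. A rough sketch, with failure rates and costs invented to mirror the two driver examples above:

    # Expected cost per operation = failure probability x cost per failure.
    # The frequencies and dollar figures are invented for illustration.
    defects = [
        # (description, failures per million ops, cost per failure in $)
        ("hokey driver occasionally replays a write", 10000, 0.01),
        ("rare race condition destroys the volume",     100, 50000),
    ]

    for name, per_million, cost in defects:
        expected = per_million / 1e6 * cost
        print(f"{name}: ${expected:.4f} expected cost per operation")
    # The rare data-eater dominates the risk by over four orders of magnitude.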

When it comes to software, yes. We have had it for decades: quality entails

architecture, design and implementation decisions that minimize the cost of change,

that does not deteriorate with each change (or at least deteriorates linearly with each change),

that exhibits strong cohesion and loose coupling,

that permits reasonable maintainability and configurability,

with relatively small bug counts per whatever metric one picks (FP, SLOCs, etc.),

that is amenable to testing,

with architecture and design that are understandable (tied with #1)

Just to mention a few. There might not be a single universal definition of software quality, but there are common desirable attributes that have been known for decades. It's when people code without knowing these attributes (or not giving a shit about them) that we get the crappy stuff we get.

Software doesn't have to be perfect. It simply needs to be economical due to its qualities (again, qualities well known for decades.)

Where any specific bit of the enterprise fits in the revenue generating chain is an arbitrary organizational decision. Ultimately the entire entity is there to give the sales force something to sell and the ability to accept purchases and support customers.

In other words, the pointy end of the stick is useless without the rest of the stick. It's just a prick lying on the forest floor under a pile of bear crap.

That is because corporate infrastructure software does not generate revenue. Why spend money that does not directly impact the bottom line?

It does impact the bottom line; it's just harder to see and measure. When lots of employees are wasting time rebooting after crashes, or repeatedly navigating a slow and/or suboptimal user interface, that's wasted time that costs productivity and money. Just because you aren't measuring it doesn't mean it isn't happening.

That is because corporate infrastructure software does not generate revenue.

Says who? That's like saying a cleaning crew or the electric system doesn't generate revenue for a supermarket. Revenue is not just a function of what you sell. You need a lot of things to generate revenue. A business that uses corporate infrastructure uses it to generate revenue. The problem in most disorganized enterprises is that they have poor accounting for tracking the cost and revenue of corporate infrastructure. Added to that, most software developers have no clue about the cost and ROI of the systems they build.

Another rule here is out of sight, out of mind: if management can't actually see the effects of what's going on, they don't care how good it is, which is why UIs can be fantastic while the backend completely sucks.

True, most software is badly written, and there are entire jobs dedicated to maintaining legacy and even current systems.
Some software is so badly written that it requires a team to prop it up during peak usage times, or War Rooms to determine fixes.
Managers usually only care about meeting a deadline and push for that. Young guys don't care whether something is correctly written, just that "it works" at that instant in time.
Being a good developer requires being enabled to be a good developer by your team.

There aren't enough 'good' coders in the world to implement all the software that needs to be written, let alone 'very good' ones. Not to mention good architects, designers, requirements analysts, etc, etc, etc. And even if there were, software that needs to work together isn't always designed to do so. Hacks, kludges, and jerry-rigged solutions are what hold the tech world together, and no amount of wishful thinking is going to change that.

The implication of your statement is that companies sometimes need to say "this solution would save me money in 5 years vs. this solution which costs half as much." There are times, quite possibly the majority of times, where the 'correct' decision is the cheaper one, and what you'll end up with is a world full of 'bad' software. What if the company simply doesn't have the money or time to invest in what you would consider the 'right' solution? What if implementing it the 'right' way is going to make you miss the opportunity entirely?

That's not really the work of a good coder. Anyone could get lucky, and no one writes correct code all the time.

A good coder, though, can structure code in such a way that problems do not cascade, so that incoming issues are limited in scope and don't affect the rest of the codebase. A good coder can make a huge system where you can replace a part of it without magic or too many tears.

"James Kwak.. points out that while user-facing software is often well-polished, the code running supply chains, production lines, and financial markets is rarely so refined"

I disagree. While the GUI may be well polished, the underlying code is of poor quality, as it has most probably been written by some contractor on an hourly rate. Quality control works like this: if it compiles, ship it and fix all the bugs in the next version...

and this article is absolutely correct. For the most part, we do regression testing, but a lot of code (a whole lot) is never unit tested, it's not written to be unit tested, and there are configuration holes all over the place.
Each time there is a Jerome Kerviel or a Nick Leeson, a generation of auditors will come through, find systems faults, and put in reasonably effective controls, but that is not the same as programmatic correctness.
Programmatic correctness often has to be baked into the code from the start (same with effective unit testing), and by and large, this is not an investment bank's highest priority (as an earlier poster wrote, code that is not directly involved in revenue generation does not get funded).
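For what "baked in from the start" can look like, here is a minimal sketch: a validation function shipped together with its unit tests. The order-checking rule is invented for illustration; the point is that the tests exist from day one, not that this is how any particular bank does it.

    import unittest

    def validate_order(qty, limit_price):
        """Reject bad orders before they reach the market, not after."""
        if qty <= 0:
            raise ValueError("quantity must be positive")
        if limit_price <= 0:
            raise ValueError("limit price must be positive")
        return {"qty": qty, "limit": limit_price}

    class TestValidateOrder(unittest.TestCase):
        def test_accepts_sane_order(self):
            self.assertEqual(validate_order(100, 25.50)["qty"], 100)

        def test_rejects_negative_quantity(self):
            with self.assertRaises(ValueError):
                validate_order(-100, 25.50)

        def test_rejects_zero_price(self):
            with self.assertRaises(ValueError):
                validate_order(100, 0)

    if __name__ == "__main__":
        unittest.main()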

Measuring how often something *doesn't* crash is extremely hard to show to the bean-counters. So, be ready to demo.

True story: When I started a sysadmin job, we had JBODs that routinely ran out of space, causing all sorts of downtime problems, like leaving the whole building dead in the water for days. I convinced TPTB that we needed decent storage (in this case, NetApp), but many of them choked when they saw both the purchase and maintenance costs. After we had it installed, I saw that a volume was close to hitting the wall. Before doing anything, I called in the most skeptical of the PHBs, and said, "Wanna see the NetApp pay for itself?"

I showed him the alarm that indicated a space problem, and told him we were on the verge of going down for two days. I waited for his skin to turn pale, and at the right moment, I said, "But we won't, because the NetApp can do this," and in a few keystrokes and mouse clicks, I added enough space to the volume.

I then told him that things like this happen once or twice a week, and I fix them without anyone knowing anything went wrong, but that I documented them in the trouble ticketing system I had installed, so if anyone wanted to know how many disasters were prevented by having decent storage, that's where they should look.

That didn't completely solve the problem of the thousand successes going unnoticed and the one failure never forgotten, but it came close.

It isn't cost effective to build good software for just a few users. I develop some internal systems; they are very complex, and each of them has 40 users at most. The ROI of Apple polishing every tiny bit of its software is great: if each of their 100,000 users spends one second less, that's 100,000 seconds saved, which is more than a day.
Human beings are very intelligent. They can learn to play a musical instrument, drive a car, operate a machine, and use shitty software.

I wouldn't say it's all bad software. I'm sure a lot of it is, but some of it is purpose-driven software that has been repurposed as if it were off-the-shelf software. Dev houses build a piece of software for a specific need for a specific customer, then that customer refers them to others and they all want the same thing. They don't rewrite the software to be off-the-shelf; they just repurpose what they have, shoehorn it in, and make it work (well, it works with MS SQL, but this company uses Sybase, so let's shoehorn that in too).

... it is interacting with other software programs that are not under your control. It's difficult to test software properly if you don't know all the use cases that it's going to have to support...

You define the use cases it will support, and reject anything outside of those defined cases. If your software acts upon cases that it does not know how to handle, then that is your problem, and only your problem.
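A sketch of that define-and-reject approach: a dispatcher that only acts on inputs it explicitly knows and fails loudly on everything else. The message types here are made up for illustration.

    # Only defined use cases get handled; everything else is rejected up front.
    HANDLERS = {
        "NEW_ORDER":    lambda msg: print("placing order:", msg),
        "CANCEL_ORDER": lambda msg: print("cancelling:", msg),
    }

    def handle(msg_type, msg):
        handler = HANDLERS.get(msg_type)
        if handler is None:
            # Fail loudly and early instead of guessing at unknown input.
            raise ValueError(f"unsupported message type: {msg_type!r}")
        return handler(msg)

    handle("NEW_ORDER", {"sym": "KCG", "qty": 100})  # defined case: handled
    # handle("REPLACE_ORDER", {})  # undefined case -> ValueError, by design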

It doesn't matter to the client whether your software segfaulted or replied 'sorry Dave, I'm afraid I can't do that.' Either way, it hasn't completed a use case that it is meant to do. And that fact may well mean that a load of downstream activities happen differently, and you quite possibly have gained nothing by rejecting it.

Having worked under several hats, the latest being as a system architect, I can tell you exactly why this happens.

We start with some upper management who have this 'nifty idea' that they must have for the business. Ok, fine... now let's get the ball rolling!

First, you have the budgetary committee. Without any input whatsoever from the technical groups that make up the technology and know what is or isn't possible, they work with vendors on a parachute budget for the project.

Secondly, with this locked-in budget in hand, they introduce it to the system architects and project management. The project managers are given timetables saying 'we need this done by this time, no exceptions.' They then pass that timetable, as well as the budget, to the aforementioned system architect.

Introduce stroke-approaching WTF moment for the architect....

Third, the architect goes back to the project manager saying 'We can't build the specs for the money in the time allowed.' The manager goes 'oh, right.' They go to the budgetary committee and bring this up, and once the committee realizes the bottom figure is wayyyy out in left field, they come back with 'that's impossible, we need this done, with these results, for the money you originally quoted us.' So... head back to the system architect...

Fourth, the architect, to un-bury himself from the absolute disaster sitting in his face, tells the project manager what will be required to minimally meet the ends. This generally means a ton of overseas consultants, paid to grind the wheel 24/7 at the lowest dollar, getting it to work on outdated hardware that meets the bare core/cpu/memory requirements and still 'works'.

Fifth, the consultants are hired, and you're lucky if they understand English sufficiently to catch the nuances of day-to-day communication. They also take shortcuts, because they either don't know the right way, don't want to spend time on the right way, or are told that doing it the 'right way' is not time-efficient for the cost. So now we have crappy code being tossed in, usually undocumented.

Sixth, the dev work is slammed in, marginally tested, and quick-shotted through QA, because upper management is in a time crunch and doesn't have time to deal with all that 'quality assurance nonsense.' So this work is fast-tracked to production through a not-fully-tested workflow.

Seventh, it's been live for a while and things break randomly without reason, but it's OK, a restart of the application always 'fixes' it. So what if you have to bounce the app every 2-3 weeks to free up that memory leak? It works, it's in the budget... well, maybe a few hundred thousand more... but it's done, it provides the 'nifty feature' that the shareholders were promised by upper management, and the things that don't work are pushed onto...

three guesses...

The management who pushed the idea? Nope.

The budgetary committee who gave the low-end figures out of their butt? Nope.

The project manager who gave the tight time-frame for the project without major input from the technical people? Nope.

I know... the IT professionals who are still at the company, like the system architect, network team, DBA team, SAN management team, and the security group, who are left holding the bag with the big pile of steaming crap? Yup.

Soo... when things eventually break badly enough to be noticed, and management is told that to 'fix' it will take more money, more time, and more hardware, shock, awe, and bafflement are shown, bonuses/raises are crushed because the IT professionals obviously can't do their job right, and maybe a few heads roll, while the management who started it all have their golden parachutes and are not held to blame for the project they initially stood up.

That. Is why the 'stuff out there is just horrible'.

It's not any one thing, it's how business runs, because these yahoos frankly never face the consequences.

Bean counters and management do. They don't care how much the staff struggles with lousy software (e.g. Oracle server on Linux). They care about saving a few bucks, getting their bonuses, reorganizing to hide the bodies, and moving on to the next job. Hence there will always be a market for crappy software. Capitalism fails at the interface level. If the engineers and low-level end users made the purchasing decisions, you can bet quality would improve in a hurry.

The way the article is written, it hints that low quality = more implementation flaws.

Let's not forget that software can have design flaws, too, and careful programming might still lead to low quality software.

In the case of Knight, the defects might not have even been a function of the software per se. I'm sure a good bit of probability and machine learning go into HFT; these algorithms may have been the source of the errors, and the flawed algorithms might not even be due to the software engineers.

I have used this program my entire career. For the last 20 years (since MSC bought PDA), it has not changed apart from the odd user-generated macro getting included. The windowing interface has had the same bugs (e.g. scroll bars that are 1 pixel high) since I was a wee lad. Half the stuff in it that isn't used regularly doesn't work, never has, never will, and yet it is the standard for FEM pre/post in aerospace. Staring at this broken-ass POS year after year has filled me with ennui.

>> It's difficult if not impossible to anticipate all the situations that a software program will be faced with, especially when — as was the case for both UBS and Knight — it is interacting with other software programs that are not under your control. It's difficult to test software properly if you don't know all the use cases that it's going to have to support.

That's why so many industries continue to use FTP (also FTPS and SFTP) to punt files over the wall when real-time response is not required.

Why is this news to anyone? Software is -always- shipped full of issues to meet a PM's deadline, in order to say "See!!! We got it done on time!" and justify their salary and existence at the company. "Ship it now and fix it after the fact" (if at all) has been the mantra of in-house and commercial software for 20+ years.

Most big software projects I've seen fail hard, like millions and tens of millions of wasted dollars hard. By comparison you just don't see that very often in big electrical/mechanical/civil projects, which can be equally complex (e.g. refineries, cruise liners, etc.).

There are software developers with all sorts of fancy titles - architects, analysts, engineers - and yet they can't get the code right. Usually the root cause is an inadequate requirements spec and a failure to manage the customer's expectations, but that's no excuse; there are usually numerous people employed in the project process specifically to get those parts right.

Software engineering is still playing catch-up, in the sense that most developers and development companies I've seen still don't follow a formal enough process for it really to be called engineering. Usually it's a bunch of computer science graduates having a wild stab at it, and the good ones are closer to artisans than engineers.

Until the entire software industry gets off its high horse and admits this to itself - and more importantly admits it to the customers - we are going to continue to be disappointed with the quality.

There are ways to determine quality.
One pretty standard way is to count the number of bugs found during each phase of development (design, coding, unit test, product test, integration test, and after it's in production).
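A toy tally along those lines, with invented counts; "removal efficiency" here is simply the share of defects caught before production:

    # Bugs found per development phase; the counts are made up.
    found = {
        "design": 12, "coding": 30, "unit test": 25,
        "product test": 15, "integration test": 8, "production": 5,
    }

    total = sum(found.values())
    pre_release = total - found["production"]
    print(f"defect removal efficiency: {pre_release / total:.0%}")  # 95%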

Those can be valuable metrics, but finding and fixing a lot of bugs can't improve the innate quality of the item under development/test. In other words, you can't test quality into the product.

That behaviour always annoyed me; there's just no valid use case for 'rm -rf /'.

On the contrary, it's something I do often: I chroot into a directory which has read-only bin/lib mounted, to test software in, and the first and last command is always "rm -rf /".

What you have to realise is that the Nike options (like -f and --force in most commands) mean you sign in blood that you know what the command is doing, have verified it three times, and accept that any blame rests squarely on your shoulders alone. If you aren't willing to sign that, use -i instead.

AC hits it on the head. This is nothing but the age-old search for the perfect metric. Development processes ARE oriented toward quality - for arbitrary values of "quality". The problem is that quality software is like porn: you know it when you see it, but you have no idea how exactly it is defined. Is it a lack of bugs? Sure, but that's definitely not the only aspect. Is it maintainability? Maybe - if the software needs to be around for the next 30 years. Is it readability? Dunno - machine code is pretty unreadable, yet there's quality machine code out there. Is it how long it took to develop, how flexible it is, how user friendly it is, how many power features are in it? Maybe, maybe, maybe.

Pick a metric - a boatload of metrics - and I will find you a large number of cases where the metric fails. Are we doomed? Kinda. Just like there's no silver bullet that solves all your development process problems, there's no silver bullet when it comes to measuring the output of the process.

In the end, what people care about is "does it do what we need it to do?", and that's all that anyone is going to remember. Unless, of course, it's review time, and then the only thing that matters is "the metric".

Are you saying that the technicians who used a left arrow as a backspace on the Therac-25 were not "honest, competent people"? Bad software and hardware design can kill. See http://en.wikipedia.org/wiki/Therac-25 [wikipedia.org]

Several years ago, I was brought in by our local electrical utility to develop a configuration control system for engineering documentation. Specifically, substation engineering documentation. The problem was presented to me as one of engineers getting behind schedule working on as-built drawings. And working from the wrong drawing revisions. So, before accepting the project, I asked to speak to some of the other stakeholders in the design and build process. Specifically, the construction crews. The first sign of trouble: Engineering management looked at me kind of funny. It seems that engineering and construction didn't get along too well. Nevertheless, as an outsider, I figured I had an advantage.

After speaking to a few workers, I had a better picture. It seems that the relationship between these two groups had been poisoned for many years by (as far as I could tell) one individual in engineering management. The crews considered him a raging a**hole and wouldn't work with him (he had retired years before, but people still carried a grudge). As a result, the crews pretty much built things the way they wanted instead of per the drawings; specifically, they built new stations the way they built the last one. So the smarter engineers as-built the older drawings, those being the ones the crews were actually working to.

Management wanted me to come in and build a system to 'lock down' released drawings and restrict the workflow, without understanding the root causes of their problem or how people were working around process and personnel problems. I walked away from that job offer quickly.

I have a few stories (oddly, all about defense contractors) where software was built intentionally to be lousy. If the users' management had to employ a staff of a few hundred people to repeatedly sanitize database entries (rather than build error checking into the entry system), it meant they were a third- or fourth-level manager (with the commensurate salary) instead of a first-level one with a couple of DB admins under them.

Good software is easy to write. Bad management or personnel problems are difficult to fix. An associate once told me that he could fix all of the problems in a $250 million failing s/w project with one clip in a .45.

Single-payer, for instance, would be a bigger boon to entrepreneurship, job mobility, and (actual) small businesses than any tax cut proposal the right (or left, for that matter) has put forth. Even better, if it's done right (so, at least as well as it is in the very worst system of that sort in the industrialized world, since they all spend less than we do) it'd cut costs for everyone, including big businesses.

I'm going to step in here as a pizza connoisseur. You are correct about how to order a Papa John's pizza (although you neglect to mention that the reason for going thin crust is more the hideous amounts of sugar they dump in their regular crust than its thickness), but it's definitely not excellent. Passable, if nobody else near you delivers and you don't feel like going out, but far from excellent.

The sauce isn't great, their thin crust is way too crunchy (unless it's soaked in grease and falling apart lik

If you are just measuring raw activity, the measure to use is CHANGED lines of code. A deleted line, an added line, or a modified line all score the same. If a reduction in SLOC is interpreted as negative productivity, then refactoring is impossible. And your company deserves its fate!
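A rough sketch of counting changed lines that way, using 'git diff --numstat' (which prints added, deleted, and path per file; a modified line shows up as one deletion plus one addition):

    import subprocess

    # Changed lines = added + deleted across the last commit.
    out = subprocess.run(
        ["git", "diff", "--numstat", "HEAD~1", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout

    changed = 0
    for line in out.splitlines():
        added, deleted, _path = line.split("\t", 2)
        if added != "-":  # "-" marks binary files; skip them
            changed += int(added) + int(deleted)

    print(f"{changed} changed lines in the last commit")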