Why bad scientific code beats code following "best practices"

I've been working, for more than a decade, in an environment dominated by people with a background in math or physics who often have sparse knowledge of "software engineering".

Invariably, the biggest messes are made by the minority of people who do define themselves as programmers. I will confess to having made at least a couple of large messes myself that are still not cleaned up. There were also a couple of other big messes where the code luckily went down the drain, meaning that the damage to my employer was limited to the money wasted on my own salary, without negative impact on the productivity of others.

I claim to have repented, mostly. I try rather hard to keep things boringly simple, and I don't think I've done anything in the last 5-6 years that causes a lot of people to look at me funny after having spent the better part of a day dealing with the products of my misguided cleverness.

And I know a few programmers who have explicitly not repented. And people look at them funny and they think they're right and it's everyone else who is crazy.

Meanwhile, people who "aren't" programmers but are more of a mathematician, physicist, algorithm developer, scientist, you name it, commit sins mostly of the following kinds:

Insufficient reluctance to use libraries written by clever programmers, with overloaded operators and templates and stuff

This I can deal with, you see. I somehow rarely have a problem, if anyone wants me to help debug something, to figure out what these guys were trying to do. I mean in the software sense. Algorithmically maybe I don't get them fully. But what variable they want to pass to what function I usually know.

Not so with software engineers, whose sins fall into entirely different categories:

Lookup using dynamic structures from hell – dictionaries of names where the names are concatenated from various pieces at runtime, etc.

Dynamic loading and other grep-defeating techniques

A forest of near-identical names along the lines of DriverController, ControllerManager, DriverManager, ManagerController, controlDriver ad infinitum – all calling each other

Templates calling overloaded functions with declarations hopefully visible where the template is defined, maybe not

Decorators, metaclasses, code generation, etc. etc.

The result is that you don't know who calls what or why, debuggers are of moderate use at best, IDEs & grep die a slow, horrible death, etc. You literally have to give up on ever figuring this thing out before tears start flowing freely from your eyes.
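To make the first two items on that list concrete, here is a minimal sketch in Python. The language is chosen only for illustration, and the Loader class and its method names are made up. It contrasts a lookup whose target name is concatenated at runtime with a plain explicit dispatch.

```python
class Loader:
    """Hypothetical loader; every name here is invented for illustration."""

    def load_csv(self, path):
        return f"csv:{path}"

    def load_json(self, path):
        return f"json:{path}"

    def load_dynamic(self, kind, path):
        # The method name is assembled at runtime, so a grep for
        # "load_csv" finds the definition but not this call site.
        return getattr(self, "load_" + kind)(path)

    def load_explicit(self, kind, path):
        # Every callee is spelled out, so grep, IDEs and debuggers
        # can all answer "who calls load_csv?".
        if kind == "csv":
            return self.load_csv(path)
        if kind == "json":
            return self.load_json(path)
        raise ValueError(f"unknown kind: {kind!r}")
```

Both methods return the same thing for the same input. The difference only shows up when someone has to find out who calls what: the explicit version is longer and duller, and that is the point.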

Of course this is a gross caricature, not everybody is a sinner at all times, and, like, I'm principally a "programmer" rather than a "scientist" and I sincerely believe I have a net positive productivity after all – but you get the idea.

Can scientific code benefit from better "software engineering"? Perhaps, but I wouldn't trust software engineers to deliver those benefits!

Simple-minded, care-free near-incompetence can be better than industrial-strength good intentions paving a superhighway to hell. The "real world" outside the computer is full of such examples.

Oh, and one really mean observation that I'm afraid is too true to be omitted: idleness is the source of much trouble. A scientist has his science to worry about so he doesn't have time to complexify the code needlessly. Many programmers have no real substance in their work – the job is trivial – so they have too much time on their hands, which they use to dwell on "API design" and thus monstrosities are born.

(In fact, when the job is far from trivial technically and/or socially, programmers' horrible training shifts their focus away from their immediate duty – is the goddamn thing actually working, nice to use, efficient/cheap, etc.? – and instead they declare themselves as responsible for nothing but the sacred APIs which they proceed to complexify beyond belief. Meanwhile, functionally the thing barely works.)

64 comments ↓

Reading code is torture. You will always hate the code you inherit. A software engineer is never happy with code he inherits. You never hear "This code is nice! A joy to work on. Look how easy to understand and pleasant and clever it is." Have you ever once heard somebody praise code? It only happens when you have mentors and you look up to them.

@Terry – It has happened a few times to me before, where I've looked at code inside libraries I've used and went "oh, that's so neat! I should take note". But it does happen less with co-workers, even very talented ones.

You're right that overengineered code is a worse problem than dumb code (though I don't think the split really aligns with that between scientists and software engineers). In my experience agile – not in the sense of complex methodologies but simply limiting work in progress, releasing often, and tying every code change to a specific user-facing feature – is an effective antidote. But for something that's intended as a framework for use by developers rather than a product for customers it's less clear how to do that.

Agree with Terry here. I am one of those people that complains but that's mostly because I favour clean, simple code. I teach programming for free and one of my exercises I do with students who've finished tasks is I step through their solution and gradually refactor their code to make it simpler and easy to follow. Then they can learn about the parts of the language that actually would save them time instead of falling down the dark road of OOP and over-abstracting their problem. On future tasks there's always much much less work for me.
Now the reason I've mentioned all this is because the code my students write after a few months is often better than that written by engineers with YEARS of experience. I've chatted extensively with some colleagues at work about software engineering and been sorely disappointed when I've had to work with their code.
Java, PHP and C++ have done a lot of damage to the way we think about our craft, and while the GoF's Design Patterns book continues to be hailed as the bible of enterprise software, nothing will change.

@ Imm More than agile I'd say it's automated testing that has been the greatest boon recently. Engineers are lazy and if they need 4 lines before each test setting up dependencies they'll likely go and refactor the code before ever committing it. Testable code is generally (certainly not always) simple code because it creates an environment where it's too much of a pain to test otherwise. Heck, my first interest in Idris and fully dependent types was because I was sick of testing my arguments did not violate constraints (eg. not null, not less than zero).

Of course, design patterns have no place in a simple data-driven pipeline or in your numerical recipes-inspired PDE solver.
But the same people that write this sort of simple code are also not the ones that write the next Facebook or Google.
I am also not surprised that you're capable of debugging simple scientific code: it's simple code written by people that wouldn't even know how to code stuff that has you make a double take.

On the flip side, you're going to need sensible software engineering if you want to build a system that does a little more than just step through a couple of time and space for loops.

@Terry: I'm fine with most code, actually; my sense of aesthetics has become rather numb over the years. Most of "someone else's" code is just fine to me. When it's really really hard to follow is when it's not fine, not when it's "ugly" along any of the possible dimensions.

@Georg: Google is kinda more about PageRank than "design patterns". But what do I know, go ahead and write the next Google. (As to Facebook… erm… I'm not sure what to say…)

What you criticize are bad programmers…which is fine, they are as fun to critique as bad scientists. But as a scientist, you should know better than to extrapolate without further evidence or study.

One of my favorite programming blogs is from someone in the field of bioinformatics. Programming is valuable to all kinds of science, and some programming jobs may require more substantial engineering skill:

Early in my bioinformatics career, I gave a talk to my department. It was fairly basic stuff – how to identify genes in a genome sequence, standalone BLAST, annotation, data munging with Perl and so on. Come question time, a member of the audience raised her hand and said:

“It strikes me that what you’re doing is rather desperate. Wouldn’t you be better off doing some experiments?”

It was one of the few times in my life when my jaw literally dropped and swung uselessly on its hinges. Perhaps I should have realised there and then that I was in the wrong department and made a run for it. Instead, I persisted for years, surrounded by people who either couldn’t or wouldn’t “get it”.

Ultimately though, her breathtakingly-stupid question did make a great blog title.

TL;DR: Science pays poorly, and then discredits the entire field when it attracts mediocre talent.

Scientists, when they realize their projects need a full-time, non-researcher software position, tend to be cheap. When setting salaries, these labs/centers make two mistakes:

Mistake 1. They assume that experienced professionals are willing to take a substantial pay cut to "do science".

In fact, the opposite is true. You are asking someone to forgo like-minded software people and instead work with peers who largely view software as plebeian, as inferior to their science, and who therefore see/treat you as "support staff" rather than as a "peer". So whereas labs assume the scientific setting is a reason to take a pay cut, it's actually a reason to demand higher pay.

Mistake 2. They assume they can offer an introductory rate and/or otherwise don't have to out-pay the industry rate for mid-career engineers.

Labs/centers who do not pay very competitive salaries typically recruit low-quality engineers: either fundamentally bad engineers, or relatively young engineers who could become competent, but only with mentorship (typically not available).

I thought the code was less of a problem in science than the methodology, though: the failure to track the code (wherever on the spectrum of coding sins it sits) in version control, so that the hacks done to it since the run three days ago make that run unrepeatable.

Code that's not intended to have a long life, or that is single-purpose, can quite acceptably be written the scientist's way. I can live with that, but use version control – to me, stepwise changes that can be verified are the scientific approach to software.

The advantage on the scientist side is that they understand what the program is supposed to do, while on the software engineering side they often have no clue what their programming effort is supposed to achieve.
The "professional" programmers too often don't even try to find out.

So, having no idea what they are supposed to do, they retreat into what they know: making APIs and frameworks. But they are no longer doing what they should, which is solving problems using computers.

That is why I also prefer software from a "bad" scientist over "best" practices engineers. Of course I really want code written by great scientists and engineers applying good sense to their code and who validate their ideas with experimental evidence.

Your tone is mocking, dismissive and not diplomatic at all. The author of the post you're referring to at least implies that a mutual understanding and a golden middle ground is possible. He tries to be constructive. You try very hard to be dismissive and thus bring nothing to a discussion — only a puberty-like raging and very rough generalizations.

Also, you managed to completely miss the point. Granted, the horrible image you portray the developers with does exist — and I've met quite a few such rock-stars in my career — but you use this PART of the developer community to justify your points.

"Oh, and one really mean observation that I'm afraid is too true to be omitted:"… let me finish that for you: generalizations are a plague to any discussion.

There are good practices in the software even if you're not a software engineer — a title you use rather contemptuously — and the original author was diplomatic enough to mention them.

A few examples will suffice:

(1) You don't need to be a 60-year-old seasoned programmer who's seen it all to at least make your code report errors in a dedicated error log file, as opposed to [in the Java case] System.err, no? That way, you can always check what went wrong with your program and tune it accordingly, even days later – imagine you got called to a meeting while you were testing, then the power went off, and boom, your console with logged errors is not there anymore; a problem that a persistent log (a single file) might immediately solve. This doesn't require a PhD in computer science and it takes only a few minutes extra to code. I fail to see the problem here, besides the scientist in question being an absolute novice in any programming.
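A minimal sketch of point (1), written in Python rather than Java purely for brevity. It uses the standard logging module; the file name errors.log, the logger name "experiment" and the helper safe_divide are all hypothetical, chosen only for this illustration.

```python
import logging


def make_error_logger(path="errors.log"):
    """Return a logger that appends errors to a persistent file."""
    logger = logging.getLogger("experiment")  # hypothetical logger name
    logger.setLevel(logging.ERROR)
    if not logger.handlers:  # avoid adding a second handler on repeat calls
        handler = logging.FileHandler(path)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


def safe_divide(logger, a, b):
    """Toy computation that logs its failure instead of printing it."""
    try:
        return a / b
    except ZeroDivisionError:
        # exception() records the message plus the full traceback.
        logger.exception("division failed for a=%r b=%r", a, b)
        return None
```

The traceback ends up in the file, so it is still there after the console is gone.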

(2) You don't need to be a hardcore engineer to have a few bare-bones shell scripts to test parts of the code you are relying on, right? No need for fancy-shmancy unit-testing frameworks, mocks and all sorts of boilerplate (and they are, granted, NOT necessary for many projects). I was helping a mathematician once who didn't want to use a 10-year-old and mature C library for matrix computations, wrote his own 10+ monstrously wrong (and long) methods and then spent one full month trying to figure out why his genius top-level algorithm (which depended on these matrix computational methods) wasn't working. It was only when I was asked to help him that he actually realized (read: when I forced the point upon him after I triple-checked) that the code he was relying on was wrong. Something that a very small home-grown test suite would've helped him with since day one. Does this require a genius? Hell no. It requires a questioning nature and that your nose doesn't point at the ceiling – something I expect from scientists innately, but boy was I wrong for this one.
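As an illustration of how small such a home-grown test can be, here is a sketch in Python (standing in for the shell scripts mentioned above). The matmul function is a hypothetical stand-in for the hand-written matrix code, not the mathematician's actual routine; the checks use only plain assert statements and values worked out by hand.

```python
def matmul(a, b):
    """Naive matrix product of two lists-of-rows (the code under test)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def check_matmul():
    a = [[1, 2], [3, 4]]
    b = [[0, 1], [1, 0]]
    identity = [[1, 0], [0, 1]]
    # Multiplying by the identity must leave a matrix unchanged.
    assert matmul(a, identity) == a
    assert matmul(identity, a) == a
    # A product worked out by hand: b swaps the columns of a.
    assert matmul(a, b) == [[2, 1], [4, 3]]
    # Non-square shapes catch swapped row/column indices.
    assert matmul([[1, 2, 3]], [[1], [2], [3]]) == [[14]]
    return True
```

Running check_matmul() after every change to the low-level routines takes a fraction of a second, which is cheap next to a month of debugging the algorithm built on top of them.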

(3) If you don't know the serialization formats which are deemed as a must-have these days, StackOverflow is not all trolls; a well-formulated question usually yields good answers. So no, I am not willing to skip laughing at a scientist who needs to serialize 580,000 records with 10+ attributes each to XML, instead of JSON or CSV (which, by the way, are also cheaper to parse in terms of CPU usage) or hell, even better sometimes — sqlite3 / cdb (so he could then import/export from/to every known database in the world if the need arises).
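For what it's worth, writing records to CSV or SQLite takes only a few lines with Python's standard library. The sketch below assumes records are plain dictionaries; the field names and the two sample rows are invented for the example and stand in for the real data.

```python
import csv
import io
import sqlite3

# Made-up records standing in for the 580,000 rows mentioned above.
records = [
    {"id": 1, "name": "sample_a", "value": 0.5},
    {"id": 2, "name": "sample_b", "value": 1.25},
]


def to_csv(rows):
    """Serialize a list of dicts to CSV text, header first."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "name", "value"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_sqlite(rows, path=":memory:"):
    """Store the same rows in an SQLite table and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE records (id INTEGER, name TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO records VALUES (:id, :name, :value)", rows)
    conn.commit()
    return conn
```

Passing a file path instead of ":memory:" gives a single database file that other tools can open and query.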

I believe the wise programmers will always agree that scientists don't need to be expert software engineers; of course not! Many of them must operate on the edge of the unknown, and OF COURSE this means they can't write the most beautiful code – that's perfectly okay, read my lips, a developer says: "THIS IS PERFECTLY OKAY". There are people out there who agree with this (check Reddit comments of the original author's article there, if you don't believe me). That, however, doesn't excuse totally rookie mistakes like the ones pointed out by the original author, me, and others.

Hope that clears it up for you, and I hope you now understand how horrible your tone is, because I did my best to mildly replicate it.

Agree with Dimitar above. The moment I smell "Hey, in my experience", I quickly know the discussion will go into a dog-eat-tail-eat-dog whirlpool. The comment "Many programmers have no real substance in their work" is offensive and implies that if your work does not involve any semblance of computational or scientific work, then you are just faking it as a software engineer. Really? So the research departments of major corporations, staffed with PhDs writing scientific research code who make up 10% or less of the workforce, are the only ones doing substantive work and the rest of the folks are just taking free money home? My mind boggles at how the tone of the article sweeps the rest of the discipline of software engineering under a carpet. Just great reading. Thanks.

Erm… "Biology, bioinformatics, astronomy, physics, chemistry, medicine, etc – almost every scientists has to write code. And they aren’t good at it" says the article I reply to. This is supposedly not dismissive and not a generalization.

The article I'm replying to takes it for granted that "software engineers" on average write better code than "scientists". I think dialog, mutual understanding etc. must include being open to the option that this is just plain wrong. Perhaps scientific code might benefit from better logging or more pervasive use of source control, etc. But – "stringly typed" – is it necessarily that bad? etc. etc. – perhaps those scientists are not as bad at it after all.

It sounds like there are some best practices (for business) which aren't actually the best for the needs of scientific computing, and that there are people who will push them into places they aren't appropriate.

What you would hope to have is people who are aware of the best practices for business, and smart enough and flexible enough to figure out the best practices for science – and even more so, your particular branch of science – rather than just unthinkingly throwing the proverbial book at the problem. (Which also works poorly in actual businesses, for that matter.)

Also, you'd want someone who can explain and demonstrate the benefits of the practices to a rightly-skeptical constituency, and who can figure out how to apply them most effectively at minimum cost.

Good luck with that. It's hard enough to find that kind of person for business, where it's a smaller leap. :)

#1: I think you have to remember something here: nothing is perfect. Most software guys will complain about the codebase they own because it has flaws that are baked in, potentially design flaws that would be huge effort to fix. The complaining becomes unreasonable at a point, I agree. But if you're a programmer and can't find a way to compliment or at least constructively criticize a peer's code, you aren't very professional. A professional understands "not made here" is a flawed mindset.

I think the problem might be process. Everything I check-in gets code reviewed and vetted before it gets merged. I've cowboyed physics simulators together in unfamiliar languages and I've written enterprise front-end eCommerce code. The two are not incompatible. My buddy in Bioinformatics complains that very little is modularized (well) and everyone ends up inventing their own solution over and over. If there are areas of scientific computing proven to work, why not also commit some time/money to making it re-usable (and testable)?

"I've been working… in an environment dominated by people… [with] sparse knowledge of "software engineering"… Can scientific code benefit from better "software engineering"? Perhaps, but I wouldn't trust software engineers to deliver those benefits!"

Sounds a lot like

"I've been working in an environment dominated by people with sparse knowledge of "anatomy"… Can medicine benefit from better knowledge of "anatomy"? Perhaps, but I wouldn't trust a doctor to deliver those benefits!"

People with sparse knowledge of anything generally only know enough to be dangerous. You don't work with software engineers. You work with hobbyists and tinkerers. They will, on average, produce bad code. Kind of amazing that a scientist wouldn't account for this skewed sample set.

Software Engineers and Scientific Researchers have different goals. If SR code breaks, seg faults or is slow, only a handful of people are affected. In the world of SWEs, it could affect hundreds to thousands of fee-paying users. SWEs have to consider performance, fault tolerance, maintainability and extensibility, not to mention automation, because they are providing services. The writer of the article should think on that before generalizing SWEs as 'idle'.

I agree with yosefk. The software industry is terribly solipsistic and has lost sense of its purpose. It lends very little value to the businesses or scientific institutions it purports to serve and far too often only brings in needless complexity and yak shaving. As for the attack on the article author with inanities such as "who therefore see/treat you as "support staff" rather than as a "peer"": you are support staff. Deal with it. The anxiety generated by the article author's insightful comment only tells you the obvious; deep down, software industry professionals know that much of their work is unjustified and unjustifiable and that, were things to take their reasonable, sensible course, there'd be much, much culling of waste in this industry.

"The one great principle of the English law is to make business for itself. There is no other principle distinctly, certainly, and consistently maintained through all its narrow turnings. Viewed by this light it becomes a coherent scheme and not the monstrous maze the laity are apt to think it. Let them but once clearly perceive that its grand principle is to make business for itself at their expense, and surely they will cease to grumble".

The Software industry/professional are principally just the same nowadays.

can I just point out that the list of non-programmers' sins includes things like bugs and crashes, while the software engineers' list boils down to, "it's hard to understand all the abstraction the first time I look at it"?

i will say the worst code often comes from the self-proclaimed "ninjas". Part of the developer's path to enlightenment includes overcoming over engineering and learning to write simpler code.

i will also say that i've had to deal with more bugs and more undisciplined thinking and wasted more time in those 1000-line functions than in the ridiculous abstractions.

anyway… as with most things, the best approach lies somewhere between the two extremes

I work as a software engineer at a biotech company. My job is to integrate code that scientists write into a larger infrastructure.

One of the pain points we have is that many PhD scientists are often unwilling to learn software technologies, feeling that doing so would distract from 'the science'. While this is valid, some problems we've faced as an organization are:

* Lack of interest/understanding WRT using version control.
* Unwillingness to learn I/O concepts beyond CSV files.
* Extremely premature optimization, which turns out to be worse than naive implementations.
* Unit tests that run for 30 minutes. Unit tests that test the wrong thing. No tests at all.
* Data inlined into source code. As ASCII art.
* No understanding of Big-Oh concepts, leading to O(n^2) algorithms that run in O(n^5).
* Programs that have dozens of options, none of which should ever be adjusted from their defaults.

And on and on.
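The Big-Oh item is the easiest one on that list to illustrate. A minimal Python sketch follows; the function names are hypothetical and not taken from any real code base. It shows how an innocent-looking membership test inside a loop multiplies the cost, and how building a set once avoids that.

```python
def common_ids_quadratic(a, b):
    # "x in b" scans the whole list each time: O(len(a) * len(b)).
    return [x for x in a if x in b]


def common_ids_linear(a, b):
    # Building a set once makes each membership test O(1) on average,
    # so the whole function is O(len(a) + len(b)).
    seen = set(b)
    return [x for x in a if x in seen]
```

Both functions return the same list. Nest the first pattern a few levels deep and an algorithm that should be quadratic picks up several extra powers of n.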

None of our programs are islands of functionality. They must all work together within a larger context. We have a modest cluster on which we expect our tools to run perfectly during production. Quality is absolutely critical.

Over the years, our software team has progressively improved the situation, and many of our scientists have turned into quite good software developers. They can work with the larger picture in mind. By adopting software engineering disciplines, they have seen the quality of their work improve and their productivity improve. This creates a positive feedback loop. Improvement is slow but continual.

Software engineering isn't about creating umpteen layers of abstraction or complicated inheritance hierarchies. It is about creating robust software components that function correctly together. It is about creating an environment that people can work within effectively. It is about creating foundations that useful things can rest upon.

I just don't get the problem. Most scientific code is open source/public domain so everyone is free to contribute and to enhance it. If Bozho thinks he needs to push it to the next level, he's free to do so ;)

This article seems to arrive at its conclusions based on two things. Ignorance and arrogance.

The ignorance comes from a lack of understanding regarding what software engineering actually is, and why it exists in the first place.

Software engineering is to programming as architecture is to construction.

Sure, you could hire 10 construction workers to build your building, and they're perfectly capable of doing so. Unfortunately, it probably won't look anything like what you wanted, and you'll run into issues further down the line that were never considered when construction was taking place.

Similarly, you can hire 10 programmers to code a project, and they're perfectly capable of doing so. Unfortunately, it probably won't turn out exactly the way you wanted, and you'll run into issues further down the line when you decide you want to make a few minor alterations.

A scientist may don the programmer hat, and can come up with something that "works" well enough for his/her particular experiment. Realistically, what they've done is waste everyone's time. Without putting the time into the architecture component of programming ("software engineering"), they've written a single-use (and probably sub-par) program that will have to be re-written over and over again for each minor alteration. Also, some poor research assistant will have to pore over line after line of terribly-written code to figure out how to make these alterations. Oh, and then they find a bug that was actually a problem since day 1. Now they have to patch the 35 extremely similar (and equally poorly-written) programs.

The arrogance is typical of those in the hard sciences, and is actually somewhat laughable. Scientists will always look at engineers as an inferior creature, just as mathematicians will always look down upon scientists as inferior creatures.

Unfortunately for mathematicians and scientists, without application, theory doesn't pay any bills. We do, after all, live in the real world.

I'd love to see a scientist write a driver to interface with their own instruments.

Software is a tool, and a software engineer's job is to make that tool perform its function, and leave it adaptable enough to be built upon should the need arise later.

Just as there are poor scientists, there are also poor engineers. Both are, after all, human.

I'm not an "arrogant hard scientist", I'm a programmer. I probably know all or much of the shit you call "software engineering" so not that ignorant, either.

As to code in biology: haven't worked there, heard a lot of horror stories, not the thing I'm talking about, not the thing the article I mentioned talked about. It talked about open source scientific code. I'm talking about scientific or similar code written in an organization mostly producing code. Either way it's never as bad as "no VCS used". What happens when biologists write code for their own needs is not what the discussion is about.

I've got a MSc in computer science and am about to finish a PhD in engineering (marine robotics). What I see when I compare my code with that of my fellow PhD candidates, who only have an engineering background (poor, poor souls): my code is by far more generic, better documented, better structured, much, much faster than what my colleagues write, and, most important, doesn't take me longer to program than their code. Plus, I get tons of reusable functions out of it they don't have readily available, and instead of spending days on reinventing the wheel each time I need something more sophisticated than a single for-loop, I can concentrate on actually getting my research done.

I do commit most of what is listed as "sins" above (e.g. subdirectories), and I force resistant environments (e.g. Matlab) to do things an engineer would never demand (again, e.g. subdirectories). However, these are IMHO not sins, but basic rules of adhering to standards that make your software more versatile.

Yet, after trying to bridge the gap for so many years, I'm ready to declare defeat. Colleagues admit that my stuff is great and then go back to their junkyard of unsorted code fragments that needs to be kicked into the bin each time some intrinsic detail changes. In reference to the original post: simple-minded? Yes. Care-free? Yes. Near-incompetence? Yes. But better? Well, maybe in the sense of a smaller number of characters per file, but not to any other criterion, no siree!

What I read from your 2 comments is — "I've been hurt by Bozho's article, particularly when he said X or Y". Well, that's perfectly human and normal. But the way you choose to react to it gives away your real character.

I am not gonna act like your personal guru here. But I believe that instead of giving up on a compromise by being a flamer in your post, you probably should've tried to bring the two worlds together — because these two worlds try to intermingle every day. Let's make it a mutual-respecting friendship, not a war.

Many people, not scientists and not software engineers, have said in the past that a big part of the scientists don't try to evolve their discipline (math, physics, etc.) as much as they strive to get publicized at all costs (Michael Crichton was one of the people observing that effect). This is *not* constructive. And it is certainly valid for many so-called programmers as well.

The way evolution works on intellectual level is: push forward towards a common goal, against all of your animal instincts telling you to rip your "opponent" apart — who, when you look at it beyond emotions, is not an opponent at all.

It was the people who believed in Tesla and Einstein who discovered that their work is significant in everyone's everyday life, not all the cynical pricks calling them "lab rats", "white coats" and other derogatory names similar to those you used on several occasions.

Exhibit A for the case that scientists and engineers are terrible programmers are the "Numerical Recipes in C/Fortran/Pascal" books. Those books have stood out in my mind for nearly 20 years because of their terrible code.

@AHaeusler: I didn't say your code wasn't better than the code of those other guys… I said what my experience was around people who're probably more competent programmers than those guys but still far from "software engineers" in terms of programming sophistication.

@Dimitar: erm… why would I be "hurt" exactly? I just said my experience led me to a different viewpoint. It is you who're talking about "war", my character etc. etc. "I'm not going to act as your personal guru" but… I think that, regardless of being wholly inadequate, your reaction betrays a need for a thicker skin – a very useful thing on the Internet.

@Dean: Numerical Recipes would be my Exhibit A to support my point, actually! A great book with great code accomplishing great things. Compare that with Design Patterns – a book full of code doing nothing that you nonetheless are supposed to have read and memorized.

I work with math models for flight simulators. These models are hundreds of thousands of lines long and are a nightmare to maintain because of bad programming. This author talks about spaghetti code being easy to untangle in comparison to code written by software engineers with too much time on their hands. Wrong. The code he works with must be very small and relatively simple in comparison. Try doing anything to 100,000 lines of FORTRASH that exclusively uses common blocks, implicit variable declarations, and gotos. Experience the joy of following a variable that represents velocity in the inertial frame in one routine, the body axis in another, and arbitrarily changes units. Then tell me how easy the code is to maintain. The point is that good design was developed for a reason. Do people misuse it? Yes, but the good developers are practical and it sounds like the author is a lost cause.

The original article I was replying to was talking about open-source scientific libraries. I was talking about the related phenomenon of people working in organizations producing mostly code. Neither kind of "bad" scientific code is as bad as what you deal with. What you deal with should be compared to 100000 lines of COBOL written by "software engineers" in a bank.

Bad code is bad code, whether or not it's written by scientists or programmers.

I've been working with scientists for two decades, mostly physicists, biologists, and chemists. They tend to write bad code.

It meets all the criteria outlined in the article and rarely is robust against changes. The latter is a big problem for research code. Small changes lead to instabilities and the need to refactor or rewrite (which really begs the question of whether it's even valid science). This, in turn, leads to large amounts of research dollars being wasted by researchers writing software rather than performing science.

To address this, they often bring in "software engineers" who are usually (a) students from the CS department or (b) contractors with a relationship with the university. Rarely are either of these two classes of developers good programmers. The former because they're at the beginning of their careers and the latter because they're usually after the 9-5 nature of university work.

HOWEVER, when a researcher can work with a real software engineer, excellent software is possible. Rather than defending scientists (who should be doing science) and disparaging the mediocre developers they tend to hire, we should work towards building more opportunities to create environments that attract good developers.

Almost all people strongly disagreeing with me based on their own experience are talking about settings where a scientist works in his lab, maybe hires a developer, and the code is an internal thing. The article I replied to talked about open-source scientific code which I believe to be a bit better than what comes out from the above-mentioned setting. I was talking about code going into production which is also probably better. Bad maybe but not that bad.

Also, obviously a great programmer can help a scientist who's very bad at programming create excellent software. But equally obviously, a scientist better trained in programming will do better, etc. I'm not talking about hypothetical worlds created by "us" (who are "we"?) pursuing some path towards collective improvement, only about what sort of bad code I fear to get stuck with the most in the imperfect world of today.

You also explicitly leave out bioinformatics, which is the particular area of scientific programming I have experience in. But I think the quality of most open-sourced bioinformatics applications isn't any better than scientist-in-the-lab. In a way, this is a good thing – bioinformaticists are happy to share their code even if it isn't perfect – but after several years of amateur maintenance, you end up with a boondoggle of spaghetti and have to state, like UCSF Chimera does, that "Compiling requires building over 40 third-party packages and is not recommended. See below for the problems you will face."

@Ben Fulton: LLVM is a project written by the strongest programmers on the planet, easily at the top 2% if I had to pull a number quantifying how good they are. Building llvm-gcc was for years something LLVM docs said "was not for the faint of heart", "elite gcc hackers" etc. If you look at LLVM's build system today, you'll see abundant use of the industry-standard autocrap tools which use sh, m4, make, automake, autoconf and who knows what else (what scientist on Earth could have created THAT?), together with a custom build system, which builds itself and then your code and whose error messages once misconfigured can be diplomatically called "unhelpful", and that process involves Python and a custom syntax and there's a tool called tblgen with yet another custom syntax. LLVM builds nicely out of the box, yes, but try adding a target and much of your hair will get pulled out rather quickly. Or try building gdb with tui support on Windows. Generally the prospect of building code off the net makes my heart sink.

I'm not saying that the average piece of scientific code is great in any way, I'm only doubting that people defining themselves as programmers do better on average than people defining themselves as scientists.

Maybe the problem is in the programming language itself, which is still an imperfect thing in need of improvement? Make it suit people and the concrete task more, rather than the machine. (BTW, I like to put grated cheese on spaghetti instead of butter.)

I've had this observation as well. Good programmers want to work on hard problems, and have the credibility to get that kind of work. Since they're working on genuinely hard stuff, and will often be the first maintainers of their own code (since they're building something from scratch and responsible for making sure it launches) they are careful to focus on the intrinsic complexity of the problem.

Second-rate engineers, on the other hand, tend to get tossed the easy but annoying work of filling out parochial requirements that come from the business. It's not very interesting and it doesn't do much for the CV, and the only reward for doing it well is getting assigned more grunt work, so they end up overengineering in the hope of getting some CV cred. Since the code is often inscrutable, it ends up playing to their political advantage as well. Even if people end up disliking it, it's hard to criticize that kind of code without the risk of being called incompetent. (To people unfamiliar with software, "It's too complex" sounds like "I'm too stupid.") Of course, writing unmaintainable code isn't difficult at all.

I've seen my share of "professional" code and code written by my professors and other academics. I will only say that big code physics are not the same as little code physics. Professional programmers tend to bring in the heavy tools too often for projects that don't really benefit from them, but I also think an academic programmer who wrote more than a couple thousand lines of code using the naive approach would quickly realize he needs to hire a programmer.

Professional programmers can indeed make things better, but they can also make things much worse, and it's not easy to ensure a good outcome through, erm, managerial oversight. I think if the physicist has a programmer friend known for his pragmatic mindset, maybe that'd be the best kind of outcome.

I have to say that I saw more people who were mainly physicists/mathematicians who ended up programming very nicely by observing the outcomes of the various approaches in practice, than people who were mainly programmers and eventually overcame their fear of the mathy subject matter and their desire to hide from their fear behind over-engineered infrastructure.

I would recommend that anyone who thinks scientific code is "better" read "The Hockey Stick Illusion" by A. W. Montford. While it is principally about the statistics behind climate change, a good deal of it involves the software flaws that led to incorrect conclusions that altered the thinking of the world about the subject.

For computational systems, which scientific code is a kind of, something like Cilk will nuke bugs extremely reliably (my Cilk knockoff, checkedthreads, is freely available on GitHub, and I explained how it works on this blog).

I've *sometimes* managed to write code that others have *enjoyed* working with — but it doesn't happen all the time, and it doesn't (generally) survive extended periods of maintenance. Writing code that is functional and effectively communicates what it does to a wide audience is *damn* hard unless the code itself is trivial. Indeed, persuading people that they don't want or need anything other than the most trivial solution is probably the hardest bit. :-)

I would say that the best way is KISS. Neither group has a clue how to do it properly, and I have plenty of examples, having looked at the code of over 100 guys with all kinds of training over the last 30 years.
The dark art of quality software architecture isn't easy to master.

I recently left a growing software company that was completely made up of scientists and zero software engineers.

I should say "formerly growing," because the consequences of bad software engineering had begun to catch up to them. Yes, their code solved a difficult problem more effectively than any competing product, but they have proven to be completely unable to break out of their niche because the code base is an unmaintainable nightmare.

Almost no functions are thread-safe, as they nearly all use globals as inputs *and* outputs. Adding a new option or a new model requires either copy-paste (which is, of course, what the scientists preferred to do) or, if you don't want to use copy-paste, months of refactoring work.
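To make the globals point concrete, here is a minimal sketch in Python. It is not the company's code, and every name in it (`g_scale`, `g_result`, `compute_with_globals`, `compute_pure`, `run_in_threads`) is hypothetical. The first function reads its input from a global and writes its output to a global, so two threads calling it would share one result slot. The second takes everything as arguments and returns its result, so each thread can safely keep its own.

```python
import threading

# Hypothetical module-level state, standing in for "globals as inputs and outputs".
g_scale = 1.0
g_result = 0.0


def compute_with_globals(x):
    """Input comes from g_scale, output goes to g_result: not thread-safe."""
    global g_result
    g_result = g_scale * x


def compute_pure(x, scale):
    """Same arithmetic, but all inputs are arguments and the output is returned."""
    return scale * x


def run_in_threads(values, scale):
    """Run compute_pure on each value in its own thread; each writes its own slot."""
    results = [None] * len(values)

    def worker(i, x):
        results[i] = compute_pure(x, scale)

    threads = [threading.Thread(target=worker, args=(i, x))
               for i, x in enumerate(values)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Something like `run_in_threads([1.0, 2.0, 3.0], 2.0)` works because no call touches shared state. Doing the same with `compute_with_globals` would leave whichever thread finished last as the only visible answer.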

Now they have unfixable bugs, built-in leaks that will take at least a year, probably more, to fix, and crippling scalability issues that require rewriting around 50K lines of code from scratch. Planned features slip years behind schedule, both because of poor planning and because of the nightmarishly messy code base.

They will never fix these problems because everyone who can fix these problems quits part way into the project.

Yeah, scientists can do neat things while ignoring good programming practice, but about the time the code base hits a million lines or so, the reason "good practice" is called "good practice" catches up with you.