The broken promise of static typing

I was quite surprised at a recent blog post by Uncle Bob Martin, titled: "Type Wars", in which he writes: "Therefore, I predict, that as Test Driven Development becomes ever more accepted as a necessary professional discipline, dynamic languages will become the preferred languages. The Smalltalkers will, eventually, win."

This statement didn't sit well with some people in the static typing community, who argued that in a sufficiently advanced statically typed language, types are proofs and they make unit tests mostly redundant. Haskell even claims that "once your code compiles it usually works"!

Be it safer refactoring, better documentation, more accurate IDE support or code that is easier to understand, for me all these claims translate to a simple promise: fewer bugs.

And I really hate bugs. I find them to be one of the worst wastes of time and energy for a project, and there is nothing that annoys me more than getting to the end of iteration demo and the team being somehow proud of saying, "We did X story points and we fixed 20 bugs! Hurray!"

To me it sounds like, "In the last iteration we wrote more than 20 bugs, but our clients were able to find just 20! And we were paid for both writing and fixing them! Hurray!"

Charting bugs

With that in mind, I tried to find some empirical evidence that static types do actually help avoid bugs. Unfortunately the best source that I found suggests that I am out of luck, so I had to settle for a more naïve approach: searching GitHub.

The following are some charts that compare the "bug density" for different languages. By bug density I mean the average number of issues labelled "bug" per repository on GitHub. I also tried removing some noise by only using repositories with some stars, on the assumption that a repository with no stars has no users, so nobody will report bugs against it.
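As a rough illustration of the metric, here is a minimal Python sketch of the bug density calculation described above. The repository tuples and the `bug_density` helper are made up for the example; they are not the script or the data behind the charts.

```python
# Hypothetical sample data: (language, stars, issues labelled "bug").
# These numbers are invented for illustration only.
REPOS = [
    ("Go", 150, 4),
    ("Go", 3, 0),
    ("Java", 200, 30),
    ("Java", 12, 6),
    ("Haskell", 0, 1),
    ("Haskell", 40, 5),
]


def bug_density(repos, min_stars=None):
    """Average number of "bug" issues per repository, grouped by language.

    If min_stars is given, only repositories with MORE than that many
    stars are counted (mirroring "more than 10 stars" in the charts).
    """
    bugs_by_language = {}
    for language, stars, bugs in repos:
        if min_stars is not None and stars <= min_stars:
            continue  # skip repositories that are too unpopular
        bugs_by_language.setdefault(language, []).append(bugs)
    return {
        language: sum(counts) / len(counts)
        for language, counts in bugs_by_language.items()
    }


if __name__ == "__main__":
    # One pass per "round": all repos, more than 10 stars, more than 100 stars.
    for threshold in (None, 10, 100):
        print(threshold, bug_density(REPOS, min_stars=threshold))
```

Note that a language disappears from the result entirely once none of its repositories pass the star filter, which is one reason the rankings can shift between rounds.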

In green, in the "advanced" statically typed languages corner: Haskell, Scala and F#.
In orange, in the "old and boring" statically typed languages corner: Java, C++ and Go.
In red, in the dynamically typed languages corner: JavaScript, Ruby, Python, Clojure and Erlang.

Round 1. Languages sorted by bug density. All repos

Round 2. Languages sorted by bug density. Repos with more than 10 stars

Round 3. Languages sorted by bug density. Repos with more than 100 stars

Whilst not conclusive, the lack of evidence in the charts that languages with more advanced type systems are going to save us from writing bugs is very disturbing.

Static vs Dynamic is not the issue

The charts show no evidence of static/dynamic typing making any difference, but they do show, at least in my humble opinion, a gap between languages that focus on simplicity versus ones that don't.

Both Rob Pike (Go creator) and Rich Hickey (Clojure creator) have very good talks about simplicity being a core part of their languages.

And that simplicity means that your application is going to be easier to understand, easier to change, easier to maintain, and more flexible. All of which means that you are going to write fewer bugs.

What characterizes a simple language? Listing the things in common between Go, Erlang and Clojure, we get:

No manual memory management

No mutex-based concurrency

No classes

No inheritance

No complex type system

No multiparadigm

Not a lot of syntax

Not academic

Maybe all those shiny things that we get in our languages are actually the sharp tools that we end up hurting ourselves with - creating bugs and wasting our time - and that all they do is bring a lot of additional complexity, when what we really need is a simpler language.

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies.

Great topic! I think your data analysis is fatally flawed, though. The only way to really figure this out is to give similarly experienced programmers a task and see how many errors the solutions in statically typed languages contain vs the solutions in dynamically typed languages. Unfortunately, I'm not aware of any research on that exact topic.

I think Uncle Bob's oversimplified things. On the web, where speed is important and bugs usually aren't very costly, dynamic languages will win.

But in other domains where bugs can be expensive or cost lives (avionics, nuclear power plants, pacemakers, etc.), we might want to use languages and tools that help us ensure there are no bugs (or essentially no bugs). 100% test coverage of a dynamic language that Uncle Bob's talking about doesn't mean there are no bugs. And it certainly doesn't mean you've covered every execution path of the code.

If you look at the software written in Spark/Ada, you can see some really low defect rates. These defect rates are well below anything you could hope to achieve in a dynamic language using TDD, and there is data to back that up. But you end up trading speed for correctness.

I like how you go on to say that the analysis is "fatally flawed", but don't explain how. If the author did some kind of data manipulation that favored one language (or paradigm) over another, that would be a sign of being "fatally flawed", but it seems like there's no explanation for the observed data other than the author's conclusion. Obviously, it doesn't meet scientific standards for being conclusive. However, why should we prefer an opposite statement (that static types are more bug-free) by default over the author's statement (that simpler languages are more bug-free)? Clearly, the data is in favor of the latter statement.

"100% test coverage of a dynamic language that Uncle Bob's talking about doesn't mean there are no bugs. And it certainly doesn't mean you've covered every execution path of the code."
This seems to be very far from what the author was talking about. I think you might have misunderstood the article.

Here are some alternate explanations of the data, all hypotheticals that should be considered before OP's conclusion is accepted as an accurate interpretation of his data:

Practitioners of different languages have different reporting habits: they call different things "bugs", they report with varying frequencies, they tend to not care about reporting bugs as much as building the next feature, etc.

Bugs are different sizes, so while Haskell and Python might both have "1 bug", the cost of that bug could vary wildly.

There's a ratio of "bugs per feature", so more productive languages show up as more buggy.

Bugs are labeled differently, i.e. perhaps Haskell projects tend to have nice "bug" labels just because static typists are more OCD about it, whereas a Python project might have a million bugs, but no one labeled them as such. (related to my bullet #1)

I agree with others in the comments: in order to draw up causal relationships, one would need to construct an appropriate experiment. Double-blind-placebo-controlled-randomized might be a bit tough to construct, although the closer to that one could get, the better.

Perhaps one could construct a randomized crossover though, and that would finally lend some actionable insights into the problem?

19:22 minutes: defect rates of 5 projects that used 'correct by construction' software development techniques

32:33 minutes: productivity, cost, defects of the Tokeneer project (zero critical failures found after extensive testing by the NSA).

33:51 minutes: The NSA gave interns with no experience with these techniques the job of adding features to the Tokeneer project and they had amazing results (NSA conclusions at 36 minutes)

38:54 minutes: discussion of a few real-world safety-critical projects developed with these techniques (including defect rates which are fractions of the defect rates for typical projects)

I'm mostly a web guy but I'm really interested in this stuff. I've done a bunch of reading and I'm just beyond a "Hello World" example in Spark (the learning curve is pretty steep compared to picking up Java or something like that).

Anyway my inexperience with Spark/Ada prevents me from being able to tell how honest Thomas is about the benefits and drawbacks of this approach but I'm intrigued all the same.

I can relate to your Clojure experience. The Pragmatic Programmer (remember that book?) was right. Learning another language or paradigm affects how you program and how you think about solving problems.

Martyn Thomas has a whole series of lectures and they are all interesting. You might want to check out:

Anyway, I really got interested in this stuff because I'm working in a code base that is full of bugs (who isn't, right?) and I just thought there has to be a better way to develop software so I started asking myself how 'they' make software for safety critical applications that doesn't break and isn't full of bugs.

The traditional advice is to turn up the compiler/interpreter warnings. Then you add static analysis. And now in PHP 7.1 you have optional strong typing so you convert your code base to run on PHP 7.1 and you do some of that. And you write unit tests. And once you're good at that you switch to TDD.

And all that stuff is good. It's really good in fact but it doesn't help you if you missed a requirement or a whole class of requirements. It also doesn't help if your requirements are ambiguous or contradictory.

So what we're trying to do is get really fast feedback. If we've got something wrong, we want to fix it as soon as possible because the longer that wrong thing is in your system the more it will cost to fix it. And the next step after everything I mentioned might be formal methods and mathematically verified software. I think of it as an uber-static analyzer in that it automatically verifies certain properties of your code (and annotations).

So you can spend your time writing tests and hoping you catch things or you can spend your time annotating your code in Spark/Ada and let the tools prove it works or you can ship buggy software, which in some cases is the right thing to do.

The real question for me is which, if any, of these tools and disciplines are appropriate for my role as a web developer?

Most projects spend more than 50% of their budgets testing and fixing defects. Could we spend a fraction of that money up front and do it right the first time by writing software with formal proofs? I don't know the answer yet but I'm working on it.

So it's been a long time since I last coded in SPARK (nearly 10 years now) but it's worth noting a few things:

1) The really low defect rates reported for systems coded in SPARK aren't simply due to the language features, but also down to the "Correctness by Construction" approach, which emphasises getting things right from the high-level requirements all the way through formal specs and into coding, information flow analysis and proofs -- the sooner you find and eliminate the bugs, the less costly removing them is. The language greatly aids this approach due to its static analysis capabilities, but you can improve the defect rates in any language by following a similar approach (not going to get them as low though)

2) By getting rid of the bugs early, you are minimising re-work and (importantly) re-verification when removing them at a later date, so the "speed for correctness" tradeoff isn't as large as you might otherwise expect. Certainly in the domains where you tend to find SPARK (or normal Ada) being used, the cost of testing required for a similar confidence level in other languages can exceed that of the V&V for SPARK.

3) A lot of the applications that demand really low defect rates are in aerospace, defence, etc. You'll see more statically typed languages in this arena because of their amenability to verification, but you are unlikely to see these projects pop up on GitHub. That's an understandable limitation of the approach in the original post.

4) There's some good info in this set of slides from Rod Chapman of Praxis about real world applications, including defects per KLOC: asq509.org/ht/a/GetDocumentAction/... (NB: Praxis developed SPARK from earlier work from University of Southampton, and is now part of Altran)

5) Even proof of partial correctness doesn't negate the need for testing. Proof of freedom from run-time exceptions (e.g. demonstrably no buffer overruns) is less time consuming, but of great value.

Finally, I believe that Tony Hoare's quote was also used in the preface of "High Integrity Software: The SPARK Approach to Safety and Security" which is pretty much the text for SPARK :-)

This is interesting. I'm normally of the static typing camp, but I get it. It's one thing to write code that a computer can understand, and it's a whole other thing to write code that another person can understand. More easily understood code = less chance for bugs.

When I first started programming, it was in C++. I then learned Java in college, as well as VB.Net and C#. My first few programming jobs were .Net, and now I'm working in Python. Every once in a while I'll fire up Visual Studio at home, and do a little C#, and the difference between it and Python is quite interesting. I'm definitely not working on the same kind of projects between work (Python) and home (C#), but there's some things that I really wish were easier in .Net. For example, to make an HTTP web request in Python is maybe - maybe - half a dozen lines. The same thing in C# is probably 1.5x to 2x as much code. Same sort of deal for something like reading/writing a file.

That being said, there's some definite proverbial rat's nest code that I've seen in Python. All the manual type checking in the world won't save you from bugs if you aren't being smart about things.

I wonder how much of this is also the experience of the programmers. I don't really know any Go, Scala, Haskell, or Erlang programmers. Perhaps these languages attract a more mature programmer? I wonder if that Stack Overflow data would show a correlation between languages used and years spent programming...

I may be missing something, but I always thought that the value of static languages was in API and framework discoverability (aka strong autocomplete), not in avoiding bugs. When a variable type is known by the compiler, it can more easily figure out what you can do with it and avoid trips to the documentation. This is why I like a static type system that doesn't get in the way (more C#, less Java).

Also, community plays a part. The languages with higher bug densities with some exceptions seem to attract more beginner programmers.

The impact of bugs on the users. Does a bug mean billions of dollars lost, or hundreds or thousands of people killed (think aircraft autopilot or nuclear power plant software)?

The team/developers/company behind the project. Are they reliable, serious, experienced?

What I see in your graph is that the most used languages (C++/Java), with the biggest codebases and the most features in the software built with them, have the most bugs per repo, which seems logical.

Seeing that, it is quite hard to draw any conclusion from that data alone.

What I surely see is that static typing serves as mandatory documentation that helps the compiler, the IDE and the developer reason about the code. There is less information available in a typical dynamic language, meaning that one has to rely more on alternative solutions, but even with state of the art tooling, the IDE/compiler typically never catches up. More checks are done at run time and the IDE fails to provide the same quality of tooling and context (auto completion, refactoring, code navigation).

The data is from GitHub, which means open source code, and from tens of thousands of repositories.
I state that the approach is very naive but I am still surprised about the results.

I agree that not all bugs are equal and you shouldn't use the same development practices in all projects.

I would include Ruby, Python and JavaScript in the list of most used languages. I do not know which codebases are the biggest or have the most features, but Steve McConnell in the "Code Complete" book says: "the number of errors increases dramatically as project size increases, with very large projects having up to four times as many errors per line of code as small projects"

I think that is the reason why monoliths need to be split into microservices at some point. My personal experience is that language expressiveness matters.

Thanks for the link to Paul Graham’s Beating the Averages and The Blub Paradox. I run into that all the time.

Trying to convince the other developers (who are very bright people) that there are alternative languages that would be more powerful and suitable for our problem domain invariably meets with deer-in-headlights blank stares.

Even contemplating alternative languages is outside of most developers' comfort zone. Or more so, even outside of their capability of consideration. Even as a thought experiment.

When I look at the trends, I see object-oriented programming continuing for the foreseeable future. But I also think there will be two language idioms that overtake object-oriented programming languages: functional programming languages, and domain specific languages.

The most visible correlation in your data is that the more stars a repo has, the more bugs it has. Also, the respective ranking of the languages changes significantly with the number of stars, like Java being quite good across all repos, but quite bad by your metric on big repos.

It may be that there is a correlation between language and number of bugs, or between dynamic/static typing and number of bugs, but really the data is not refined enough to remove other variables, so concluding anything from the data is impossible.

Sure, language expressiveness matters: it is enough to try to develop anything in assembly vs Java or Lisp to see that a higher level language works better. But there are expressive languages on both sides, and different languages may suit different problem categories too.

My impression is also that huge projects are not often done in dynamically typed languages. I feel like a dynamically typed language may be able to leverage more individual productivity but, on the other hand, is not that great when the code base scales (millions of lines of code).

The number of lines of code is not a good metric, but it is far better than thinking all repos are equal, so I would consider bugs per LOC. After that is done you could always apply a factor between a high level language and a basic one (like C typically needing more LOC than Java).
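To make the suggestion concrete, here is a small Python sketch of a bugs-per-KLOC figure with an optional language factor. The `bugs_per_kloc` helper and the `verbosity_factor` parameter are hypothetical names, and the factor of 2.0 used below is a placeholder, not a measured ratio between any two languages.

```python
def bugs_per_kloc(bug_count, lines_of_code, verbosity_factor=1.0):
    """Bugs per thousand lines of code, with an optional language adjustment.

    verbosity_factor is a HYPOTHETICAL multiplier saying how many lines a
    language needs relative to a chosen baseline language. The line count
    is divided by it so that repositories are compared on roughly the same
    amount of functionality rather than on raw size.
    """
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    if verbosity_factor <= 0:
        raise ValueError("verbosity_factor must be positive")
    effective_kloc = (lines_of_code / verbosity_factor) / 1000
    return bug_count / effective_kloc


if __name__ == "__main__":
    # Same bug count and raw size, with and without the adjustment.
    print(bugs_per_kloc(20, 10_000))                        # raw figure
    print(bugs_per_kloc(20, 10_000, verbosity_factor=2.0))  # adjusted figure
```

The adjustment cuts both ways: a verbose language looks better on raw bugs per LOC simply because it has more lines, so dividing the line count by the factor raises its adjusted figure.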

I don't think that the data proves anything either; I hope I made that clear in the post. Proving is a big word that I rarely use for anything.

I don't know if you noticed but I linked to the best source of studies on the matter that I found.

Reading your comments, something popped into my mind.

When we talk about huge projects, do you think that we plan from the beginning for huge projects, or do they start small and grow to be huge? Do you think it is common in the second case to switch languages?

As for huge projects, do we know in advance? Well, I guess it is case by case.

Twitter started basically as a Ruby shop and decided quite some time ago already to migrate to the JVM, with Java and Scala in particular (and JavaScript for the client). I don't know for sure, but I would say Twitter started small.

Now, I have colleagues and friends working for the French civil aviation and they decided long ago to make a new version of one of their key components. They started thinking big from the start. And by the way, automatic memory management was a no go as it is not realtime friendly, meaning many languages like Java/Clojure/Lisp are an instant no go.

There is a saying that if you are a startup, you should go for instant productivity, and that you'll always have time and money to rewrite everything if your company is successful; but if you are not successful, going more slowly to ensure better architecture, easier to maintain code or better performance doesn't make sense at all.

Others would say you should use what you have mastered. I think that makes a lot of sense: it saves you time and lets you concentrate on more important aspects like finding clients, hiring the right people or creating a business plan...

Most of the companies I worked for are big established companies, and while there is often an emphasis on using the best tool for the job, it is also quite important to use standard tools, ensure you can hire easily and also that people new to a project have a chance to get up to speed. They almost always choose the popular statically typed languages: Java, C++ and C. JavaScript is now widely used, but only because there is basically no way to avoid it on the web, and for years such companies tried many ways to go around it: JSF, GWT, doing it all on the server... The dislike of JavaScript by many IT specialists made the web lose years before nice reactive websites became the norm.

These companies have technical policies and, outside of proofs of concept, anything that may go to production has to use allowed technology. For my current company that's C/C++ for most legacy code, Java for most new things, Scala/Spark for big data analysis and a bit of Python. That last one is restricted to scripting and small projects that do not need to scale.

I am not necessarily saying it is the right way to proceed, but the common practice is to use a statically typed language that has widespread adoption in the industry and a mature ecosystem that helps productivity.

That being said, I well remember the arguments of Paul Graham about Lisp and how it helped him at his startup.

But even if he criticized it, when Yahoo bought his company, one of the first things they did was to migrate the code from Lisp to a statically typed language... The decision was criticized, maybe rightly so, but it shows that many people are not that fond of dynamically typed languages.

I always wonder what I would do if I created my own company. Would I mandate some popular language or would I allow every team to choose whatever they wanted?

I can see a lot of good arguments on both sides and I have seen a lot of talks about the subject, and again, nobody agrees.

It is a little bit paradoxical what you say about the best tool for the job. I have similar experience and I see it as "the best tool within this limited and blessed toolset". When and how do you decide to add a new tool? It is really hard to quantify the value and cost when we keep saying things like "more maintainable" or "easier to use".

Paul Graham's essay is a classic; every developer should read it, not because of Lisp but to be aware of the Blub Paradox. It applies to all of us.

I guess for developers like you or me that love our craft, we want to get the most of our time, tooling and libraries. As such we like to have the best of the best, whatever it is.

That is the promise of languages like Lisp, where you can easily build new abstractions that best fit the problem at hand.

But many things require several or many people, either at the same time or over the years... For example, maybe you'll not want to devote the next 20 years to the maintenance of the project you built in the past 5-10 years. This is where standardization makes sense. If you get better productivity for yourself but the overall productivity drops, that is a net loss.

So both aspects are to be taken into account. I would say that in a big company, small independent teams, each completely responsible for its area, even including production, make sense and help to scale that productivity. In today's world, for many cases, just saying you are able to provide VMs in the cloud that respond to some kind of network queries should work and leave a lot of freedom in how things are done inside.

But even that doesn't solve everything. The interactions between teams will still dictate many things, like the protocol over which data is exchanged. But also how you manage your database, what the overall architecture is, what tools you will use for continuous integration and QA testing, the cloud you'll use and how your application will shrink and scale dynamically...

There is not much we can do alone in a big company if we don't cooperate.

I am convinced the language impacts productivity somewhat, but many other things impact it more. The programming language is a tactician's choice, while the bigger things are strategists' choices. And while you'll want to delegate the details to great tacticians, you'll want to have great strategists when you are in a big company... If just switching teams means your employee needs 6 months or 1 year to become fluent in the technology stack, that's a real downside, because this is only a small part of the job.

1) GitHub issues cannot really be regarded as issues because many of them are questions, enhancements, etc. Most of the repos lack proper labeling of issues as "issues", as opposed to having no label. So you have to manually sift through the issues to identify which are actually issues.

2) How many of these repos do actual development on GitHub? I can see various compilers, such as the V8 JS library, sitting in a separate repo, with only the mirrors on GitHub. So there are no issues tracked on GitHub.

3) I would prefer a complex language with simple features at its heart, such as Scala, as opposed to a simple language with complex features, such as PHP. Because once you master the language (which takes time), the number of bugs can significantly go down.

4) Fewer bugs does not mean a stable language. This is a classic case of correlation does not mean causation. In fact, it can mean the opposite. Languages such as Java have a bigger community and hence more bugs, because there are more people to test them.

5) Static typing was never meant to improve the correctness of programs. It evolved naturally from assembly language, where you just allocate bytes. Dynamic typing, meanwhile, claimed better productivity, which has been proved to be false over the years.

6) No matter how sophisticated the languages that come out are, they are always going to lose against human stupidity. They are no match. No language is going to stop you from shooting yourself in the foot. It all depends on various levels of safety and choosing the right tool for the job.

This argument is never ending because you can always find people on either side of the court.

1) Issues labelled "bug". Of course it is GitHub data, so you can trust it as much as you want.
2) I assume the majority of the tens of thousands of repos.
3) I prefer a simple language with simple features :). You claim: "the number of bugs can significantly go down", do you have a serious study to prove that?
4) Agree, but I expect the Scala/Haskell/F#/Clojure/Erlang/Go communities to be roughly similar in size.
5) Maybe it wasn't meant to, but it is now "common knowledge".
Productivity claims are yet another can of worms, from which I have never seen a proper study to prove anything. I see that you link to a StackOverflow question, but I don't see any study mentioned in the answers. Where is the proof?
6) Completely agree. Better not to give those idiots a gun to shoot themselves in the foot.

Do you think the argument will end if we had enough data? Would we ever be able to have the data?

Nice read, but I no longer believe in static vs dynamic discussions :)

I prefer a simple language with simple features :). You claim: "the number of bugs can significantly go down", do you have a serious study to prove that?

I do not have any proof. But I say that from my experience with PHP and Scala. For example, PHP does not have an O(1) hashmap. There are several other examples that I can give.
The fact is, a simple language does not give you all the tools that you need. You can develop any way you want. This is the problem now with the JS ecosystem. There is a lack of standard frameworks and tooling. Sometimes it is better to be opinionated.

I see that you link to a StackOverflow question, but I don't see any study mentioned in the answers. Where is the proof?

Since we are dealing with opinions and not lemmas, I don't think we can prove either of them. But what we can do is gather collective opinions. I believe Stack Overflow is a very mature community to discuss such issues.

Productivity claims are yet another can of worms, from which I have never seen a proper study to prove anything

I use a strategy called proof by contradiction. Dynamically typed languages claim to be better at programmer productivity but from my experience of coding, they never work out for large teams. Hence I do not see what they are good for. You can neither prove this nor disprove this.

Completely agree. Better not to give those idiots a gun to shoot themselves in the foot.

What I meant by this is that no matter how good a language you have, there is always the need for code reviews. Nothing beats that.

Do you think the argument will end if we had enough data? Would we ever be able to have the data?

I don't think that is the question we need to be asking. We have had several years of discussion around this and there is nothing conclusive. So I would choose whatever language works out well for the team. This is of course based on several factors such as community/library maturity, ease of learning, etc.

One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies.

I would go with the latter any day. When it comes to languages/frameworks, people tend to be overly creative. Instead of creating their own, people can always contribute back to open source/existing stuff, and there is a very strong relation between the maturity of the language and the maturity of its libraries. An example would be the JVM ecosystem vs the Ruby one.

We are leaning towards the same definition of simple. But where we differ is how it is implemented in languages. You should take a look at Martin Odersky's talk "Working Hard to Keep It Simple". A language being complex to learn is not necessarily the same as it being complex to use in applications. There are some languages that get this right, such as Scala.

What do you think about microservices? Isn't it a way of avoiding big complex codebases?

I don't think microservices are the only answer. Services can be split if they naturally have a boundary. In my experience, if the application talks to the same database schema then it would be unnecessary overhead to create everything as a microservice, i.e. instead of database calls we would now be using REST APIs. HTTP, while being a good protocol, is definitely not a replacement for regular method calls within a language. I say this because most microservices discussions end up splitting everything as small as possible, which is definitely not the goal of microservices.

But it can definitely help to a great extent if done right.

Of course my experience is limited and I am not an expert in any way. Just my 2 cents.

What if Haskell, Scala, and F# developers are super proactive about reporting bugs? What about other labels? In the F# compiler and IDE tools repository, "regression" is also used. What about repositories which could naturally have a high bug count, but can't be measured the way you chose? Clojure, Ruby, and Scala compiler repos don't have issues on GitHub, for example. F# and Golang do. There are so many other questions surrounding methodology that I have.

"Strong static typing is often used as an excuse for not testing the code [...] The result of this cavalier attitude is that in several studies Haskell didn’t come as strongly ahead of the pack in code quality as one would expect."

So maybe strong static typing plus proper testing is the answer to fewer bugs.

I'm not suggesting that some groups are more proactive than others - just offering a question that one could draw from that data. Put differently, what if Clojure developers were less annoyed by bugs than C++ developers? I see that conclusion as just as valid as those you've drawn.

I agree that static typing is not an excuse for skipping tests (even though many folks in the FP community would say otherwise...). Types can certainly eliminate a class of problems if used well, but they're certainly not a silver bullet. Tests guard against change. For any decently-sized project, you need tests to protect your code against yourself :).

I find a troubling back-of-the-envelope correlation between language popularity/adoption level and number of bugs found. I think some of this can be explained by more eyes on the code (and users) and more inexperienced contributors.

Who is mostly writing Go code? Experienced enthusiasts and those with a staked investment in the language. Who is writing C++/Java/Python? Pros, but also new programmers who are flailing around trying to make a mark and learn real lessons after 50 cumulative hours programming.

This is, however, a well-known bias in data analysis. There might be a hidden phenomenon that explains most of the correlation.

In geography, for instance, you must always be careful not to simply redraw a population map, because high numbers of occurrences often happen in highly populated places.

In your case, the repositories with the most contributions are logically also those that contain the most bug reports. So by calculating your bug density by dividing by the number of repos, what you might actually be measuring is the number of active contributors.

You could check this by testing whether charting the number of contributors produces a graph similar to your indicator.

To overcome this, you might divide the number of bugs by the number of active contributors to each project. You might also need to filter by a specific timespan, because older projects are expected to have more reported bugs. So only the number of issues and the number of unique contributors from the last year should be taken into account.

By implementing these changes you could have a more robust indicator using the same source.
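A minimal sketch of that indicator, for illustration only. The `Issue` record, the function name and the sample data are all made up here; how the issues and the set of active contributors are actually collected from GitHub is left open.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Issue:
    # Hypothetical record; the field names are assumptions, not a GitHub schema.
    repo: str
    opened: date
    is_bug: bool


def bugs_per_contributor(issues, contributors_by_repo, since):
    """Return {repo: bugs opened on or after `since` / active contributors}.

    `contributors_by_repo` maps a repo name to the set of people who
    contributed in the same time window.
    """
    result = {}
    for repo, contributors in contributors_by_repo.items():
        if not contributors:
            continue  # no active contributors: the ratio is undefined
        recent_bugs = sum(
            1 for issue in issues
            if issue.repo == repo and issue.is_bug and issue.opened >= since
        )
        result[repo] = recent_bugs / len(contributors)
    return result


# Made-up sample data, only to show the call.
sample_issues = [
    Issue("lib-a", date(2016, 3, 1), True),
    Issue("lib-a", date(2014, 1, 1), True),   # too old, ignored
    Issue("lib-a", date(2016, 4, 1), False),  # not labelled "bug", ignored
    Issue("lib-b", date(2016, 5, 1), True),
]
sample_contributors = {"lib-a": {"ann", "bo"}, "lib-b": {"cy"}}
density = bugs_per_contributor(sample_issues, sample_contributors, date(2016, 1, 1))
```

Averaging `density` per language would then give a figure that is less sensitive to how many people are working on each repository.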

Another thing you might look at along these lines is to subtract bug reports submitted by contributors to the project, so as to try to distinguish (if imperfectly) between bugs discovered by users and bugs logged by those who are developing the project. For example, in a Haskell project it may be considered a bug if an invalid state is representable given the type signature even if that bug is never encountered as a runtime error, whereas in a Clojure project this isn't even a concept. However, this sort of "bug" is unlikely to be reported by someone who's simply a consumer of a library, so maybe excluding contributors (perhaps over some threshold?) can help to filter out issues that may not affect end-users.
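As a rough sketch of that filter (the function name, the data shapes and the commit-count threshold are assumptions for illustration, not anything from the post):

```python
def user_reported_bugs(bug_reports, commit_counts, threshold=1):
    """Keep bug reports whose author has fewer than `threshold` commits.

    `bug_reports` is a list of (author, title) pairs and `commit_counts`
    maps an author to their number of commits in the project; both are
    hypothetical shapes chosen for this sketch.
    """
    return [
        (author, title)
        for author, title in bug_reports
        if commit_counts.get(author, 0) < threshold
    ]


# Made-up sample data.
reports = [
    ("alice", "crash on empty input"),
    ("bob", "invalid state is representable"),
    ("carol", "wrong result for negative numbers"),
]
commits = {"bob": 40, "carol": 1}

strict = user_reported_bugs(reports, commits)                # excludes anyone with a commit
lenient = user_reported_bugs(reports, commits, threshold=5)  # tolerates occasional contributors
```

The threshold is the "perhaps over some threshold?" knob: with the default, only reports from people who never committed are counted.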

Maybe you should add the meaning to the article? I also thought it would be bugs per line of code, a measurement that is useless by itself.

But so far I think pretty much every measurement I have seen is useless.

Just to get it right. A project with 1 file and 100 lines of code in language X with 1 bug has technically a smaller "bug density" than a 50 million lines of code project in language Y with 3 bugs? If "yes", do you think that this is a useful measurement?

Sure, you don't find a 50 million line code project with just 3 bugs. It will have a lot more. That's the point, the bigger the code size the more bugs you usually have.

Usually a comparison of bugs per line of code is "better". But "better" still doesn't mean useful. Some languages are 2-3 times more succinct for the same functionality. So a more succinct language with the exact same amount of bugs will automatically have a larger "bug density" (considering bugs per line of code).
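To make the arithmetic concrete, here is a small sketch with made-up numbers: two implementations of the same functionality with the same number of bugs, one of them three times more succinct.

```python
def bugs_per_kloc(bugs, lines_of_code):
    """Bugs per thousand lines of code."""
    return bugs / (lines_of_code / 1000)


# Hypothetical figures: same functionality, same 10 bugs, different code size.
verbose_density = bugs_per_kloc(bugs=10, lines_of_code=30_000)
succinct_density = bugs_per_kloc(bugs=10, lines_of_code=10_000)

# The succinct version looks three times "buggier" per line,
# although both programs have exactly the same number of bugs.
ratio = succinct_density / verbose_density
```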

The assumption that every language somehow solves the same problems is also not really correct. A lot of languages like PHP, Python, Ruby, Perl and so on are primarily used for web development. And a lot of stuff is only solved by using C libraries. Or in other words, not really solved at all.

Some bindings to GUI frameworks like GTK, Qt or game engines (what you see in Python and so on) surely will never have the code size or complexity of a whole library in C (it's just a binding).

And you are holding that Erlang's immutability or functional paradigms, for example, are not linked to academia because they are supposed to address "real production systems" issues?
I think it is an artificial distinction. For example, Elm is heavily based on everything you would categorize as "academic", but its intent is to address real issues in client-side development.
How can one separate these 2?

Academia is extremely important and should be a source of inspiration to the industry.

In fact Curry On is one of my favourite conferences: "Academia and industry need to have a talk."

Experimentation is key to advance the state of the art, but do you want experimental programming features in your production code? Brian Goetz, one of the Java Language architects explains it better here

I always had the impression that in an ideal world programming would mean some kind of 1-to-1 relationship with discovered principles of math and nature rather than invented languages based on invented principles. Something closer to ideals that are inherently perfect from logic rather than inherently flawed human constructs (not that they are not pragmatic).
In any case, thanks for the resource!

I can't shake the idea that simplicity is always measured through a lens.
C is simple: from the perspective of the hardware, C is conceptually close.
Haskell is simple: from an equational perspective, Haskell is conceptually close, values are mapped from domains to codomains (even IO and state is modeled this way).

Have you considered that you may have actually measured known bug-count vs unknown? To me this is like comparing pennies in a jar vs missed pay-checks... It's the unknown long-term problems with systems (rounding errors, off-by-one's, partial API regressions, and design flaws) that lead to the biggest problems.

It's for sure interesting, I'd love for there to be an answer, but I've been making the transition from dynamic -> static yo-yo'ing without any evidence for or against either for the general-case since the 90's.

Not sure if I understood you about the known vs unknown. Do you mean that for dynamically typed languages, there are bugs that have not been reported or found, while those same bugs would have been reported in a statically typed lang?

I love the pennies vs paychecks analogy. I will steal it for a future blog post ;).

I am with you in the static vs dynamic debate, that is why I wanted to propose a different one: simple vs complex. On this one, I would position myself in the "simple-by-default" camp, where doing complex things is painful and non-idiomatic. What about you?

Articles like this based on real data are great to read. I like your analysis and share your thoughts that bug density is not significantly affected by static vs dynamic typing or any one feature for that matter. There are many reasons for bugs, many reasons why a developer reports a bug, different debugging tools and troubleshooting abilities, and many programming language features, all of which factor in. One feature will never cause approximately 5 to 6 times more bugs, or that feature will be quickly replaced and deprecated.

Complexity is definitely a significant factor. Another big factor is the number of intermittent errors that cannot be reproduced. Multithreaded programming and languages that do not automatically manage memory, like C, are more prone to intermittent bugs, race conditions, etc., and have errors that are very difficult to reproduce. This leads to bugs that never get fixed and eventually add up. Fixing bugs is easy if you remember one thing... You can't kill what you can't catch.

Funny how C / C++, which has the highest bug density in Round 3, is also used in our most important systems, Linux and Windows. Remember Windows 95, 98, Me? They had problems. Windows 2000, XP, 7, and 8? Much better. Both written in C. Why was one better than the other? I assert architecture.

I have updated the post to make it clear: "By bug density I mean the average number of issues labelled "bug" per repository in GitHub"

The assumption is "I do expect is that roughly all developers, no matter the language, have to solve the same problems, so the open source libraries available have roughly the same functionality.". David seems to disagree on this assumption. What are your thoughts?

I also remember reading somewhere that bugs are constant per lines of code, but maybe what was constant was the number of lines produced or the number of lines that you can keep in your head. I am unable to find the reference right now.

Steve McConnell in the "Code Complete" book says: "the number of errors increases dramatically as project size increases, with very large projects having up to four times as many errors per line of code as small projects"

Great idea for another pet project. Maybe one for PurelyFunctional.tv? ;)

I've used static and dynamic languages and I agree with the hypothesis that static languages, when used well, help you reduce the probability of bugs. In many cases, people write poor code. If you use a static language, like F# or Haskell, but use it as if it were JavaScript or the old C or C++, it is normal that bugs will arise. Most programmers are "primitive obsessed", which is a source of some bugs. Many like to cast all over the code too. Programming in a way that makes invalid states impossible to represent helps a lot and also saves you a lot of testing. Usually when I get code to compile I have very few bugs, and most of them are caused by bad communication of the requirements.
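A small sketch of what "invalid states impossible to represent" and avoiding primitive obsession can look like. It is written in Python only to keep the example short; Python does not check the annotations itself, so the static guarantee would come from an external type checker, whereas in F# or Haskell the compiler enforces it. All type names here are invented for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class EmailAddress:
    """Wrapper type used instead of a bare str."""
    value: str

    def __post_init__(self):
        # Reject obviously malformed values at construction time.
        if "@" not in self.value:
            raise ValueError(f"not an email address: {self.value!r}")


@dataclass(frozen=True)
class Unverified:
    email: EmailAddress


@dataclass(frozen=True)
class Verified:
    email: EmailAddress
    verified_on: str  # only a verified contact carries a verification date


# A contact is exactly one of the two cases, so "verified but without a
# date" or "unverified with a date" cannot be constructed at all.
Contact = Union[Unverified, Verified]


def describe(contact: Contact) -> str:
    if isinstance(contact, Verified):
        return f"{contact.email.value} (verified {contact.verified_on})"
    return f"{contact.email.value} (unverified)"
```

Compare this with a single class holding `email: str`, `is_verified: bool` and an optional `verified_on`: every combination of those fields is representable, and each inconsistent combination needs a test.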

I'd expect Haskell bugs to be more about misunderstood/incorrectly implemented business logic whereas more dynamic languages potentially have a lot more issues that stem from not being able to enforce invariants at compile time. Mind you that they probably still have the same potential logic bugs lurking behind, users just may not have gotten there yet cause some code path led to 23 being added to "foo".

This is based on many years working full time in Ruby, while at the same time running FP user groups and contributing to various compiler projects, both statically and dynamically typed.

My feeling -- just my feeling, not backed by any hard data -- is that the most important thing is both simplicity and writing the source code for maintainability and legibility. That is what Uncle Bob wrote about in his book Clean Code.

Some languages lend themselves to simplicity. For example, I'm impressed with D, Python, Lua and F# ... all of which have a clean syntax and are rather free of excessive "ceremony". Which is why I have a soft spot in my heart for those languages.

But the languages I use that pay the bills are C++ and C#, and I have a love-hate relationship with both of those languages. (More vehemence for C++, because I've been using it for a very long time.)

Bugs can be written in any language. But languages like C++ that have so many areas of undefined behavior that are easy to accidentally stumble into do no one any favors.

Languages that have contract programming, like Eiffel, D, and Ada 2012, make unit testing a lot less important because the contracts can be specified directly in the code instead of being encoded in unit tests. (That's what unit tests do: they express contracts.)
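To illustrate the idea of stating the contract next to the code, here is a rough emulation with a hand-written decorator. This is not how Eiffel, D or Ada 2012 implement contracts (there they are part of the language), and the `contract` helper is an invention for this sketch rather than an existing library.

```python
import functools


def contract(pre=None, post=None):
    """Attach a precondition and a postcondition to a function (sketch)."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if pre is not None and not pre(*args, **kwargs):
                raise AssertionError(f"precondition of {func.__name__} violated")
            result = func(*args, **kwargs)
            if post is not None and not post(result, *args, **kwargs):
                raise AssertionError(f"postcondition of {func.__name__} violated")
            return result
        return wrapper
    return decorate


@contract(
    pre=lambda xs: len(xs) > 0,
    post=lambda result, xs: result in xs and all(x <= result for x in xs),
)
def largest(xs):
    best = xs[0]
    for x in xs[1:]:
        if x > best:
            best = x
    return best
```

The postcondition says what a unit test for `largest` would otherwise assert on a handful of inputs, except that here it is checked on every call.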

In my experience, statically typed languages -- like Go, C++, D, F#, Swift, TypeScript -- don't offer much better protection than the duck typed languages like Python, JavaScript, Boo for "not making bugs". What the static typing does provide is scaling. Small applications gain little benefit from static typing. But as applications grow, it helps to make sure the pieces are fitting together correctly.

A case in point is Google's Angular, which was converted from JavaScript to TypeScript. They discovered that there had been a good number of bugs in their code, which were caught once they had the static typing of TypeScript. (TypeScript transpiles to JavaScript, and the type annotation information is erased. It's a transpile time safety net.)

But I've also worked with large systems based in Objective-C, which has a mix of static type checking and runtime duck typing, due to the nature of it using message passing to objects. (The message passing is reminiscent of Smalltalk.)

When I think of duck typed languages, I usually think of scripting languages. When I want to do something quick-and-dirty I reach for Python. When I want to make something application-like, I reach for a static typed compiled language.

But there are languages out there that bridge the two worlds of sorts. Languages that minimize the ceremony around the static typing, like OCaml, F#, and Swift. They're still all strongly typed, but the burden is more on the shoulders of the compiler, rather than forcing the developer to dot all the i's, and cross all the t's.

So I'd say that static typing catches a small category of bugs. For smaller applications, those kinds of bugs are few. For larger applications, those kinds of bugs can be crippling.

I don't know of any scripting language that supports contract programming as part of the core language. (Educate me if you know of any!)

A vastly bigger source of bugs in programs I work in is mutable global state. By which I am also including local mutable member variables in a class instance... that's a smaller scope global state. Programs that I've seen and I've written that emphasize immutability and segregate immutable data from functions and side-effect free functions seem to produce a lot less bugs.
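A small sketch of that style, with invented names: the data is immutable, and "changing" it means a side-effect free function returns a new value instead of mutating shared state.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Cart:
    items: tuple = ()


def add_item(cart, item):
    """Return a new cart; the cart passed in is left untouched."""
    return replace(cart, items=cart.items + (item,))


empty = Cart()
one_book = add_item(empty, "book")

# `empty` still has no items, so code holding a reference to it
# cannot be surprised by the change made above.
try:
    empty.items = ("sneaky",)
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```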

I'm not sure if the "less bugs" I'm seeing is because I'm a better programmer with those kinds of languages, or if I make less bugs in those languages because it is easier to reason about the correctness of the code. It doesn't have to do with all those languages being statically typed. I believe it does have to do with immutable data and the lack of global state making things simpler.

Another vast source of bugs I've run into is null pointers. (Damn you Tony Hoare for adding in the null reference to ALGOL W!). That's another area where Haskell, F#, OCaml, Swift outshine C, C++, C#. Objective-C sort of sidestepped the problem with its treatment of the nil object quietly eating messages (well, almost quietly... the eaten message is output to the console log).

How cool would it be to have a logistic regression model with variables like number of developers, average years of experience, language, test coverage, bug density, etc. over all GitHub projects, stratified by stars? One can only dream.