What the heck is going on with measures of programming lang…

I looked at the TIOBE index today, as I do every so often, as most of the software pros I know do every so often. It purports to measure the popularity of the world’s programming languages, and its popularity-over-time chart tells a simple story: Java and C are, and have been since time immemorial, by some distance the co-kings of language.

But wait. Not so fast. The rival “PYPL Index” (PopularitY of Programming Languages) says that Python and Java are co-kings, and C (which is lumped in with C++, surprisingly) is way down the list. What’s going on here?

What’s going on is that the two indexes have very different methodologies … although what those methodologies have in common is that both are very questionable, if the objective is to measure the popularity of programming languages. TIOBE measures the sheer quantity of search engine hits. PYPL measures how often language tutorials are Googled.

Both are bad measures. We can expect the availability of online resources to be an extremely lagging indicator; a once-dominant dead language would probably still have millions of relict web pages devoted to it, zombie sites and blog posts unread for years. And the frequency of tutorial searches will be very heavily biased towards languages taught en masse to students. That’s not a meaningful measure of which languages are actually in use by practitioners.

There are lots of weird anomalies when you look harder at the numbers. According to TIOBE, last year C went from its all-time lowest rating to Programming Language Of The Year in five months. I can buy that C has had a resurgence in embedded systems. But I can also easily envision this being an artifact of a highly imperfect measure.

The more flagrant anomaly, though, in both of those measures, is the relative performance of Objective-C and Swift, the two languages used to write native iOS apps. I can certainly believe that, combined, they have recently seen a decline in the face of the popularity of cross-platform alternatives such as Xamarin and React Native. But I have a lot of trouble believing that, after four years of Apple pushing Swift — to my mind, an objectively far superior language — Objective-C is still more popular / widely used. In my day job I deal with a lot of iOS/tvOS/watchOS apps, and interview a lot of iOS developers. It’s extremely rare to find someone who hasn’t already moved from Objective-C to Swift.

But hey, anecdotes are not data, right? If the only available measures conflict with my own personal experience, I should probably conclude that the latter is tainted by selection bias. And I’d be perfectly willing to do that …

… except there is another measure of programming language popularity out there. I’m referring to GitHub’s annual reports of the fifteen most popular programming languages on its platform. Those numbers are basically a perfect match for my own experience … and they are way disjoint from the claims of both TIOBE and PYPL.

According to GitHub’s 2016 and 2017 reports, the world’s most popular programming language, by a considerable distance, is JavaScript. Python is second. Java is third, and Ruby a close fourth. This is in stark contrast to TIOBE, which has Java and C, then a big gap, then Python and C++ (JavaScript is eighth), and also to PYPL, which claims the order is: Python and Java, a huge gap, then JavaScript and PHP.

Obviously the GitHub numbers are not representative of the entire field either; the sample is very large, but it only covers open-source projects. But I note that GitHub is the only measure which counts Swift as more popular than Objective-C. That makes it a lot more convincing, to me … but its open-source selection bias means it’s still far from definitive.

These statistics do actually matter, beyond being an entertaining curiosity and/or snapshot of the industry. Languages aren’t all-important, but they’re not irrelevant either. People determine what languages to study, and sometimes even what jobs to seek and accept, based on their popularity and their (related) projected future value. So it’s a little upsetting that these three measures are so starkly, radically different. Sadly, though, we seem to still be stuck with tea leaves rather than hard numbers.