The RedMonk Programming Language Rankings: January 2014

As long as we have been doing our programming language rankings here at RedMonk, dating back to the original publication by Drew Conway and John Myles White, we have been trying to find the correct timing. Should it be monthly? Quarterly? Annually? While the appetite for up-to-date numbers is strong, the truth is that historically changes from snapshot to snapshot have been minimal. This is in part the justification for the shift from quarterly to bi-annual rankings. Although we snapshot the data approximately monthly, there is little perceived benefit to cranking out essentially the same numbers month after month. There are more volatile ranking systems that reflect more ephemeral, day-to-day metrics, but how much more or less popular can a programming language realistically become in a month, or even two? The aspect of these rankings that most interests us is the trajectories they may record: which languages are trending up? Which are in decline? Given that and the adoption curve for languages in general, the most reliable approach would seem to be one that measures performance over multi-month periods at a minimum.

This month’s ranking, however, may call that approach into question. From Q113 to Q313, for example, only two languages in our Top 10 experienced any change – Java and JavaScript briefly swapped places. Between Q313 and this Q114 snapshot, however, six spots have new owners. Now it’s important to emphasize, as the caveats below note, that the practical significance of moving from one rank to another is very slight: no one is going to use one language or drop another because it’s fifth rather than sixth, for example. And it is necessary to note that the way these rankings are conducted has changed for the first time since their inception, due to a change on GitHub’s part.

Previously, GitHub’s Explore page ranked their top programming languages – theoretically by repository – and we simply leveraged those rankings in our plot. For reasons that are not clear, GitHub has retired this ranking, and it is thus no longer available to us. Instead, this plot attempts to duplicate those rankings by querying the GitHub Archive on Google’s BigQuery. We select and count repository languages, excluding forks, for the Top 100 languages on GitHub. Without knowing precisely how GitHub produced their own rankings, however, we can’t be sure we’re duplicating their methods exactly. And there is some evidence to suggest that the new method is an imperfect replica. Previous iterations have produced correlations between GitHub’s rankings and Stack Overflow’s as high as .82 but never one lower than .78. This quarter’s iteration is the lowest yet at .75. It’s possible, of course, that this is reflective of nothing more than a natural divergence between the two communities. But it’s equally possible that our new method differs slightly from GitHub’s, and is therefore producing slightly different results than in previous iterations. Until and unless GitHub decides to resume publishing their own rankings, however, this is the best method available to us. This must be kept in mind when comparing these results against previous iterations.
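As a rough illustration of the two steps described above (counting non-fork repositories per language, then correlating the resulting ranking with Stack Overflow’s), here is a minimal Python sketch. The record fields (`language`, `fork`), the sample data, and the choice of Spearman rank correlation are all assumptions made for illustration: the post does not state which correlation measure is used, and the actual analysis runs as a query against the GitHub Archive on BigQuery rather than in Python.

```python
from collections import Counter


def count_languages(repos):
    """Count repositories per language, skipping forks and repos with no language."""
    counts = Counter()
    for repo in repos:
        if repo.get("fork") or not repo.get("language"):
            continue
        counts[repo["language"]] += 1
    return counts


def to_ranks(counts):
    """Map each language to its rank (1 = most repositories). Ties are broken by name."""
    ordered = sorted(counts, key=lambda lang: (-counts[lang], lang))
    return {lang: position for position, lang in enumerate(ordered, start=1)}


def spearman(ranks_a, ranks_b):
    """Spearman rank correlation over the languages present in both rankings.

    Assumes no tied ranks, so the shortcut formula
    1 - 6 * sum(d^2) / (n * (n^2 - 1)) applies.
    """
    shared = sorted(set(ranks_a) & set(ranks_b))
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared languages")
    # Re-rank within the shared subset so both rankings run from 1 to n.
    order_a = sorted(shared, key=lambda lang: ranks_a[lang])
    order_b = sorted(shared, key=lambda lang: ranks_b[lang])
    pos_a = {lang: i for i, lang in enumerate(order_a, start=1)}
    pos_b = {lang: i for i, lang in enumerate(order_b, start=1)}
    d_squared = sum((pos_a[lang] - pos_b[lang]) ** 2 for lang in shared)
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical sample data, not real GitHub or Stack Overflow figures.
sample_repos = [
    {"language": "JavaScript", "fork": False},
    {"language": "JavaScript", "fork": False},
    {"language": "JavaScript", "fork": True},   # fork: excluded
    {"language": "Java", "fork": False},
    {"language": "Java", "fork": False},
    {"language": "Java", "fork": False},
    {"language": "PHP", "fork": False},
    {"language": None, "fork": False},          # no detected language: excluded
]
github_ranks = to_ranks(count_languages(sample_repos))
stack_overflow_ranks = {"Java": 1, "PHP": 2, "JavaScript": 3}  # made-up ranking

print(github_ranks)
print(spearman(github_ranks, stack_overflow_ranks))
```

A change in how forks or language detection are handled would shift the counts, and with them the ranks and the correlation, which is one way a replicated method could drift from GitHub’s original one.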

Besides that notable caveat, there are a few others to reiterate here before we get to the plot and rankings.

To be included in this analysis, a language must be observable within both GitHub and Stack Overflow.

No claims are made here that these rankings are representative of general usage more broadly. They are nothing more or less than an examination of the correlation between two populations we believe to be predictive of future use, hence their value.

There are many potential communities that could be surveyed for this analysis. GitHub and Stack Overflow are used here first because of their size and second because of their public exposure of the data necessary for the analysis. We encourage, however, interested parties to perform their own analyses using other sources.

All numerical rankings should be taken with a grain of salt. We rank by numbers here strictly for the sake of interest. In general, the numerical ranking is substantially less relevant than the language’s tier or grouping. In many cases, one spot on the list is not distinguishable from the next. The separation between language tiers on the plot, however, is generally representative of substantial differences in relative popularity.

In addition, the further down the rankings one goes, the less data available to rank languages by. Beyond the top 20 to 30 languages, depending on the snapshot, the amount of data to assess is minute, and the actual placement of languages becomes less reliable the further down the list one proceeds.

With that, here is the first quarter plot for 2014.


Because the plot doesn’t lend itself well to understanding precisely how languages are performing relative to one another, we also produce the following list of the Top 20 languages by combined ranking. The change in rank from our last snapshot is in parentheses.

1. JavaScript (+1)
2. Java (-1)
3. PHP
4. C# (+2)
5. Python (-1)
6. C++ (+1)
7. Ruby (-2)
8. C
9. Objective-C
10. CSS (new)
11. Perl
12. Shell (-2)
13. Scala (-1)
14. Haskell
15. R (+1)
16. Matlab (+3)
17. Clojure (+5)
18. CoffeeScript (-1)
19. Visual Basic (+1)
20. Groovy (-2)
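For readers who want to build a similar list from their own data, the following Python sketch shows one way a combined ranking and the change-in-rank annotations could be derived. The post does not spell out how the two rankings are combined, so summing the two ranks is purely an assumption here, as are the function names and the sample ranks.

```python
def combined_ranking(github_ranks, stack_overflow_ranks):
    """Order languages by the sum of their two ranks (lower is better).

    Only languages present in both rankings are included. The sum-of-ranks
    rule is an assumption for illustration; ties are broken by name.
    """
    shared = set(github_ranks) & set(stack_overflow_ranks)
    return sorted(
        shared,
        key=lambda lang: (github_ranks[lang] + stack_overflow_ranks[lang], lang),
    )


def rank_changes(previous, current):
    """Return each language's change in position since the previous list.

    Positive means the language moved up; "new" marks a language that was
    absent from the previous list.
    """
    previous_position = {lang: i for i, lang in enumerate(previous, start=1)}
    changes = {}
    for position, lang in enumerate(current, start=1):
        if lang in previous_position:
            changes[lang] = previous_position[lang] - position
        else:
            changes[lang] = "new"
    return changes


def format_entry(lang, change):
    """Render an entry in the style of the list above, e.g. 'C# (+2)'."""
    if change == "new":
        return f"{lang} (new)"
    if change == 0:
        return lang
    return f"{lang} ({change:+d})"


# Hypothetical ranks, not the real data behind the list above.
github = {"JavaScript": 1, "Ruby": 2, "Java": 3, "PHP": 4, "Go": 5}
stack_overflow = {"Java": 1, "JavaScript": 2, "PHP": 3, "Awk": 4, "Ruby": 5}
last_snapshot = ["Java", "JavaScript", "PHP"]

current = combined_ranking(github, stack_overflow)
changes = rank_changes(last_snapshot, current)
for lang in current:
    print(format_entry(lang, changes[lang]))
```

Go and Awk drop out of the sample result because each appears in only one of the two rankings, mirroring the caveat that a language must be observable on both GitHub and Stack Overflow to be included.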

A few observations of larger trends:

Java and JavaScript: It’s fundamentally less important that JavaScript resumed its reign atop our charts after a brief one-snapshot dethroning by Java than the fact that these are collectively and consistently the two highest ranking languages surveyed. In spite of their vast differences in design and usage, they are the focal point for enormous communities of development.

The Solidity of PHP: PHP is, as far as these rankings go, a bit boring. It finishes third behind Java and JavaScript like clockwork. While the language has its share of notable critics, investments from Facebook (notably HHVM), Zend and the like along with ubiquitously popular projects such as WordPress are apparently more than sufficient to sustain a robust market position.

Gains for C++/C# / Losses for Python/Ruby: It’s tough to say which was more odd from the result set: the slight gains from the compiled languages or the slight declines from the interpreted alternatives. To be clear, it’s dangerous to read much into the wider popularity of any of these runtimes based on these results. Ohloh, for one, does not concur with the trajectories implied.

But they do represent a change at least within this result set – which has been relatively static. There are some who are – anecdotally, at least – arguing that a C++ renaissance is underway. Until we see more hard data, it’s probably safest to chalk the small change in fortunes here up to statistical noise, but we’ll be watching compiled language trends closely and looking to test the hypothesis wherever possible.

Clojure Makes the Top 20: For the first time since we began surveying, Clojure joins its JVM-based counterpart Scala as a Top 20 language. It is the continuing success not only of Java the language but JVM-based alternatives that makes the regular “Java is dead” arguments so baffling.

Statistical Language Popularity: Both R and Matlab experienced gains this quarter, and this was the third consecutive quarter of growth for R in particular. While, as the plot indicates, these languages tend to outperform on Stack Overflow relative to GitHub, they are indicative of a continued rise in popularity for statistical analysis languages more broadly.

The Rise of Go: Go, which we termed a notable performer in last year’s Q1 ranking, continued its rise. It checked in just outside the Top 20 at 22 this quarter, a gain of six spots from last quarter.

Languages to Watch: In the initial run of the data for this quarter, Julia, Rust and Elixir finished back to back to back. After making a correction to the GitHub Archive query and re-running the data, they finished Julia, Rust and then Elixir one spot removed from Rust. Regardless, while these are not going to challenge for Top 20 rankings within the near future (Julia performs best at 62), they are each languages to watch, with notable followers and contributors. We’ll keep an eye on each as we move along.

Big picture, the takeaway from the rankings is that language diversity is the new norm. The Top 20 continues to evidence strong diversity in domain, and even non-general purpose languages like Matlab and R are borderline mainstream from a visibility perspective. Expect this to continue, with specialized tools being heavily leveraged alongside general purpose alternatives, rather than being eliminated by them.

Maximilian Stroh says:

Agree. The more questions on SO for a specific language, the worse the language, since more people have problems using it. That said… the more questions on SO, the higher the actual language usage, roughly (not the popularity).

Eamon Nerbonne says:

I think you’re misreading that. It looks to me more like stackoverflow has a somewhat broader base; specifically including more internal apps (unlikely to be open source on github), one-off scripts (unlikely to be public at all), and mature software (history predating github). Github – by its nature – is more likely to include software that’s part of an actual project.

That explains the one-off problem-solving scripting languages like awk and applescript being so much better represented in stackoverflow than github. However, also things like R and matlab – typical one-off analysis tools – are less well represented in github.

The second clear factor is that windows usage is clearly more a stackoverflow thing; so anything windows-related like C#, visual basic, powershell are clearly less represented on github. Unix oriented things like emacs lisp and shell, by contrast, are more popular on github.

It’s a little unfortunate that this is a ranking, not some kind of logarithmic scale, so we can’t really get a sense of the magnitude of the differences.

While noting up front that there is no perfect way to analyze languages, as every conceivable process will be flawed at least in some way, one of the things we appreciated about Drew and John’s original approach was that it attempted to balance languages that might be over or under-represented in either GitHub or Stack Overflow.

So while it may be true – I’d argue the point – that the latter is over-representing “problem” languages while underselling “easy” ones, one might reasonably assume the overwhelming advantage that would represent would manifest itself in a corresponding outperformance on GitHub.

It’s not a perfect balance, of course, and there are always problems, but part of the point of the plot is that it allows you to determine which languages may be skewed towards one property or another and adjust accordingly.

All of that said, suggestions for improving the plot are welcome. A log scale plot, FWIW, actually hurts clarity rather than helping it. The magnitude of the differences, as mentioned in the caveats above, is slight, and the log scale therefore is very compressed and difficult to interpret.

IRLeif says:

You make many good points here. In general, I think that there are too many factors that were not taken into account. It all depends on what we’re trying to gauge—In this case, it seems to be the popularity of these programming languages on real projects and the number of problems people encounter when using them to solve problems.

Here we are only considering the questions asked on Stack Overflow and projects on GitHub—There are a great many questions and projects, behind closed doors, which are not represented in this dataset.

This is an interesting comparison of programming languages that are represented on GitHub and Stack Overflow; not programming languages in general.

Eamon Nerbonne says:

By “one-off” I’m not trying to denigrate R, nor trying to suggest that people use R itself only once. R is famed for its vast (reusable) library of statistical analyses that are readily available. Rather, R is well-suited to specific data-analysis tasks in the sense that if you have a dataset and want to quickly deal with that specific dataset in some fashion, then R is an apt choice. And since the code is often short+simple (which is a good thing) and tuned to the task at hand, it’s not as likely a candidate for a github project as a shared library, say. Furthermore, if the data you’re analyzing is private, it wouldn’t surprise me if you’d be cautious about publishing its structure indirectly, as a github project might.

So, for these reasons, I assume that there’s lots of R usage that’s not caught in the public-open-source-on-github net. Therefore when there’s more usage on stackoverflow than on github, I expect it doesn’t mean (as Maximilian Stroh suggests) that the language is worse, but rather that it’s merely underrepresented on github. That’s all I meant!

IRLeif says:

“Simple” to whom? I think this data would be even more interesting and valuable if we could somehow take into account the persons using these programming languages and how they work. The demographics and subcultures are missing.

Eamon Nerbonne says:

I’m pretty sure there are fewer questions about syntactic or language trickery than about libraries, so I don’t think this argument really holds. And there are even more problem-specific questions, and since the different programming languages are suitable for different kinds of problems, I don’t think you can really generalize and say simple languages will receive noticeably fewer questions in any significant way – other factors are much more important.

IRLeif says:

I agree that a greater number of Stack Overflow questions might indicate a greater number of problems encountered. However, I think we must also take into account the breed of programmers who tend to gravitate towards certain languages, and other factors.

The subset of programmers who favour Common Lisp or Haskell, for example, might be more academic and naturally inclined to figure things out by themselves, whereas those who gravitate towards languages such as JavaScript or PHP might be more pragmatic in their approach and hence more active in communities such as Stack Overflow and GitHub.

It’s important to consider demographics and greater context to make sense of this data.

Fernando Racca says:

Are repositories hosting GitHub Pages taken into consideration? That would certainly influence results. Also, despite its usefulness, why is CSS considered a language? We are not just counting total lines of code produced. Otherwise Markdown could make it into the top 20.

Thanks for the stats. I’d imagine that Scala is at least 10th, realistically.

Blake Allen says:

While it’s true CSS is technically Turing complete, most would not consider it a programming language. I would however lump it into the javascript category, since the two are often paired for dynamic web applications.

Eamon Nerbonne says:

CSS is not Turing complete in the sense that I’d be comfortable using the term: each “computation” step in CSS requires user interaction (albeit potentially a rote action such as clicking anywhere). CSS without user interaction is not Turing complete, and CSS with a normal (i.e. small and bounded) number of user interactions is also not Turing complete. Furthermore, CSS has no memory, no infinite I/O tape (a critical part of a Turing machine), it requires a DOM tree (i.e. html doc) for that, and the amount of addressable “memory” is strictly limited by the size of that document. Again, this is rather unusual for Turing completeness.

John says:

Ha! Finally, independent confirmation of what I’ve suspected all along: Stack Overflow just sucks for the languages that I happen to use (like Lisp and TeX).

What would be neat to see is how language popularity is across different communities, so I could see the best place to go for each language. There’s really no point in going to Stack Overflow for Common Lisp questions (it’s *dead*), but knowing that the language is popular on (for example) Freenode or Bitbucket or Usenet would be incredibly valuable.

Bernhard says:

Obviously the tag structure is as it is. It still seems somewhat unfair to divide Lisp into its different dialects (Racket, Common Lisp, Scheme, Emacs Lisp and to some extent Clojure) whilst summarizing all those Smalltalk dialects (Squeak, Pharo, GNU, Amber, Redline, VisualAge, VisualWorks, …) as one language. That gives Smalltalk an unfair head start. I am not sure how this relates to other languages, as I don’t know enough about them.
Is Python equivalent to CPython? There are derivatives like Jython, IronPython, Cython, Stackless Python. Are they all subsumed under “Python” or are they too small to find their way into this chart?

Matlab is not primarily a “statistical analysis language”, though one could use it for statistical analysis. R *is* primarily a “statistical analysis language”, though it could also be used for many other purposes.

[…] two premier question and answer sites for professionals, Stack Overflow and GitHub, have indicated R language has gained further in popularity during this quarter as a statistical analysis language. With the rising graph continuing consecutively over last 3 […]

[…] dynamic web content as well as being relatively lightweight and easy to use. RedMonk named it the top programming language earlier this year. Last year it was number two and the year before it had been on the top spot, so […]

[…] popularity and scope at the start of 2014. The analysis on where Java is as a language is repeated on RedMonk’s Blog. The fact it remains a top two language didn’t surprise anyone, but it was the other angle […]
