Some of the questions we get most frequently at RedMonk concern programming language usage: which languages are being used, how much, and what are their respective growth or decline trajectories? Because there is no single canonical source for this data – even representative surveys are problematic – we examine as many distinct sources as we can to form a larger picture.

One of these comes from our client Black Duck, whose already significant Knowledge Base was substantially expanded by its October acquisition of Ohloh. Black Duck’s primary mission in life is digesting information about open source code, from license to language, to streamline the consumption process for enterprises. As it turns out, this data can also be used to understand developer trends. The folks from Black Duck have been kind enough to share some of the language usage data from their knowledge base, which we hope to do regularly, and which I in turn will relay to you here.

Before I proceed, two things to note.

First, the data supplied by Black Duck covered usage of the top 13 languages, but I’ve filtered that down to the seven you see here. Among those filtered out was what Black Duck defines as “shell,” which ranked one spot higher than Ruby; it was omitted because part of that volume is likely configuration and installation shell scripts, which are not what I’m interested in here. The other languages were omitted – as with C# – because their overall usage was insignificant (1.2% all time) and their growth or decline was negligible.

Second, the dates selected were arbitrary for this instance, because this was an ad hoc query run at our request. Consider timing as necessary when evaluating this data.

First up is all time usage data. This represents the percent of programming language usage within the Black Duck Knowledge Base.

This data contains few surprises. C (44.6%), C++ (13.3%) and Java (9.4%) are the volume languages, with JavaScript, PHP, Python, and Ruby showing more modest but still significant traction.

Next, let’s examine the usage pattern for the twelve months prior to 8.12.2009 and the year ending last Monday, 3.28.2011.

In the 19 months between those dates, we’re seeing an interesting shift. Note, for example, that in the twelve months trailing March 28th, JavaScript passed Java. The pattern is more apparent if we depict just the delta between the years. This represents the percentage change from the year ending August 2009 to the year ending March 2011.
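The delta described above is a comparison of each language's share in two periods. The post does not say whether the chart shows percentage points or relative change, so this minimal sketch computes both. The function name `share_changes` and the language names and figures are placeholders for illustration only; they are not Black Duck's data.

```python
def share_changes(earlier, later):
    """Compare two {language: share-in-percent} mappings.

    Returns {language: (point_change, relative_change_percent)} for
    languages present in both periods. relative_change_percent is None
    when the earlier share is zero.
    """
    changes = {}
    for lang in sorted(set(earlier) & set(later)):
        before, after = earlier[lang], later[lang]
        points = after - before
        relative = points * 100.0 / before if before else None
        changes[lang] = (points, relative)
    return changes


# Placeholder shares, NOT the figures from the Black Duck Knowledge Base.
year_ending_aug_2009 = {"LangA": 40.0, "LangB": 10.0}
year_ending_mar_2011 = {"LangA": 38.5, "LangB": 12.0}

for lang, (points, relative) in share_changes(
    year_ending_aug_2009, year_ending_mar_2011
).items():
    print(f"{lang}: {points:+.2f} points, {relative:+.2f}% relative")
```

The two measures can tell different stories: a small language can show a large relative gain while moving total share by only a point or two.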

This data supports the view that dynamic languages like JavaScript and Ruby are gaining share, possibly at the expense of traditional enterprise languages like C++ and Java. Note the odd growth in C, however; this may be an outlier as we’ll see below. The dynamic language gains are modest relative to total volume, of course. For context, the most popular dynamic language here, JavaScript, still represents less than a fourth of the total lines of C as of last week.

When we compare March’s figures to the all time volume, meanwhile, the pattern is even more pronounced: dynamic languages have universally gained share, while C, C++ and Java all have declined.

Conclusions

The data here seems to validate two recent conclusions; first, that JavaScript, Python, and Ruby frameworks are experiencing growth [coverage]. Usage of a developer framework, of course, is directly correlated with use of the language itself. Second, that Java has peaked from a relative adoption standpoint but remains a volume platform, with more lines of code than Python and Ruby combined [coverage].

The data also suggests that JavaScript in particular is seeing substantial growth, with the best growth rate against the all time data set and the second best versus 2009, growth sufficient for it to overtake Java in total volume. Ruby performs only slightly less well, with the best overall growth rate from 2009 to 2011. GitHub’s data is similar; almost four months ago, JavaScript passed Ruby as the most popular language on the site.

With the caveat, then, that the above data is simply what’s measurable by Black Duck and cannot therefore be considered representative in a strict statistical sense of developers worldwide, it may be time for you to look at JavaScript. And maybe node.js, while you’re at it [coverage].

I'm curious how much effort, if any, is made to account for the fact that the JavaScript in most Python, Ruby or PHP applications is simply copies of distributed libraries. In my case personally, GitHub indicates that my projects are 40% JavaScript. The reality though is that there's almost no original JavaScript in my applications. It's just verbatim copies of existing libraries.

In terms of development effort I have multiple man years in the Ruby code that GitHub indicates is 60% of my code base and at most several weeks in the JavaScript that it indicates is 40%.

Other languages don't benefit from that same kind of inflation because they typically link to external libraries, whereas JavaScript is included as source code.

I'm not suggesting that JavaScript's not growing. I'm absolutely sure it is. I am however very suspicious of the numbers above.

http://redmonk.com/sogrady sogrady

@Mike Greenly: that is difficult to account for, and I know the GitHub guys have tried to work around it with regular expression exclusions.

I can’t speak for Black Duck here, but I suspect they’ll have some intelligence in terms of evaluating projects versus libraries.

I’ll ask them, though.
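To make the idea of regular expression exclusions concrete, here is a minimal sketch of filtering bundled library copies out of a file list before counting languages. The patterns, `VENDORED_PATTERNS`, `is_vendored` and `original_files` are all hypothetical; they are not GitHub's or Black Duck's actual rules, which are not described here.

```python
import re

# Hypothetical exclusion patterns, chosen for illustration only.
VENDORED_PATTERNS = [
    re.compile(r"(^|/)vendor/"),                   # anything under a vendor/ directory
    re.compile(r"(^|/)node_modules/"),             # bundled npm packages
    re.compile(r"(^|/)jquery([-.][\w.]+)?\.js$"),  # jquery.js, jquery-1.5.1.js, ...
    re.compile(r"(^|/)prototype\.js$"),            # a copy of Prototype
    re.compile(r"\.min\.js$"),                     # minified builds
]


def is_vendored(path):
    """Return True if the path matches any exclusion pattern."""
    return any(pattern.search(path) for pattern in VENDORED_PATTERNS)


def original_files(paths):
    """Keep only the paths that do not look like bundled library copies."""
    return [path for path in paths if not is_vendored(path)]


paths = [
    "app/models/user.rb",
    "public/javascripts/jquery-1.5.1.min.js",
    "public/javascripts/application.js",
    "vendor/plugins/example/init.rb",
]
print(original_files(paths))
```

A filter like this catches only libraries that keep recognizable names or locations; a renamed or modified copy would still be counted as original code.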

http://dberkholz.com/ Donnie Berkholz

Interesting data. The growth in Ruby is astonishing. I bet shell-based build system files could be excluded easily enough; there's only a handful of common filenames.
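A filename-based exclusion along those lines might look like the sketch below. The set lists well-known autotools helper scripts, but it is illustrative rather than exhaustive, and the names `BUILD_SYSTEM_SHELL_FILES` and `is_build_system_script` are made up for this example.

```python
import posixpath

# Shell scripts commonly generated or copied in by autotools.
# Illustrative only; a real filter would need a longer, vetted list.
BUILD_SYSTEM_SHELL_FILES = {
    "configure",
    "ltmain.sh",
    "install-sh",
    "config.guess",
    "config.sub",
    "missing",
    "depcomp",
    "mkinstalldirs",
}


def is_build_system_script(path):
    """Return True if the file name (ignoring directories) is a known helper script."""
    return posixpath.basename(path) in BUILD_SYSTEM_SHELL_FILES


shell_files = ["configure", "build-aux/install-sh", "scripts/deploy.sh"]
print([path for path in shell_files if not is_build_system_script(path)])
```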

I don't really understand what the dates mean in this context. Is this all code in existence on a certain date, or new code committed on a certain date, or projects that released tarballs on a certain date, or what?

Do you have the graph with the absolute numbers side-by-side for all time vs present?

I'd love to watch a movie of cumulative growth per year to see whether the changes are consistent or whether they bounce around year to year.

Since you mentioned statistical validity — from a statistical point of view, taking multiple random samples of their total database and looking at the distributions across those samples would provide more robust results.
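The repeated-sampling idea can be sketched as follows, assuming the database can be flattened into one language label per counted unit (a file, a project, or similar). That representation, the function name `sample_shares`, and the toy data are assumptions for illustration; nothing here reflects how Black Duck stores or measures its data.

```python
import random
from collections import Counter
from statistics import mean, stdev


def sample_shares(records, sample_size, n_samples, seed=0):
    """Draw repeated random samples and summarize each language's share.

    records: list of language labels, one per counted unit.
    Returns {language: (mean_share_percent, stdev_of_share_percent)}.
    """
    if n_samples < 2:
        raise ValueError("need at least two samples to estimate spread")
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    languages = sorted(set(records))
    shares = {lang: [] for lang in languages}
    for _ in range(n_samples):
        # Sample without replacement; sample_size must not exceed len(records).
        counts = Counter(rng.sample(records, sample_size))
        for lang in languages:
            shares[lang].append(100.0 * counts[lang] / sample_size)
    return {lang: (mean(vals), stdev(vals)) for lang, vals in shares.items()}


# Toy data, not real measurements.
toy_records = ["C"] * 60 + ["Java"] * 40
for lang, (avg, spread) in sample_shares(toy_records, 50, 20).items():
    print(f"{lang}: {avg:.1f}% (stdev {spread:.1f})")
```

If the spread across samples is small relative to the differences between languages, the ranking is probably stable; if not, small year-to-year shifts may be noise.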

I'm very curious about how the data was measured. Lines of code? Adjusted for language verbosity perhaps? Number of components? Did Black Duck provide any commentary on how the measurements were determined?

A big old project will grow slowly in LOC but may have many more committers. This is simply because it is more time-consuming to track down a bug or add a feature in software with 10 million lines of code than in software with 10 thousand lines of code.

I see a real-life example in my own experience. At my previous job, working on a small new project (100K LOC), a developer would produce on average something like 50 LOC of Java code per working day.

Now I’m working on a big piece of software with more than a dozen million LOC. The average is 7 LOC/dev/day. More interesting, new development is more like 2 LOC/dev/day.

Another problem: git is about open source and was created as the source control software for the Linux kernel (written in C). It is then logical that you see a lot of C inside git repos. Python users, for example, tend to use Mercurial more than git. This would lower Python’s stats.

But there is even more. GitHub is about open source. We aren’t talking about ALL software. Only a few of us make open source software; most developers work for a living, and most of the time that work is on closed source software.

I really doubt that the average OSS guy will like or use the same language as the average software developer. Only a few OSS projects are really used. Most are just experiments by one guy, where there is no constraint on the language. With GitHub there are even many who create a project just to illustrate a blog post or something.

What does all of this mean? Together with the other comments, I really doubt the gathered statistics have any interesting meaning other than that Python guys are more on Mercurial and that Java source code is often closed source and so not really visible here.