PHP Eats Rails for Breakfast

Despite the buzz around sexy new frameworks like Rails and Django, PHP is more dominant than ever.

Here at Ohloh we’ve accumulated an enormous database of open source development facts. So far, we’ve indexed over 3,000 projects and 220 million lines of source code. In addition, we’ve traced the history of these lines of code to identify when, and by whom, all of this code was written.

As a result, we can measure the total amount of activity in a given language over time. In this article, we’ll take a look at the changing popularity of web scripting languages, specifically PHP.

One thing that we can readily see is the increasing dominance of PHP over the last several years:

It’s clear that the web [and I’ve intentionally omitted that inescapable “2.0” suffix] is being built with PHP. Measured purely by the number of new lines of code, PHP leaves all other web scripting languages in the dust, and it continues to grow. Quite simply, one-fifth of all new open source code that we track is written in PHP.
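Here, a language’s “share” means its new lines of code divided by all new lines of code written in the same period. The minimal Python sketch below shows that ratio. The function name and its input (a dictionary mapping a language name to lines added) are hypothetical and are not taken from Ohloh’s code.

```python
def language_shares(lines_added_by_language):
    """Return each language's fraction of all new lines in one period.

    Hypothetical helper: the input maps a language name to the number
    of lines added in that language during the period.
    """
    total = sum(lines_added_by_language.values())
    if total == 0:
        # No new code at all: there is no meaningful share to report.
        return {}
    return {lang: n / total for lang, n in lines_added_by_language.items()}
```

With made-up totals of 200 new PHP lines out of 1,000 new lines overall, `language_shares` would report a PHP share of 0.2, i.e. one-fifth.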

This isn’t much of a surprise. The top five most popular projects on Ohloh, starting with MediaWiki, phpBB, and Coppermine, are all written in PHP. The world is shifting away from rich client applications and toward the web, and PHP looks like the language of choice in this new world.

Another way to compare a language’s popularity is to measure the number of developers actually working in that language:

Comparing this chart with the one above reveals something interesting: the relative number of developers working in PHP isn’t increasing. So how can we explain the great rise in the amount of code these developers are churning out? Perhaps a clue lies in the number of new projects being started:

Curiously, the number of new PHP projects seems to have declined. Perhaps this implies that much of the new PHP code is being added to established projects. If so, the rise in new code might be a consequence of maturing code bases: as developers gain experience working within an older application, their output increases. Alternatively, perhaps the few new projects actually contain huge amounts of code that has been repurposed [a nice word for “cut-and-pasted”] from older applications. This is just speculation, and I invite discussion.

About the Data

The Ohloh database was initially seeded by a survey of the most popular projects on the major source code forges, and is now driven by user-edited content. Ohloh is not a complete picture of the entire open source world, but it’s the most inclusive database of which we’re aware. Ohloh has indexed thousands of projects, including a large fraction of the most popular open source software. You can help to improve our data by adding your own project to the Ohloh index.

Development metrics are obtained by interfacing directly with underlying source control systems. We parse revision logs and source code files in detail. Because of this methodology, Ohloh can only include projects that are hosted in publicly accessible CVS, Subversion, or Git repositories. Projects using other source control systems, projects with private source control, and those danger-loving projects that use no source control at all [you know who you are!] cannot be included.

For purposes of the charts above, we have omitted the first commit in each repository. This causes us to under-count some code activity. However, the omission helps us avoid a common problem that arises when a project team moves from one source control system to another: such a move usually begins with a massive migration that copies all of the existing code into the new repository, code we’ve already measured in its old repository. By ignoring this first commit, we avoid double-counting a huge amount of code.

Markup languages (XML, HTML) have been excluded from the above analysis.
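The two filtering rules described above (skip each repository’s first commit, and leave markup languages out of the totals) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Ohloh’s actual pipeline: the `Commit` record, its fields, and `lines_added_per_month` are hypothetical, and we assume per-language counts of added lines have already been extracted from each commit.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

# Markup languages excluded from the totals (compared case-insensitively).
MARKUP_LANGUAGES = {"xml", "html"}


@dataclass
class Commit:
    """Hypothetical commit record; not an Ohloh data structure."""
    repo: str                   # repository identifier
    time: datetime              # commit timestamp
    lines_added: dict[str, int] # language name -> lines added


def lines_added_per_month(commits):
    """Sum added lines per (year, month, language).

    Skips the earliest commit in each repository (typically a bulk
    migration from another source control system) and ignores markup.
    """
    # Find the earliest commit in each repository.
    first_commit = {}
    for c in commits:
        if c.repo not in first_commit or c.time < first_commit[c.repo].time:
            first_commit[c.repo] = c

    totals = defaultdict(int)
    for c in commits:
        if c is first_commit[c.repo]:
            continue  # avoid double-counting migrated code
        for lang, n in c.lines_added.items():
            if lang.lower() in MARKUP_LANGUAGES:
                continue  # markup is excluded from the analysis
            totals[(c.time.year, c.time.month, lang)] += n
    return dict(totals)
```

Dropping a whole commit, rather than trying to detect copied files, keeps the rule simple at the cost of also discarding whatever genuinely new code that first commit contained, which is the under-counting trade-off described above.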

You can view the underlying data which generated the above charts in a PDF.