Is TIOBE Fatally Flawed?

Update: As Bogdy mentions in the comments, my reasoning here was based on false assumptions. It still seems clear that ranking APL above Haskell, along with other anomalies, disqualifies TIOBE for any serious purpose, at least past the top ten or so languages. My own rankings should be ignored, though.

During a debate at work about using Haskell for a project, a coworker pointed out that Haskell is ranked #41 on the TIOBE index. On further investigation, things looked really fishy. TIOBE is commonly interpreted as measuring the amount of “community”, “buzz”, or “excitement” around a language. By none of these standards can APL reasonably edge out Haskell. I dug further.

Summary of findings: the TIOBE index is severely broken. It falls victim to the fact that search engines grossly overestimate their result counts. For example, if I search Google for “haskell programming”, as TIOBE does, the resulting page proudly estimates 44,500 results. However, if I click through the results, I hit the end of the list after only 652. Nice for marketing Google, perhaps, but the estimate was rather poor. Similar things happen with other languages.
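To put a number on that gap, here is a minimal sketch that divides the estimated count by the number of results that could actually be reached. The two figures are the ones quoted above for “haskell programming”; the function name `inflation_factor` is just an illustrative choice, not anything TIOBE or Google provides.

```python
def inflation_factor(estimated: int, reachable: int) -> float:
    """Return how many times larger the estimate is than the reachable count."""
    if reachable <= 0:
        raise ValueError("reachable must be positive")
    return estimated / reachable


# Figures quoted in the post for the query "haskell programming".
haskell_estimated = 44_500
haskell_reachable = 652

factor = inflation_factor(haskell_estimated, haskell_reachable)
print(f"estimate is about {factor:.0f}x the reachable count")
```

For these two numbers the estimate comes out at roughly 68 times the reachable count. A ranking built on the estimates therefore depends far more on how the estimate is produced than on how many pages can actually be listed.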

TIOBE, despite using several search engines, seems to correlate well with Google’s estimated (i.e., phony) number of results. It correlates very badly with the actual number of results. Here’s my corrected TIOBE list, built only from the top 50 languages in the original list. To comply with Google’s terms of service, I painstakingly did this by hand, so I didn’t go any further.

Some things are initially surprising, but on reflection they are reasonable to expect. Languages near the top tend to be those that are somewhat old (more time to write about them) or commonly used, past or present, in business or the academic world. These languages have a reason to have a lot of web pages written about them. One example: Prolog clearly isn’t a commonly used language, nor one with a lot of community, but it’s taught in the introductory “programming languages” course of just about every computer science department in the world, because they feel better including something besides imperative and functional languages. Hence, it’s been written about a lot. The “big community” effect is visible too, though only in languages that rank higher than you’d otherwise expect.

I also split Lisp/Scheme into Lisp and Scheme separately, and dropped Natural because Googling for “natural programming” turned up more irrelevant results than relevant ones.

Indeed, Bill, I didn’t mean to say that it was. I only meant to say that in the real world, Google indexes more pages that contain the phrase “ada programming” than “python programming”. Seeing the real list busts a lot of misconceptions, including the idea that the number of web pages about something is any kind of indicator of its popularity, community, or liveliness.

It may also be the phrase that TIOBE chooses. I’d venture a guess that there are a lot of web pages that talk a lot about Python but never contain the exact phrase “Python programming”. This is partly because that phrase is rather formal, and Python isn’t a very formal language. So yeah, the “new and improved” version isn’t foolproof either.
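The difference between an exact-phrase match and a plain mention can be shown with a small sketch. The page snippets below are made up purely for illustration, and the helper names `contains_phrase` and `mentions` are hypothetical. This is not how TIOBE or any search engine actually matches pages.

```python
import re


def contains_phrase(text: str, phrase: str) -> bool:
    """True if the exact phrase occurs in the text (case-insensitive)."""
    return phrase.lower() in text.lower()


def mentions(text: str, name: str) -> bool:
    """True if the name occurs as a whole word (case-insensitive).

    Works for word-like names such as "Python"; names ending in
    punctuation (e.g. "C#") would need different boundary handling.
    """
    return re.search(rf"\b{re.escape(name)}\b", text, re.IGNORECASE) is not None


# Made-up page snippets, for illustration only.
pages = [
    "An introduction to Python programming for beginners.",
    "How I sped up my Python script with generators.",
    "Notes on decorators and context managers in Python.",
    "Programming in Python: a tutorial.",
]

phrase_hits = sum(contains_phrase(p, "python programming") for p in pages)
mention_hits = sum(mentions(p, "python") for p in pages)
print(phrase_hits, mention_hits)
```

All four snippets are about Python, yet only the first contains the exact phrase, so a phrase-based count would see one page where a mention-based count sees four.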

In my opinion, this kind of ranking just can’t be done by one formula. You’re generally right, cdsmith. Your ranking is correct for your aims. The TIOBE one is wrong for their aims, which would require much more refined querying.

Give me a break! TIOBE is much more than just a googlefight. Read the TIOBE pages to learn how they generate this. You’re just going to make up your own ranking based on what Google’s supposed “real” hit count is after going next, next, next to the end and then make your “own” ranking based on this? You’ve got to be kidding me. What a joke.

ASP.NET is not on the list because it’s not a programming language. You can use C#, VB.Net, Boo, or any .NET language to write an ASP.NET web application.
And that is another problem with the TIOBE index, at least regarding .NET languages. The fact is that the main language names (C#, VB.NET) are seldom used in blog posts. They appear more often in basic articles or overviews of the language. .NET blogs are more concerned with some technology within .NET, so there are blogs about ASP.NET, Ajax, WCF, WPF, CLI, but people rarely write about C# or VB.NET itself.

and you’ll see that graphs showing language USAGE give a much higher rate for C# than the search for “c# programming” does. Also, if they looked at the CodePlex repositories, they’d probably see even more of C# in the big picture.