"Never trust a computer you can't throw out of a window" - Steve Wozniak.

Sunday, November 6, 2011

StackOverflow's Programming Language Bias

Using StackOverflow.com, I've always had the impression that it's biased towards C#, and .NET in general. That might have been because on the SO tags page C# is number one, with the most questions tagged, or maybe because much of the site itself is built with C# and ASP.NET MVC. So I thought it would be interesting to compare the rankings of the tag popularity on StackOverflow with a leading language popularity index (the TIOBE index).

Method
The "Stack Overflow Representation" is the ratio of a language's SO tag count (as a percent of total questions) to its TIOBE language index percent, expressed as a percentage. A representation of 100% means that the SO tag share is aligned exactly with the TIOBE language index. An "over-representation", greater than 100%, might mean there are more questions on SO than we'd expect. An "under-representation", lower than 100%, might mean there are fewer questions on SO than we'd expect.
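As a rough sketch of that calculation in Python: the function names and the input percentages below are hypothetical placeholders chosen for illustration, not the figures used in this post.

```python
def representation(so_share_pct, tiobe_share_pct):
    """Return the SO representation as a percentage.

    so_share_pct:    share of SO questions carrying the language tag, in percent
    tiobe_share_pct: the language's TIOBE index rating, in percent
    """
    if tiobe_share_pct <= 0:
        raise ValueError("TIOBE share must be positive")
    return 100.0 * so_share_pct / tiobe_share_pct


def classify(representation_pct):
    """Label a representation value relative to the 100% baseline."""
    if representation_pct > 100:
        return "over-represented"
    if representation_pct < 100:
        return "under-represented"
    return "aligned"


# Made-up example shares (NOT the data behind the results in this post).
examples = {
    "language_a": (6.0, 2.0),   # (SO share %, TIOBE share %)
    "language_b": (1.5, 15.0),
    "language_c": (4.0, 4.0),
}

for name, (so_pct, tiobe_pct) in examples.items():
    rep = representation(so_pct, tiobe_pct)
    print(f"{name}: {rep:.0f}% ({classify(rep)})")
```

With these placeholder inputs, a language holding three times the share on SO that it holds in TIOBE comes out at 300%, and one holding a tenth of its TIOBE share comes out at 10%.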

Results
Surprisingly, JavaScript came out to be the most "over-represented" language on SO, by quite a long way at 294%. Could this also be because programming JavaScript is generally quite difficult and will result in people seeking help more often? Following this was C# (which I had expected to be number 1), at 153%. After this, PHP, Ruby and Python were fairly balanced at around 100%. The most "under-represented" major language would definitely be C at 11%. Three other major languages which seemed to be a bit under-represented, below 50%, were C++, Java and Objective-C.

38 comments:

I'd probably suggest that the over-representation of languages like javascript and c# on stackoverflow is due to them being (i) crap languages, so people need more help understanding them, or (ii) most often used by people who are novice programmers. C++ programmers, imo, usually know what they're doing, for example, and so don't need SO's help. The only surprise for me is why PHP isn't featured more highly on SO, since (i) it's the crappiest of the crap, and (ii) used a lot by people who don't know what they're doing.

PHP ?= Crap! No way, dude. It's awesome. When I develop with C++, I usually don't know what I'm doing. I don't go to stackoverflow.com, because I find it to be a bit stuck-up. I'll go to cplusplus.com, which is friendlier. I think that applies to a lot of novice C++ programmers, hence the under-representation.

While PHP is extremely popular, and used by extremely large companies like Facebook and Zynga, it's a relatively inferior language. Its type coercion is terrible, the concepts it melds are relatively non-contiguous, the associativity of its ternary operator is backwards, and its runtime performance is abysmal to the point Facebook wrote a PHP to C++ compiler in order to improve the performance.

PHP is the epitome of the "worse is better" aesthetic. Please don't try to twist that into an argument that it's in some way good simply because of its popularity. To do so falls into the logically fallacious realm of argumentum ad populum.

It would be interesting to factor in "Language Mastery" to this analysis, but I have no idea where you'd gather the statistics from. At a guess I would say that there are not that many weak programmers noodling around with:

Ada - they have been rejected as 'unfit' by the Department of Defense
Assembly - they have always been too frightened to learn about their CPU
C - they have long since migrated to Java or are trying to write iPhone Apps
Lisp - they were alienated by the ( )s, leaving it to be used by AI researchers
Lua - they can't really make mistakes with this as it is virtually idiot-proof

So, without any real evidence... Lua must be the least difficult language.

I would guess that JavaScript is overrepresented because it is (basically) a monopoly on web UIs. Web-based front ends are extremely popular in all fields these days, and while the back end might be written in any number of the other languages, chances are relatively high that you're also going to have to do some front-end development, and web UIs are only getting more popular. Consider all of the traditionally desktop applications that are moving their UIs to web-based (SabNZBD+ comes to mind immediately.)

Therefore, the Ruby on Rails programmers, the Python/Django stackers, and the poor, poor Java/JSP devs are probably also going to be bringing JavaScript questions to the board.

And with the advent of node.js and competitors, JavaScript is also starting to encroach on the server-side realm...

Actually, I propose that the issue isn't languages at all, it's libraries. Javascript and C# have the largest and most complex libraries that ordinary programmers encounter in a given day (namely, the DOM and .NET). These have accreted without competition from day one, and in the case of Javascript, have multiple independently derived implementations, so they tend to be difficult.

Calling Javascript "a crap language" doesn't help. It's a fairly ordinary implementation of Scheme-with-labels, and its core is easily grasped by any programmer within a day. It's the lack of tooling and the complexity of the environment that makes it challenging.

Javascript and C# also have the headache that they're User Interface languages. People who program in them are programming for other people, not for operating systems and servers. People are more complicated and challenging than computers.

Leave it up to the .net evangelists to turn a perfectly good blog post into "here's some vague reason php sucks." You know, of the list, people are having more trouble with C# than any other language other than Javascript (which we expect). Maybe if you idiots spent more time writing code, and less time complaining about languages you don't even understand the basic concepts in, the numbers would look a little different.

Given that the TIOBE index is garbage, I think judging the ratio of real projects on a public open source repo by it is completely backwards. You might better ask the question: based on the GitHub distribution, where is the TIOBE index totally out of whack?

Or you could concede that they are likely both not representative of anything general and thus comparing them tells you nothing.

I agree with Alex. The TIOBE index's methodology is garbage. There's no evidence it has any relationship to the real world number of programmers.

What your numbers show is not SO's programming bias, but simply that SO's ratios are different from TIOBE's. There's no anchor to the real world here. One cannot say who is over nor under representing.

It's still interesting data, but there's nothing to support the conclusion. Throw in some more cross-language sites like Github and the scattering of language use would be interesting to look at, but without an anchor to reality there's no way to judge who is biased.

@willr, interesting theory ... goes some way to explain why well established languages like RPG, COBOL, FORTRAN, C, etc. don't feature much on StackOverflow, given the significant number of developers using them. Could also be that these devs tend to be older and maybe just don't head straight online for a solution to their problem as many younger devs do?

@Lars, hehe - unfortunately you could say the same about things like COBOL and VB - they're not pretty, but they get the job done and they have carved out a niche for themselves in certain classes of applications.

PHP has several flaws, but it was there where other languages weren't. And I wish there were other languages, because there are so many websites done in PHP that are very difficult to maintain today (even if you get paid)!

There is also the question of how a programming language is used: web programming, desktop programming, cross-platform, cross-OS, single platform or single OS ...

I have about a dozen Java technology related forums in my bookmark list, ranging from JavaRanch to the JBoss and Spring sites. The fact that SO is lacking in Java is no shock. Java had a number of solid forums before SO.

And I wouldn't be surprised if there were many C programmers on usenet or on mailing lists. When I was writing C about 15 years ago, Usenet and email lists were the best ways to get help. I wonder if that is the same now.

Of course, the majority of questions on SO are about libraries (often 3rd-party libraries) and tools, regardless of language tags. A lot of questions tagged C# should really be tagged .NET since they are questions about libraries or tools that any .NET language can use.

The measure of "over-representation" is certainly a combination of many things but I suspect the two primary things it is measuring are language growth of fresh developers (rather than growth of usage of the language which TIOBE measures) and also the lack of other sources of really good online documentation.

For example:

Javascript is not any harder than many of the other languages on the list, but for Javascript, StackOverflow is one of the best sources of documentation online. There is currently a flood of new developers joining the Javascript community.

So what if they ask more questions? Does that make them any less of a programmer? I bet you never posted a single question about your language.

Isn't that what the web is for? Making information more accessible. You tend to forget that sometimes even a great developer needs others' opinions or advice.

And to correct you, .NET, or more specifically C#, is one of the best general purpose languages, and if you can use MSDN efficiently you'll find your name online less.

I believe that regardless of anything. A great developer will write a good program using any language. Perhaps if you spent more time developing software than trying to engage in language politics you'd be great too.

I'm surprised that a lot of people say JavaScript isn't hard ... that may be so for just the core language by itself, but I've found that when you try and do anything significant with it involving the web/DOM/browser-compatibility or other libraries such as jQuery you can quickly run into problems.

The other thing to consider is that the majority of questions on SO tagged with JavaScript would probably be relating to an associated library or browser issue, so that may have inflated the total number of JS questions asked.

Javascript is the most used language on the planet, so obviously more people are going to have questions about it, and some parts of it sometimes don't make much sense unless someone explains them to you.

@climboid, yeah, JavaScript has also become known as "the duct tape of the internet" (or was that Perl?). Most if not all web application development includes some bits written in it. Add to that the fact that the people using it are often not experts in JavaScript.