"According to statistics on sourceforge, C++ is the most common language followed by C"

while mslicker says: "I was surprised by the C++ figure because in a recent Linux distribution, C++ accounts for about 15% of SLOC and C about 71%"

With respect to free software, C vs. C++ is the most relevant language rivalry, which makes for the most frustrating hole in David's survey: C++ is not a separate language entry. The second problem is that David is concerned primarily with overall popularity, and does not try to isolate popularity within the free software community. So the first question is: is there a convincing demonstration anywhere that one or the other language is more popular?

The second question is a more personal matter of curiosity: I do not understand why, outside of a few niches such as OS programming where there are long-established C-centric traditions, anyone programming a large project would prefer C, with its flatly inadequate support for program structuring, over C++, with its rich set of high-level structuring facilities. Why does C seem to do so well?

C, as opposed to C++, is a simple language and is easy to understand and debug. C++ frameworks tend to have steep learning curves that require considerable investment of time before one can do useful work with them. C++ code also tends to be somewhat more difficult to debug (my opinion). I would assume that free software authors want to maximize the number of potential contributors who might be able to supply patches. C++ is also notoriously difficult to keep portable due to compiler differences.

Not that there are no good C++ projects out there; KDE comes to mind quickly. But KDE also has the size of a 500-pound gorilla, and its framework is not easy to learn either.

C++ frameworks tend to have steep learning curves that require considerable investment in time before one can do useful work with them

I had thought of this, but there is no obligation with C++ to use the whole of the language. Just to take a simple example, C has completely broken string handling and awkward facilities for IO, problems that are more or less completely solved in C++ in a very accessible manner. To truly master C++ is a considerable achievement, but one can choose a fairly small subset of the C++ facilities and reap tremendous rewards in terms of language expressiveness and code readability.

Of course another aspect of KISS might apply: code maintainers don't want the hassle of policing that only the subset of C++ they choose to use is in fact followed.

Lastly, one of the points of the article, if it was not clear, was to open up the question of which is the more widely used language in free software, C or C++. It does not seem easy to determine which is more used.

"I had thought of this, but there is no obligation with C++ to use the whole of the language."

This is very close to completely false. C++ does so much behind-the-scenes work, in what is not always the most intuitive way, that if you don't know the whole of the language you are likely to get bitten in a number of ways. Casting is the worst offender: while in C casting is not likely to cause many problems, casting in C++ can create whole new objects and cause all sorts of memory management problems. Assignment doesn't always do what you think it will, thanks to copy constructors. If you do not understand the whole of the language, then pretty much any action you take will be greatly under-informed.

Exceptions can really kill most memory management techniques.

Overloading functions is nasty, especially when you have two functions with similar signatures. For example, if I have a function overloaded to take either an int or a double, each for different reasons, I might get very wrong results if I accidentally pass the result of a computation that I forgot to cast back to an int. Optional arguments and named arguments are good, but overloaded function arguments are bad.

In addition, C++ is tough to link to. Even with the ABI standard, who is actually following it? It's a pretty complicated standard, so it's only really useful within C++. On top of that, it is unclear what the semantics of libraries opened with dlopen or similar are -- what happens to type information, for instance? Anything you dlopen pretty much has to be written in C or something with a reliable C ABI interface, in which case you have to dumb down your coding style to be roughly C-compatible anyway. This means you get all of the headache and none of the advantages of C++.

Personally, I think C++ is pretty useless without a garbage collector attached. If I want to do something low-level and predictable, I'd use C. If I want to do something high-level, I'd use Scheme. If I need to do something mid-level, I'd use Perl. I don't see any place where C++ shines ahead of those in the areas where they are used. I really haven't found a class of problems that shouts "C++" to me. It may be better than other tools, but usually when you are using C++ on a project it just means you would have been better off using something else.

Yes, that's exactly what I did in my latest nut project, which does lots
of string processing using the C++ STL. In fact, the manually-written C++
code is right at the top level, while the low-level logic is
machine-generated.

I'm willing to believe that C++ isn't as popular as C, mainly because of
these reasons:

C++ touters like to harp on its OOP capabilities alone, even though C++
is suited for a wide range of programming paradigms.

There's still a lot of C++ code out there which uses strcmp()
to compare strings, and new int[n] to construct temporary
arrays.

All the "C++ without GC is useless", "C++ is low-level", etc. FUD.

Though I agree with johnnyb on one thing: you
need to understand the entire C++ language before you decide which parts of
it to use.

johnnyb, how does overloading affect you if you choose not to
overload functions? If you don't know how to use overloads safely then
don't overload!
How do exceptions affect you if you don't throw any (and don't cause libraries
to do so by indexing past the end of a vector, for instance)? GCC even lets
you turn off exception handling.
Exceptions only harm memory management if you write C-style memory management,
not if you use constructs such as std::auto_ptr, ScopeGuard or the Boost
smart pointers which are explicitly designed to solve such problems.
Why choose to ignore proven solutions to ancient problems?

Casting is always a failure point, that's why C++ introduced the more
explicit static_cast<...>() etc. so it's more obvious you're doing something
questionable. If you ignore the hint and just use C-style casts (or
sledgehammer casts, as I call them) that's not the language's fault.

You don't need to know about
templates to make your program much simpler and safer by using the STL.
IIRC Francis Glassborow's book You Can Do It makes extensive
use of STL template classes and functions without once mentioning how to write
your own templates or even how to declare them - he just says something like
"vector is an incomplete type, you need to say vector<int> to use it" and
gets on with it.

I'd agree you need to know more than just C to use C++,
if you write C in C++ you'll get bitten, but you don't need
to know the entire C++ language. If you only know C then read a recent
introduction to C++ such as Accelerated C++ to learn how to use C++
safely and simply, in a considerably different style to programming in C.

Minor rant about writing C in C++ over, I'd like to mention something more
directly relevant to the article.

Eric Raymond, in The Art of UNIX Programming compares the popularity
of various languages in Open Source programming, and separates C and C++.
I don't know how accurate his figures are, as the book was in development
for 5 years, and it shows in places (some info's a bit out of date), but you
can see it online.

C's popularity is understandable, purely based on the fact that popular systems are C based and the default interface is C. These systems naturally give preference to C, both socially and technically. If C is not suitable for general purpose programming, then why build a whole system around the language? The designers of Unix and other systems obviously thought C was a suitable language. In a fairly recent interview, Kernighan states C is still his favorite language, and he also admits "one of the reasons that C++ succeeded was precisely that it was compatible with C".

Many languages do not have the flaws of C, why then in particular is the popularity of C++ important? It should also not be forgotten that popularity is primarily a social phenomenon.

The simplicity of C was mentioned by jum in comparison to C++. Not to take away from his argument, but I would like to briefly mention the chapter Who Says C is Simple? in the CIL manual.

I didn't think the article would receive nearly as much attention as it has. It is pretty obvious that a "for fun" article like that has to rely on data sources that have some defects. However, I still stand by it, I think it shows interesting and maybe even useful information.

For Linux, the "default interface" for system calls consists of loading the
syscall number and parameters into registers, and then calling int
$0x80, which doesn't look remotely like C calling conventions -- in fact,
it's more similar to the kind of syscall interface
that exists on MS-DOS, where no language is predominant. To get a complete C
interface on Linux you need another 1.5Mb of goop (read: glibc).

Also, OS coding and general-purpose coding are very different beasts. For
the latter, you are free to throw all the nitty-gritties of memory
management, process creation, etc. to a bunch of lower-level libraries. But
an OS often runs in a very different environment from a normal hosted
program, and the libraries suddenly become quite useless, so you need to do
a lot of gruntwork yourself anyway and C++'s extra features probably don't
help much.

I know Linux provides a syscall interface, but in the case of Linux I don't think it is incorrect to say that the default or common interface is C, and that this is the encouraged way of interfacing with both the system and the windowing system (Xlib). For example, looking up the manual page for socketcall will turn up this text:

socketcall is a common kernel entry point for the socket
system calls. call determines which socket function to
invoke. args points to a block containing the actual
arguments, which are passed through to the appropriate
call.

User programs should call the appropriate functions by
their usual names. Only standard library implementors and
kernel hackers need to know about socketcall.

It should also not be forgotten that
popularity is primarily a social phenomenon.

However, you seem to say that as though it's a bad thing. davidw
makes it quite clear why he considers popularity worth considering:

If everyone is using a language and contributes a little bit back here and there (libraries, documentation, help on mailing lists), it's certainly more valuable than an equivalent language with none of this participation.

The social environment in which a language exists is important for many reasons.
Equally applicable is the environment in which an open source projects exists,
and historically the environment in which UNIX was invented/nurtured. Why
dismiss such social phenomena?

Not really a general popularity measure, but perhaps interesting nonetheless, are the results of the ICFP programming language competition: the list is naturally heavily tilted towards functional languages, but C++ was this year's most popular single language [1], among over 300 submissions; other popular non-functional languages were Java, Python and C; and Perl did well in the lightning submissions category.

davidw: Thanks for adding the other languages: they make the document much more valuable. You may be doing this for fun, and drawing upon imperfect sources, but your treatment oozes professionalism and is a very worthwhile contribution. Two things would improve the survey: analysis of the SourceForge statistics, and some attempt to look at how language popularity varies by project size.

mslicker, the words on
the socketcall() manual page hold for just about any high-level
language. Should one need to know about socketcall() when
programming in Pascal, for example? Probably not. There's no need to read
some sort of C bias into these words when there's no such thing.

But it's true that for some reason, C is seen as the
über-portable language across the programming world. This is quite
strange to me, because I don't recall seeing C's "portability" being heavily
touted by its creators. (Contrast this to Java: everywhere you turn, you
hear the "Write Once, Run Anywhere" mantra.)

Well mslicker,
socketcall() isn't available as a C function from any of the
standard header files. The closest is sys_socketcall(), and that's only
usable from inside the kernel.

And, by adding a few choice words like "i.e.", you can probably `prove'
anything you want about Linux, including that Linux is x86-biased,
US-biased, rich-biased, etc.... or even that Linux is anti-Semitic. Goes to
show how much value I attach to such a method of `interpretation'.

tk, Unfortunately for you, you can't prove or disprove anything simply with choice words. For example, you can't prove or disprove anything by inventing a list of increasingly absurd claims ("Linux is x86-biased, US-biased, rich-biased, etc.... or even that Linux is anti-Semitic"). None of this detracts from the fact that Linux is C based, as its over two million lines of C code readily attest.

You can't disprove the fact that "the manual page for socketcall documents socketcall as a C function" by saying "socketcall() isn't available as a C function from any of the standard header files". Nor does your statement disprove that C is the encouraged method of interfacing the system as a whole.

If you think my use of "i.e." is a non sequitur, what does "appropriate functions by their usual names" refer to? If the "appropriate functions by their usual names" are not the C functions provided by the C runtime listed in the "SEE ALSO" section of the socketcall manual page, what are they? Where are the writers of the "Linux Programmer's Manual" pointing us to?

redi, Popularity is not the same thing as community. You don't need to beat C in a popularity contest to have a sustainable community. The functional programming community is not even represented in davidw's survey, yet functional programming commands a respectable and sustainable following. I don't consider davidw's reasonable statement that you quote to be an argument that C++ needs to be more popular than C. Stating that popularity is a social phenomenon is not the same as dismissing popularity because it is a social phenomenon. Although, in my opinion, popularity contests and fads should not have nearly the influence they presently have on decision-making in the field of computing.

"johnnyb, how does overloading affect you if you choose not to overload functions?"

It doesn't, unless you are using a library of functions that uses it. Of course, even if you know how to use it, my point was that it is way too easy to get bitten and have no idea why, when the compiler chooses a different way to cast your parameters than you think it will, and, purely because of casts, runs an entirely different set of code.

The C++ language is a mess, and if one were to do high-level programming, I have no idea why one would choose C++. The only programming paradigm that C++ supports better than other languages is Alexandrescu's Policy Classes -- even then you might be better off with Scheme, I just haven't figured out how to make macros do policy classes. What's really amusing is that C++ forces you to write recursive macro programs, while Scheme's macro facility does not. Again see Alexandrescu for more details.

In fact, until I read Modern C++ Design I had entirely given up on the language as being both absurd and useless. Now I see it as having a little theoretical value while we search for better ways of doing what Alexandrescu describes, because C++ certainly isn't worth it.

It also amazes me that garbage collection is not a default part of more C++ compilers. Given how much trouble memory leaks and other allocation problems cause in C++, I don't see why it is not mandatory that C++ compilers at least ship with an optional garbage collector. It's not like Boehm is that hard to link with. I'd even go so far as to say maybe C++ needs to have garbage collection be the default.

Anyway, Scheme is awesome. The deeper I get into it the deeper I want to go. In how many other languages can you do ambiguous assignments, and decide _later_ what you wanted that previous assignment to be? Ahhh, the joy of continuations. For example, with a small library, I can do the following:

This snippet of code will retroactively assign a the value of 5 and b the value of 4 based on the assertion given. Since a=5,b=4 is the only value that satisfies the assertion, the return value of the above construction will be 9. This is a really nice feature for logic programming. The code for the "amb" functions is only about a page long.

Because it compiles to fast code! Remember that your program doesn't run in
an alternate universe where time and space don't matter (read: the Boss
Zone), it runs on a machine which exists here and now, and which does have
time and space limitations.

Theoretically, you can apply powerful optimization techniques to any language
-- even Prolog -- to make them competitive with C++, but let's face it,
implementations of such techniques don't exist yet. (Last I heard,
the people behind the Self language were able to get Self running at half the
speed of C++. Yes, only half.) And even if they do exist, they'll likely
take up more memory at run time than an equivalent C++ program.

Besides being fast, C++ also provides a palette of rather high-level
facilities. They may not be the most elegant in the world, but they're
there, and -- most importantly -- they're efficient.

By the way, researchers who code computationally-intensive algorithms in
Java deserve to be rounded up, whacked in the head, and shot.

Let's not get into yet another argument about the best language, and
consider the question of why C and C++ compete in the same space, rather than
why C++-doesn't-deserve-to-be-where-it-is-because-$language-beats-it-hands-down.
As measured by davidw and ESR, C++ is a lot more popular than
many higher-level, "better" languages. The issue is not why is that
situation ludicrous, but why is that situation the one we're in.

Maybe a problem with much C++ code is a consequence
of the language's popularity - the more popular the
language, the more likely it is for non-experts to bang out shaky designs
full of "clever" features. As ncm has observed,
[Inlines] are the third most-misused C++ feature
(after inheritance and overloading).
This is maybe a fault of the language's popularity, not of the language itself.
If another language were more popular we might complain about all the
badly-written $language code that just doesn't get it right.
Although I strongly agree that
"there is no obligation with C++ to use the whole of the language",
no one seems to teach that, so the inexperienced rush in and try to
use every feature available. chalst has a good point that C's
popularity might be a precaution against misuse of C++'s higher-level features.

"C++ doesn't kill projects, bad programmers do" may well be true, but many
countries still choose to ban guns. I wonder what ESR would say to that ...

Popularity is not the same thing as community. You don't need to beat C in a popularity contest to have a sustainable community.

I disagree with that statement, popularity is very close to community as they are derived from a single source: populi. To me, if a community is behind a language, then by itself it is considered popular.

I can't remember who first told me this, but it's totally true. "There are two types of languages. There's the kind people complain about, and the kind they don't use."

People complain about C++. Like crazy. But it gets the job done, no matter how painful it is. For almost any other language, not every job is even possible, let alone painless. For example, try to write a fast program in Java or Python.

It's not impossible to have a better language than C++ and still have good or even better performance. All of these languages (except maybe Sisal, which I don't know much about) have better, higher-level facilities than C++. And, as you can see here, they can be competitive or even faster. And there is still quite a bit of room left for optimization.

The problem is that the companies who develop and promote tools really do a piss-poor job. They give us languages that are 20 years outdated when they arrive, and then load us up with tools and documentation to try to make up for their deficiencies, and we thank them for it. What we really need to do is learn these kinds of languages, and demand that our tool vendors ship these languages or languages like them.

Another data point on Scheme-vs-C++ is that Scheme allows quite a bit of compile-time programming -- allowing you to do a lot of computation that would otherwise have had to be done at program startup in the C++ program.

...but in order to get the competitive performance, you pretty much have to
forgo the high-level features which make it attractive in the first place.
For example, the "word frequency" program in C++ is ~4 screenfuls, while the
corresponding MLton program is ~16 screenfuls. Where's the "high-level"
advantage again?

In addition, doing complex precomputations at compile time isn't an alien
concept in C/C++ circles. The lex and yacc programs are
prime examples of this: they precompute state machine tables instead of
forcing your application to compute them at run time. The precomputation
doesn't even need to be done in C or C++: for my nut project, I used a
3rd-party Prolog program to generate C++ tables. (The only "advantage" of
Scheme is that it does these precomputations in the framework of the
language itself, while for C or C++ you need a separate mechanism.)

In the "word frequency" program, the SML program must implement a Hash table, quicksort, and for some reason insertion sort. The OCaml (in the same family as SML) implementation comes right behind C++ in that benchmark, and is only 29 lines in comparison with 79 lines for C++

tk: I give MLton credit for letting you write efficient code, and letting you have high-level language facilities. If it takes 4x as many lines of code to do it efficiently in MLton as in C++ (I haven't confirmed this; I'll just believe you) then that's pretty sucky, because C++ certainly isn't awe-inspiring in its brevity. But at least you can.

This is also why people use C++. Most people hate C++; most people (although the sets don't overlap completely) also use C++. This is because C++ can do whatever you want. It can be fast (as fast as C; I don't know why the shootout gives it a lower score), reliable, or reasonably expressive. You can't really have them all at once in C++, but no other language gives you that either.

People keep coming back to functional languages like scheme in arguments like this. I really like the looks of scheme; you could say it already has all the features of all the other languages - other than syntax - and it (and lisp) had them before everyone else. But I tried all the scheme interpreters and compilers in Debian, and they all used (much) more RAM and ran more slowly than the equivalent C++ program, and most of the interpreters were crash-prone. (A crash-prone interpreter? Give me a break!) The existence of bad interpreters is probably not the fault of scheme, but it is probably one reason (along with weird/missing syntax) that scheme isn't very popular.

Because it so clearly is, perhaps. The Linux Standard Base is defined in terms of shared libraries and the ABIs they support; the only machine-readable form of most of the corresponding APIs is C headers, so you at least need enough of a C-compatible language to parse function prototypes, structs, unions, #defines and inline functions.

How many non-C languages are available on Linux that talk to the kernel directly instead of calling functions in libc and similar libraries? I don't know of many.

(Whether, in practice, the libc interfaces are more or less stable than calling the kernel directly would be is a contentious issue. I've certainly been burnt by minor (even patchlevel, once) changes in libc breaking my apps which called not-quite-supported stuff)

I don't suppose that anyone would argue that if one were to design a high-level language with a completely free hand, one would think one had done a good job by coming up with C++. However C++ does represent a certain kind of language design achievement, once one bears a few factors in mind:

C++ was designed to maintain a very high-degree of source-level compatibility with C;

Because the C++ standard documents an implementation, rather than an independent language design effort, a lot of constructs are in it that have not stood the test of time;

C++ does, as far as possible, respect the principle of runtimelessness, that is, the language does not force run-time overheads upon the programmer unless they are asked for.

This last factor, the idea of runtimelessness, is a very valuable idea, and one that I regret has not been much taken up by the programming language design community. There are many application domains for which garbage collection is either a mixed blessing, or is not practical, and C++ is close to being a category killer for these. There are alternatives: I like the tick-C and prescheme languages, but the development environments are not under active development, while Objective-C wins fans, though I think it has clear expressive defects compared with C++.

This last factor, the idea of runtimelessness, is a very valuable idea

The typical C environment is not runtimeless at all; it's just that the runtime environment is (unsurprisingly) a close match for the unix runtime environment, so you tend not to notice it's there. What happens when you only have 64Mb of free RAM, and you write to an array 96Mb in size? The runtime transparently shuffles some pages to disk to free up others so that your program can continue. It happens that this runtime support, on a mainstream unix box, at least, is provided once for all processes in the kernel, instead of per-process, but it's still there and it's still throwing occasional "that operation took much longer than it ought to have done" spanners in your execution model.

If you'd said that "a good fit for some popular runtime environment" is a major factor in C/C++'s success, I'd have agreed - though as C and unix have grown up together it's not too surprising that they're well-suited for each other - but to claim it has no special runtime requirements is like a human claiming it has no special respiratory requirements.

dan: The typical C environment is not runtimeless at all; it's just that the runtime environment is (unsurprisingly) a close match for the unix runtime environment.

This is a good point, especially the bit about "that operation took much longer than it ought to have done" spanners; note I hedged by saying of C++ that it respects this principle "as far as possible". I think, though, that the UNIX model can be seen as clinging very close to the contours of the underlying mini/PC architecture, much closer than one might a priori imagine is consistent with the tolerable usability it achieves. The UNIX runtime environment is rather more fundamental than "some popular runtime environment" might suggest.

None of this is to say that there might not be better or more fundamental models than the C/C++/UNIX model; rather, the point of my post was to point out that the current trends in programming languages research don't seem to be particularly helpful at revealing them, and presuming GC is particularly vision impairing in this respect.
