C++ is the main language of game development. This is changing slowly as indies embrace other languages, but in the AAA space C++ is still overwhelmingly dominant. C++ is descended from – and is very similar to – the language C. First created in 1972, C is just one year younger than I am. It was devised for the world of the 1970s. It was targeted at the hardware of the 1970s, and was originally intended for writing operating systems.

This seems crazy, doesn’t it? Writing operating systems for Nixon-era mainframes is so vastly different from building AAA games in 2019 that it’s like we’re using coal-fired steam engines to go to the moon. Sure, the steam engine has been modernized a bit, but there are still conventions built into the language that don’t make a lot of sense in the world of 2019. The fact remains that somewhere underneath all those rocket engines and silver wings is a chugging steam engine.

C++ certainly has language features not available in C. C++ has classes, inheritance, operator overloading, and a bunch of other slick ways of expressing complex solutions in code. Those are nice, but none of those things uniquely address challenges faced in games programming. We could, in an alternate universe, use a different sort of language with a different set of features.

It’s not like this industry is incapable of evolution! Studios have changed game engines, and game engines have changed what graphics API they favor[1]. Our tools are different, the target hardware is different, the operating systems are different, and the performance challenges have changed numerous times. Rendering technology has gone through at least two major revolutions. First there was the jump from software rendering to using dedicated graphics hardware, and then another jump when we added the ability to program that graphics hardware using shaders. Over the last 30 years we’ve changed every single thing about game development except the language!

So why are we still using this language?

Sure, I can take this thing to the moon... as long as you build me a track between here and there.

Lots of libraries. There are tons and tons of C++ toolkits, libraries, and code snippets floating around in the wild. Do you need a sound library? Font loading? Access to rendering hardware? Support for gaming controllers? You don’t need to write that code yourself, because there’s a really good chance that someone else has already solved the problem and provided the source code for free. Of course, adding their code to your project is often a lot harder than it ought to be, but spending six hours pulling out your hair playing “dependency scavenger hunt” is faster than writing everything from scratch, even if it is a dumb miserable way to spend an evening.

Lots of programmers. Since C++ is the big important language, everyone learns it. Which makes it easy to hire people to work on your project.

Lots of help. Yes, answers to forum questions often take the form of abusive condescension and nerd peacocking, but at least a C++ programmer can get their question answered after their tag-team humiliation. If you’re using one of the more obscure languages, then you might not get any answer at all[2].

No dominant alternative. It would be one thing if there was another language out there to play Pepsi to C++ Coke, or could be the Apple to the C++ Windows. But there’s no clear contender. Java is good for some tasks, Python is good for others, but none of the challengers works as a broad general-purpose language. And that’s fine. There’s a lot of value in specialization. But that focus helps drive the C++ feedback loop of ubiquity.

I’m still confident that’s all true, but after four years I’d like to argue with my past self and suggest that this industry inertia can’t be the full reason for why C++ is so deeply entrenched.

The Lie of Simplicity

This image will make sense in about three paragraphs.

PC hardware is usually presented as a processor and a pile of memory. When a program is run, the processor makes changes to the contents of the memory, and you get some sort of output. On this site, I describe programs this way all the time. Sadly, this is a gross over-simplification. To understand why C++ is still dominant, we need to look at how the hardware is really constructed and what it’s really doing.

What’s actually going on inside of that humming box is that you’ve got a whole bunch of processors all bundled together in a single CPU housing. We call these separate processors “cores”. Those cores don’t make changes to memory directly. Instead, blocks of memory must be copied to a smaller pool of memory called the L2 cache. From there it’s copied to an even smaller pool of memory called the L1 cache. This L1 cache is actually inside that CPU housing with the cores. This is the only memory that the cores can manipulate directly. If the processor makes some changes to memory, then the altered block is copied back out through the layers and is stored in main memory.

Let’s say you’re a core. The L1 cache is your tiny workbench right in front of you. You can examine bits of memory on the bench. You can compare them, perform arithmetic, and you can make changes to the contents of the memory. Sometimes you need a chunk of memory that you don’t see in front of you. When this happens, then hopefully what you need is stored in the little shed just outside, which is your L2 cache. If the required item isn’t stored there, then you need to jump in your forklift and trundle all the way to the other side of the campus. You need to drive all the way to the particular warehouse that has the stuff you need. Go inside, find the pallet that holds the data you’re looking for, and drive it all the way back to your shed. Then take the items you need off the pallet and put them on your workbench so you can get back to work.

This is just the tip of the iceberg. There’s also some parallelism that you can take advantage of if you understand the hardware. A single core can handle multiple operations at the same time, provided you structure your operations properly. If you have the variables A and B and you need to modify them independently of each other, then it’s much faster to order your code so that you change A, then B, then A again, then B again. If you simply modify A twice followed by B twice, then you’ll miss out on some of the potential performance gains.

Disclaimer: I’ve never done any real programming at this level. Above is how it’s been described to me by people with more knowledge of programming close to the metal. Even the above is a pretty big simplification of what’s going on, but we’re basically at the edge of my knowledge at this point and I hesitate to add more.

Of course, maybe the compiler will help you out and re-order your operations. Maybe it won’t. Do you know how to make sure it does the right thing? Do you know how to check? Do you know what kind of performance gains you’re chasing and if it’s worth your time?
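Here is a minimal sketch of the A-then-B idea in C++, using a sum over an array. The function names are made up for illustration. Whether the second version is actually faster depends on your compiler, its optimization settings, and the hardware, so treat it as something to measure rather than a rule.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One accumulator: every addition has to wait for the previous one,
// because each step reads the value the last step wrote.
long long sum_single_chain(const std::vector<int>& values) {
    long long total = 0;
    for (int v : values) {
        total += v;
    }
    return total;
}

// Two accumulators: the addition into `a` and the addition into `b`
// do not depend on each other, so the core is free to overlap them.
long long sum_two_chains(const std::vector<int>& values) {
    long long a = 0;
    long long b = 0;
    std::size_t i = 0;
    for (; i + 1 < values.size(); i += 2) {
        a += values[i];      // change A
        b += values[i + 1];  // then B
    }
    if (i < values.size()) {
        a += values[i];      // leftover element when the count is odd
    }
    return a + b;
}

// Both orderings must produce the same total; only the dependency
// structure differs.
void check_orderings_agree(const std::vector<int>& values) {
    assert(sum_single_chain(values) == sum_two_chains(values));
}
```

Calling `check_orderings_agree` on a few inputs confirms the two versions compute the same thing. To learn whether the compiler already did this transformation for you, you would have to time both versions or read the generated assembly.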

This thing where you need to drive to the warehouse is called a “cache miss”, and it can have an immense impact on performance. If you want more detail, this article has some great information on when you’ll run into a cache miss and should be approachable to non-coders. I’ve never been able to find any hard numbers on the overall cost of a cache miss, but developer Mike Acton[3] throws around the figure of “200 processor cycles”. That’s on the PlayStation 4 hardware, but I’m willing to bet that’s in the same ballpark as the rival consoles and the PC. 200 processor cycles is crazy expensive, and means a cache miss is one of the most expensive things that can happen to your program.

And it’s completely invisible in the code!
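To see what “invisible” means here, consider this hedged sketch. The two functions (hypothetical names) have identical bodies, but one walks a contiguous array and the other chases pointers between separately allocated nodes. How many cache misses each one really incurs depends on the allocator and the hardware. The point is only that nothing in the source tells you.

```cpp
#include <cassert>
#include <list>
#include <numeric>
#include <vector>

// Elements sit next to each other in memory, so one trip to the
// "warehouse" brings back several neighbouring values at once.
long long sum_contiguous(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0LL);
}

// Each element lives in its own node. Reaching the next value means
// following a pointer to wherever that node happened to be allocated.
long long sum_linked(const std::list<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0LL);
}

// Usage: same data, same answer, different memory layout.
void demo_same_answer() {
    const std::vector<int> packed{1, 2, 3, 4};
    const std::list<int> linked(packed.begin(), packed.end());
    assert(sum_contiguous(packed) == 10);
    assert(sum_linked(linked) == 10);
}
```

The only difference a reader can spot is the container type in the signature, which is exactly why this kind of cost is so easy to overlook.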

Can’t the Compiler do it for Me?

Can the compiler optimize my code for me? And if so, can it just write the code for me? This job is hard.

I’ll be honest, I don’t enjoy messing with this stuff. I like thinking about the hardware as a simplistic CPU and a magical pool of memory. Worrying about the size of the L1 cache and the fetch timing is when programming stops being fun and starts feeling like accounting. It’s hard and annoying and adds an unwelcome layer of complexity to code. I tend to think of this business with cache limits as “intrusive” and I’d rather let the compiler handle it for me. In fact, during my Good Robot series you can see me advocating a linked list without giving any thought to how every single entry is likely to trigger a cache miss[4]. That’s silly, and blunders like that would get me bounced out the door of a serious AAA studio’s engine team[5].

I’m not the only one averse to thinking about the actual physical limitations of the hardware. If you poke around you’ll see coders being pennywise and pound foolish with processor cycles. You’ll see advice like:

If you don’t care about precision, then you could cast this float to an int before this operation because that will be faster.

Rather than doing this comparison 6 times in a row, you should do it once and store the result in a bool. It’ll be faster!

In the right context, all of these might be reasonable advice. But often coders will obsess over this stuff when their real performance problems are coming from their failure to manage their memory. I am tremendously guilty in this area, and I am not an outlier by any measure.

Getting Back to C++

Pictures of code are boring, so here's a picture of raw materials before they're converted into code.

I think stuff like this is why C++ is so dominant. Newer languages act like they want to protect you from having to think about the hardware. Don’t use pointers, they’re dangerous. Don’t worry about the layout of data in memory or how big it might be. Just trust Friend Compiler to handle it for you. Don’t worry about the cost of allocating memory or memory fragmentation.

As someone who hates worrying about the hardware, I really appreciate this. In the overwhelming majority of use cases, the programmer should not need to waste their time obsessing over minuscule little 64 kilobyte chunks of memory like it’s 1988. Computers have tons of power these days and programmer hours are not cheap. Hiding the cache behind layers of abstraction makes economic sense.

It’s great to be shielded from all of that terrifying complexity, unless it happens to be your job to worry about hardware. Most areas of the code don’t need to think about optimizing the levels of cache usage, but if you do need to worry about it, then you really, really need to worry about it. If I’ve got 5 space marine objects and 6 space bugs in the scene and those are the only active objects in the game, then I do things the easy way. But if I’ve got ten thousand particles, two hundred bullets, 2,048 map zones, five hundred physics objects, and a hundred enemies in the scene at the same time, then I really need to think about how the data is being processed. If I’ve got enormous objects in memory – like texture maps or large collections of polygons – then I need to think about how often that data is being manipulated, copied, changed, and compared. If I’ve got 10,000 particles flying around the scene or I’m doing physics collisions between a lot of different objects, then doing things the Right Way™ in memory can make the difference between running the game gracefully and completely tanking the framerate.
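As a rough illustration of what “how the data is being processed” can mean for those ten thousand particles, here is a sketch of two layouts for the same update. The struct and function names are invented for this example, and the fields are just plausible guesses at what a particle might carry. It is not taken from any particular engine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The easy way: one object per particle. Moving the particles only needs
// position and velocity, but the color and lifetime fields sit in the same
// stretch of memory and will likely be pulled into the cache along with them.
struct Particle {
    float x, y;
    float vx, vy;
    float r, g, b, a;
    float lifetime;
};

void move_particles(std::vector<Particle>& particles, float dt) {
    for (Particle& p : particles) {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
    }
}

// An alternative layout: one array per field. The movement loop now reads
// only the four arrays it actually uses, each one packed back to back.
struct ParticleArrays {
    std::vector<float> x, y, vx, vy;
};

void move_particles(ParticleArrays& particles, float dt) {
    const std::size_t count = particles.x.size();
    // All four arrays must describe the same set of particles.
    assert(particles.y.size() == count);
    assert(particles.vx.size() == count);
    assert(particles.vy.size() == count);
    for (std::size_t i = 0; i < count; ++i) {
        particles.x[i] += particles.vx[i] * dt;
        particles.y[i] += particles.vy[i] * dt;
    }
}
```

Both versions compute the same positions. Which one is faster, and by how much, depends on the particle count and the hardware, so the second layout is worth the extra bookkeeping only where profiling says the loop matters.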

I think this is why a lot of the newer languages haven’t gained much traction in the deep end of AAA gamedev. They make life easier for the 90% of the job where you’re doing straightforward things that aren’t serious performance concerns, but they leave you helpless when you come up against that last 10% of the job where you need direct control over where and how things are placed in memory.
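For a sense of what “direct control over where and how things are placed in memory” looks like in C++, here is a small sketch of a bump allocator built on placement new. `BumpArena` is a made-up class for illustration, not something from a real engine or library, and a production allocator would need to handle far more cases than this one does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <type_traits>
#include <vector>

// Hypothetical arena: grab one block of memory up front, then hand out
// pieces of it in order. Nothing is freed individually; reset() makes the
// whole block available again.
class BumpArena {
public:
    explicit BumpArena(std::size_t capacity) : buffer_(capacity), offset_(0) {}

    // Constructs a value-initialized T inside the arena.
    // Returns nullptr if there is not enough room left.
    template <typename T>
    T* allocate() {
        static_assert(std::is_trivially_destructible<T>::value,
                      "this sketch never runs destructors");
        const std::uintptr_t base =
            reinterpret_cast<std::uintptr_t>(buffer_.data());
        const std::uintptr_t mask = static_cast<std::uintptr_t>(alignof(T)) - 1;
        // Round the next free address up to T's alignment.
        const std::uintptr_t aligned = (base + offset_ + mask) & ~mask;
        const std::size_t start = static_cast<std::size_t>(aligned - base);
        if (start + sizeof(T) > buffer_.size()) {
            return nullptr;
        }
        offset_ = start + sizeof(T);
        assert(offset_ <= buffer_.size());
        // Placement new: build the object at the exact address we chose.
        return new (buffer_.data() + start) T();
    }

    void reset() { offset_ = 0; }
    std::size_t used() const { return offset_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t offset_;
};
```

Two consecutive calls to `allocate<float>()` return adjacent addresses, which is the kind of guarantee that languages hiding the memory model generally don’t let you make.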

A few of the upstart languages do have these features. In particular, D and Rust both seem to have a lot of supporters who claim the languages are just fine for high-end gamedev. Other people claim they don’t offer enough, or in the right way. I’m not nearly qualified to weigh in on that argument. I’ve read about both languages, but trying to learn a language by reading about it is like learning to drive by watching Top Gear. The learning is in the doing, and I haven’t done enough with these languages to offer any meaningful analysis.

Also, Rust has been “nearly ready” for game development for years now. I don’t know what the holdup is, but I suspect the problem isn’t that Rust is just too suitable for gamedev.

Still, the point remains that any language intended to surpass C++ in the realm of games is going to need to match or exceed C++ in its ability to optimize very small but important things.

Footnotes:

[1] On the PC side, this boils down to DirectX vs. OpenGL, with third-party candidate Vulkan landing a few recent wins.

[2] You’ll still get mocked, though. Mostly by jackasses asking, “Why didn’t you use C?”

[3] Formerly of Insomniac games. Currently working at Unity.

[4] To be fair, I was currently being distracted by something even slower.

[5] Unless it was Bethesda Softworks, where they’d probably put me in charge of engine optimization for Gamebryo.

151 thoughts on “Game Programming Vexations Part 3: The Dominance of C++”

And C++’s attitude is an expansion of C’s, where you could do inline assembly if you wanted to. The general idea behind C and C++ has always been that you can abstract away from the complexities if you want to and yet still have more direct access and management if you need to. This means that C and C++ don’t always do everything for you, but trade that off with you being able to do really complicated things if you need to. As a general purpose language, that’s a HUGE benefit. Perhaps not so much if you need something very specific, though, which leads to the popularity in some areas for Python, Java, and other languages (Prolog was huge for some AI programming).

But I think you’re on to something with the lack of uptake of D and Rust: any contender, even in a specific area, is going to have to do all the things that C/C++ does that that specific area needs. That’s hard to achieve with a new language. Otherwise, it will have to do something so radically different and useful that it’d be better to use it and write things from scratch than stick with the general purpose language (I think Java and Python both got their boosts from that sort of thing).

Or, given my experience, it’ll have to find some way to sound really, really cool and new, and get the gurus pushing for that. But AAA gaming doesn’t strike me as an area where that happens that often …

Python mostly gets its boost from its “newb-friendly” syntax. As a result it’s more likely to be picked up by academics with less computer-science proficiency and massive libraries are written for mathematics, statistics, AI research, biology (bioinformatics), astronomy etc.

Python also gets a lot from its ubiquity on linux/unix systems. Perl for the same reason, but perl is “ugly” while python is “beautiful”.

I’ve seen that (I first learned it in a Cognitive Science course where I was the only one other than the professor with any real programming knowledge, and as I had been working in industry for almost 20 years I even had more than him). What I liked about the language myself, though, was that it’s interpreted rather than compiled, which makes doing small little projects easier: if you have the source code, you just need to load it into the interpreter and you can run it anywhere, anytime.

I’m getting my PhD in astrophysics, and my job is basically writing Python code all day most days. I’ve never taken a computer science course, I just picked it up on the job as a student research assistant back in undergrad, that’s how easy it is to learn.

The “batteries included” philosophy of Python is certainly great, and one reason I use it; for just about any task out there, someone’s probably written a package to handle it. And with package managers like Conda, installing a new package is as simple as typing “conda install [package-x]” in the terminal, so generally no worrying about dependencies; they usually Just Work™.

Why are we all still speaking English? Development of modern English was completed about 500 years ago, haven’t we learned anything better since then? No, instead we’ve tarted up the language with some new grammar and keywords occasionally, and it’s still perfectly functional.

C’s raison d’etre is to be a more portable representation of how code is structured on a computer with a von Neumann architecture; most of its commands and structure can be translated directly to the equivalent machine code. We’re still using von Neumann architectures today, so that’s hard to beat. C++ adds more features, but on the basis that if you don’t use them, they don’t cost anything; I’d argue that’s the root cause of its many, many flaws. You can abstract those problems away with other languages, and if you can afford the price in slowness, then you really should – it saves a lot of C++ heartache.

But if you’re competing with C++, then you can’t create a language with more run-time checks (slow!), so it can only include more static checking, compile-time checking, and allowing you to access the power of the hardware with less inscrutable grammar than C++ uses for the same. C++11 and onwards already sorts out most of the badness in earlier C++, and I’ve not seen any substantial suggestions of how eg. a custom memory allocator, both as fast and as customisable as a C++ one, would look any better in another language, even ones which allow it.

Wow, that’s one hell of a false equivalence. A videogames studio learning to use a new programming language is nothing like an entire country switching to another language. It’s too bad you started with such a blatant fallacy because the rest of your comment makes sense to me.

I think conversational languages and programming languages might be similar enough for this comparison. They’ve both got many people using them, lots of books/documentation/things to update if we ever wanted to switch, and lots of idiosyncrasies to work through, because there’s centuries of rules-changing instead of unified systems. In fact, I think using English isn’t a good counterpoint to Shamus’ arguments, because English itself is a mess that should be cleaned up. It’s a massive hassle for people to learn, for native speakers, second-language people, children and adults.

Who formatted that website? Who looked at that and thought, “Yep, that’s the one. Ship it!”? The defaults would have been orders of magnitude better, but no, they spent effort to make it unreadable. It burns the eyes. Thank god the console is there to fix it…

And reading the post, I disagree with a few of the points. Particularly ‘F’ and ‘Ph’ are subtly different sounds, at least in received pronunciation.

I suspect that f and ph actually do sound the same and the difference is in your head – you know when to expect an f and when to expect a ph and you hear it “differently” despite them being the same.

I wonder if someone has done a double blind test on this – ask someone to say “phase” or “faze” without context and see if people can tell which it is. Though actually you probably want to use fake words for this, because saying those I feel like I pronounce the end differently and the start the same.

People’s perceptions of what they are saying and hearing do not always line up with what they’re actually saying and hearing because the brain does a whole lot of error correction and data massaging unconsciously.

That said, you are right that that site is hideous and I don’t agree with all of their points – for example, I feel like plurals are useful. On the other hand, “weird spellings” is very much a thing and is terrible. Tough, through, thorough, though. Awful.

Oh, I’d also want to get rid of gender as a grammatical feature. If you want to say what gender someone is then say it directly like you’d cover any other quality. We could get there by adopting the singular they (and it’s the best option currently available), but that goes right back to my thoughts that plurals are useful.

On the ‘ph’ vs ‘f’ sounds, they are different. It is not in people’s head. Same with ‘ph’ = ‘v’ sounds.

I accidentally discovered it as child while recording my name (Stephen) backwards and playing it back. “Nehp-ets” requires a hard popping ‘p’ to get a ‘v’ sound when played in reverse. ‘F’ sounds required distinctly different vocalization than ‘ph’.

Don’t take my word for it. (heh. unintentional pun) Try it for yourself. Pronounce “sheer_falacy” & “ycalaf_reehs”. Record both, and play it back in reverse. Also try it with “fallacy”. Keep doing it until it sounds the same both ways. I bet the ‘f’ will sound like an ‘f’. Find other words with ‘ph’ in the middle and try those. (“Phase” is a bad example. Words that start and end with extra letters don’t demonstrate anything. It is a bit like trying it with “ye olde.” You are not checking spelling, you’re checking pronunciation.)

This is, of course, all dependent on dialect. (Which you know, since you mentioned this is in RP, but just to emphasize the point.) If you consider that only about 5% of England speaks RP, and that English is also spoken in the rest of the UK, in the Commonwealth countries, in the US, and in various other places, this isn’t a great example of the spelling’s usefulness.

Really, it’s redundant anyway. “Ph” is part of that borrowing point: almost all words with a “ph” come from Greek, where the letter “φ” could be either a spirantized “p” (basically an “f”) or an aspirated “p” (literally a “p” and an “h” sound), depending on dialect. Also, English does use a following “h” to mark spirantization, as in “th” or (some uses of) “ch” (exceptions are loan words like “loch” or “chutzpah”) so it’s not like the rule is inconsistent; most people just don’t know how to explain the rule properly. It’s just more complicated because an unusual level of etymological knowledge is needed to understand the rules of English.

The bits about plurals and prefixes make me think the toxic waste website may be taking the piss, however. Most languages have plurals (some even have duals… Speaking of ancient Greek…), and, while English does have unusually many irregular plurals (etymology again), they complain about the standard “[e]s” ending. And, again, many languages have prefixes. Some have infixes, which I think make looking things up harder, since you can’t tell what’s an unrelated word and what’s a word with the middle changed without a native-level grasp of morpheme boundaries. So, either that whole thing is a joke of some sort or the author’s actual native language is not Ukrainian but something lacking both plurals and prefixes.

Also, can I just say that learning to say things backwards sounds much harder than just learning the IPA and a little history? Bravo, Steve C, for managing it, as well as finding a use for it beyond party tricks!

What you’re saying here makes very little sense. Are you saying you pronounce “Stephen” and “Steven” differently? For me, and I’m pretty sure for most English speakers, they’re identical; for any given person named /ˈstiːvn̩/ I don’t know how it’s spelt unless I look it up, so I couldn’t possibly be pronouncing it differently.

Pronouncing a word backwards means reversing the order of the phonemes, not the letters. [ˈstiːvn̩] backwards would be something like [n̩ˈviːts]. It’s not really clear exactly what you were doing with your childhood recordings, but it sounds like you were trying to find a way to pronounce the sequence of letters “nehp-ets” as something that would sound like “Stephen” when played backwards. But the “ph” in “Stephen” is a digraph representing a single phoneme, which you admit is a “‘v’ sound”, so why were you even trying to make a “hard popping ‘p’”? If you know it’s a /v/, why would you pronounce that “hp” any differently than a “v” if you were doing the word “even”?

Ultimately, I don’t see what trying to pronounce words backwards is supposed to establish. I can either distinguish between the way I say “Stephen” and “Steven”, or I can’t. If I can, there’s no need to mess around trying to say them backwards. If I can’t, then saying them backwards isn’t going to help; once I’ve said “Steven” backwards I already have something indistinguishable from saying “Stephen” backwards.

“Phase” is a bad example. Words that start and end with extra letters don’t demonstrate anything. It is a bit like trying it with “ye olde.” You are not checking spelling, you’re checking pronunciation.

What is this even supposed to mean? What “extra letters” are you talking about (the silent “e”?), and why do they make it a “bad example”? Apart from the “f”/“ph” distinction, the spelling of a word is irrelevant. Or are you saying your supposed “f”/“ph” contrast doesn’t apply at the beginning of a word?

I understand your incredulity and doubt. It absolutely is a thing. I pronounce “Stephen” and “Steven” the same. The difference comes between “nehpets” and “nevets” played in reverse. Which intuitively shouldn’t make any difference. Yet does.

Playing it in reverse and listening to it carefully and trying to replicate that noise is difficult. So you end up overemphasizing the wrong parts and under-emphasizing others until getting it right. I can get “neH-Pets” to sound like “Steven”. A strong H and a hard popping P is necessary to get a “v” sound when reversed. Don’t ask me why. Makes no sense intuitively. That’s just how it is. Whereas any pronunciation of “nevets” sounds like the first 2 ‘e’ in September. Except there are no short ‘e’ sounds in “Steve”. However I can get “evets” to sound more like “Steve” than say “hpets”.

A good example vs a bad example: It is the syllables in the word. A sound has to flow from one to another in your mouth. Phase vs Faze has the Ph and F at difference at the end. Your mouth isn’t lining up to make another noise. So I suspect that it won’t make much difference. Never tried it so I’m guessing. Possibly “ezaf” vs “esahp” might be more interesting than I expect.

It is letter combinations said in reverse that make this interesting. Not letters on their own. So quirkiness at the very start or end of a word won’t make much difference. “Ye Olde” has such weirdness at both ends. That is what I was referring to. Also that the letter ‘y’ in ‘ye olde’ is not a ‘Y’ but the non-existent letter “thorn”. That kind of ‘Y’ was never pronounced like in “Yay!” “Ye Olde” was always pronounced “The Old” with a ‘th’ sound.

It’s not useful in any way that I can think of. It’s just a “Huh. Neat.”

It would have been more obvious if I’d linked on the “In fact, I think”, but the URL is a person’s name (mine) hosted on GitHub (markdown auto-compiled into HTML, plus the CSS I wrote). The color scheme is 1) an homage to green-screen computers, and 2) actually easier to read for me. I’m a bit photo-sensitive, so the default bright-white background plus black skinny text of most software / websites, actually strains my eyes to look at for extended periods of time. Bolded text usually helps a bit more than color schemes, but both are useful.

Talking about linguistics and programming languages always reminds me of two people: Larry Wall, creator of Perl, linguist and programmer (hacker in the original meaning of the word), and r0ml Lefkowitz, philosopher and programmer whose talk “Literacy: The Shift from Reading to Writing” opened my mind about how writing code is like writing language.

But linguistics and IT meet at a weird philosophical point that falls out of the scope of what Shamus is talking about, so I won’t go too far into it, just mentioning it as a point of interest.

Wow, that person is clearly out of the loop. If they want to fix modern English then they need to go back to Germanic, Gaelic and French Norman to sort it all out. I mean, some of their complaints fall at those languages’ feet – namely the interstitial period between the futhark and futhorc.

Okay, we had the great vowel shift in the 16th century or so but, really… that’s an endearing aspect to our language. In many ways, English is more easily understandable and less complex than most other languages in the world today (and I’ve learnt French, Spanish and Maltese) and this person wants to simplify it more? Come on… start where the iron is hot, not where it’s already partially cooled.

[edit] Not to mention they’re wrong about the fish/bible example. The rule (as far as I understand it) is based on the pronunciation of the following letters, not if they’re consonants or vowels. i.e. Bible is pronounced with a consonant and a vowel after the “i” whereas fish is pronounced with two consonants after “i”.

Vowel sounds: Even supposedly phonetic languages such as Maltese do not actually follow a phonetic pronunciation because humans are lazy and will find shortcuts (which, is where most of these idiosyncrasies in the English language pop up from). For example: “il-waqfa li-jmiss” shortens the double vowel sound that you ‘should’ pronounce in li-jmiss to ‘lee-miss’. If you pronounced jmiss by itself the sound would be entirely different…

Weird spellings: Are you kidding? Most other languages that insta-copy English modern nouns really twist themselves around in convoluted circles trying to get the sounding in their native tongue.

I didn’t make it obvious enough, but I wrote that article myself. You say that all other languages are harder – I have only learned some Ukrainian, but from that and the conversations I’ve had with others, pretty much every language is full of rules-bending idiosyncrasies. That’s to be expected, since most languages have evolved, but not many have been purposefully changed. The few that have, as cited in this conversation thread, have been reworked in a heavy-handed way, in an empire or other similar socio-political structure. As for other languages “twisting themselves in circles”, that’s because they aren’t adding new letters to their language, to accommodate the English loan-words. The IPA is supposed to have letters for every sound a human can make – if English used that, I think it would be a net positive, despite the transition costs.

I think it’s more correct to say that English orthography (its writing system) is a mess; purely orally, it works about as well for communication as pretty much every other natural language out there.

Also, if you want to learn a language where nouns don’t inflect between singular and plural, you might check out Hawaiian. :) On the other hand it has singular, plural, and dual forms of pronouns, so… (And the third-person non-singular pronouns can be “inclusive” or “exclusive”, so there are four different ways to say “we”!)

I know how tempting it is to make an analogy between coding languages and real languages, but it just flat-out doesn’t work. Do you want to learn German for reasons OTHER than to speak with Germans, or to brag about it? Probably not. And neither can I! Same goes for Japanese. And French. And Spanish, too. And Chinese! And from there you get the idea.

Sorry for the minor rant. Real languages and coding languages are two different things, so it can get pretty irritating when somebody tries to make an analogy with one to explain the other.

I’ve learned Japanese for a different purpose- namely, to play video games that never got translated to English. I have exactly two games that are in Japanese, and I’m not fluent so I haven’t bothered to translate to figure out the plot or anything.

This is a more apt comparison than you know, because there HAVE been attempts to create new universal languages that are easier to learn and use than English (I believe that was the main push behind Esperanto, which I know about mainly from Red Dwarf). English is an incredibly difficult language to learn because it has so many exceptions. Other languages may be more complex with issues around genders and alignments, but for the most part they have rules they stick to and English is not as good at that. So why is English so popular? Well, it’s spoken so much around the world and has been used in the business world for so long that if you try to use another language you’ll end up having most people not know the language and so you’ll have a hard time, say, hiring people or working with people without knowing English. So it maintains its position mostly due to inertia. Like C/C++.

I think there are a couple of reasons for the popularity of English. The main one is probably the reign of the British Empire which was not only long, but encompassed an enormous part of the world.
The other reason, I think, is its relative simplicity compared to most other languages. Because, compared to languages like Spanish, English doesn’t have that many exceptions. Most of the words in English are two syllables long, and the majority of words are spelt exactly the way they sound, unlike French where basically NO word is spelt the way it sounds (although there may be exceptions – I myself have never had the courage to start learning it).
And while there is something to be said about having few exceptions, sometimes exceptions make sense, and prevent you from having to create rules to encompass the things that would otherwise be handled by exceptions. My German is very rusty (I only studied it for three years during Primary school), but it’s a language that has a very small number of exceptions. However, it also has a dizzying number of rules, and some very complex grammar, some of which includes… some interesting design decisions (dear Germans, what on EARTH possessed you to create a frequently used tense that puts the verb at the END of the sentence!?!? Did you do it, because it looks fun or do you just hate translators THAT much?).
Add to these perks the lack of genders and alignments, which you mentioned, and you get the recipe for quite a snappy lingua franca.
As for better alternatives, I really like the idea behind Chinese – it essentially has no grammar, since there are no tenses, no plurals, and no conjugation. The only problem is the sheer number of characters you need to learn in order to have even a basic conversation, and the fact that each of them can mean a different thing based on tone.
I don’t know enough about Japanese, but I think it’s actually solved some of Chinese’s complexity problems. But that’s kinda out of my depth – I’ve just started learning Chinese, and I don’t think I can even be classified as a beginner, so I probably shouldn’t rush into making comparisons :D

My comments on English being hard to learn come from discussions with people who had to learn it as a second language, although most of them were Chinese so that might give them a unique perspective. But the tradeoff always does seem to be that in most other languages, once you learn the rules you can pretty much just apply the rules and be correct, while that’s not usually the case in English. On the other hand, you have to learn a lot of rules to be able to speak those languages, while in English you don’t really have to, especially given that a lot of times you can get reasonably correct English statements with a number of different but similar formats. For example, where you put a word in an English statement isn’t usually an issue, but it matters, as you noted, in other languages because it ends up applying a different rule and so a different meaning. But without the rules it’s harder to formulate a correct sentence because you don’t really know where each thing has to go.

I guess I also have a unique perspective, because my mother tongue is Bulgarian – like all Slavic languages it’s notoriously difficult to learn for foreigners. Compared to that, English grammar has always seemed like a breeze. However, I’ve seen a lot of people struggle with the aspects you just described; a lot of my classmates from school and university have awful English. Of course, I think that has more to do with the way language is taught, but that’s a very long rant I don’t think people want to hear.
As for your Chinese friends, I can definitely see them having great difficulties with English. Just by itself, a foreign language isn’t easy to learn, even if it’s from the same family as your mother tongue. However, that difficulty jumps tenfold if the foreign language comes from the opposite side of the globe – English and Chinese are constructed on a fundamentally different basis, and you have to switch your entire way of thinking in order to go from one to the other. So not only did they have to deal with the usual complexity of learning a foreign language, they had to deal with the complexity of looking at and thinking about speech in a radically different way to what they were used to.

I learned some Russian in my first year of university and didn’t find it that difficult compared to other languages, like French in school — in Canada you learn French starting from grade school — or Latin in university. Then again, my background is Polish and there are some similarities there (although I don’t speak Polish). And on top of that, my Chinese co-worker was actually learning Russian at one point. I don’t think he found it that much more difficult than English, although he did at one point comment that it was different.

Of course, sometimes word order in English suddenly matters for no discernible reason. I once had to break the bad news to a non-native speaker about the difference between the city throwing an “alcohol free” NYE party downtown, and the city throwing a “free alcohol” NYE party downtown…(he was also new enough to the USA to not have the cultural context to realize that only one of those was a remotely plausible thing for an American city to do).

Even that is more about connotation than grammar – if you phrased it as, say, “all alcohol free”, then your friend’s interpretation suddenly seems a lot more likely, despite being grammatically basically the same. It’s just that this particular phrase has gained a specific meaning due to common usage.

As an Italian who has spent his life living in English environments, allow me to refute your beliefs about the English language (I’m also fluent in French).

English doesn’t have that many exceptions.

English is one of the languages i know of with the most grammar and spelling exceptions.

the majority of words are spelt exactly the way they sound.

Just, no. Letters in English can change their phonemes radically without explicit rules.
Case in point: http://ncf.idallen.com/english.html
French has basic pronunciation rules because it’s been standardised like German or Italian… in English if you don’t know the word it isn’t always obvious how to pronounce it.
And don’t get me started on place names! (Arkansas? Gloucester?)
At least the US spellings sometimes are closer to the phonetic pronunciation (ditch some ou’s, add some z’s) but even then… for foreigners your language is a nightmare of pronunciation traps.

In my experience English is a language best learnt through immersion and instinct rather than by learning the rules, because the rules are unreliable. (a thing which isn’t true of the other languages i know: spanish, german, french, italian)

IMO English is the lingua franca at the moment solely because it was the language of the most important political and economic powers in the world up until recently (and maybe still to this day). I don’t think that the relative simplicity of the language had much to do with it, but I am no linguist. Worth nonetheless remembering that when France was at its peak, the lingua franca (literally French language) was French.

My opinions on chinese are superficial and skeptical, so probably useless for a worthwhile discussion.

It’s not a thing I’ve ever encountered beyond American media (funnily enough not even in the “American” schools in Bulgaria and Sudan) , that covers the Italian and French schooling systems (and a variety of international schools here and there)

Arkansas is apparently a French name for a river which was named after a local Native American tribe, so you really can’t blame that one on the English.

English pronunciation is a mess because there’s a huge number of loaner words from other languages in both dictionary English and slang. Also I strongly suspect we decided to pronounce half of them wrong just to annoy foreigners. We’re historically kind of dicks that way.

English is one of the languages i know of with the most grammar and spelling exceptions.

Compared to a lot of other languages, I really don’t think there are that many. A point in English’s favour is the fact that there are no genders and alignments. In most Slavic languages these alone carry a huge amount of exceptions, in addition to the exceptions that are there just because :D.
Maybe it’s the fact that Spanish was my second foreign language, and I’m a little bit biased against it, but I feel it has a lot more exceptions than English does (which is why I used it in my original example).

Letters in English can change their phonemes radically without explicit rules.

Which is why I said that most words are spelt the way they sound, not that all of them are.

French has basic pronunciation rules because it’s been standardized like German or Italian… in English if you don’t know the word it isn’t always obvious how to pronounce it.

I guess my initial impression of French wasn’t true. Maybe I should get around to learning it, after all…

In my experience English is a language best learnt through immersion

I wholeheartedly agree. I didn’t learn English thanks to school – I learnt it thanks to Cartoon Network, video games, and books. But that’s true for any language – just standing there learning grammar and doing exercises is the most ineffective way of learning a language, because language isn’t a knowledge in and of itself – it’s a tool. The best way of learning it is to try to use it for something – watching a movie without subtitles, living in a foreign country, etc.

IMO English is the lingua franca at the moment solely because it was the language of the most important political and economic powers in the world up until recently (and maybe still to this day). I don’t think that the relative simplicity of the language had much to do with it, but I am no linguist. Worth nonetheless remembering that when France was at its peak, the lingua franca (literally French language) was French.

Which is why I started with that argument in my post – it all started with the British Empire which “passed the torch”, so to speak, to the USA. When it comes to these things, politics always come first (which is why I said the British Empire was the “main” reason), but I was more interested in talking about the particularities of language (another thing worth remembering is that before French, Spanish and Portuguese were lingua francas for quite a long time).

I have to vehemently disagree as you seem to be confusing complexity and predictability, as well as grammar and phonetics of a language.

Slavic languages have complex grammar, but are usually fairly predictable. The grammar might take a lot of learning, yet the number of exceptions is relatively low. Phonetically, they are very predictable, and you will almost always be able to pronounce a word when you see it written (but there is some uncertainty the other way).

Then, say, French pronunciation is quite hard, and there are sources of uncertainty. How to read a cluster of letters can change rapidly – but there are rules, patterns, context of the rest of the word. You might still need to take a chance on how to pronounce a word, though, unless you have really mastered it. The grammar is not that complex, but there are verb forms to remember.

Japanese is an interesting language, as it is reasonably simple and the grammar is fine – but their usage of Chinese logograms makes reading anything a massive undertaking, and even a native speaker might have trouble deciding on the correct approach, especially as the logogram sets can (and do) have multiple readings. Which is why they also have syllabic alphabets to help (but two of them, to make things more confusing again). Then the various levels of politeness embedded in the language make communication rather tricky – as even if you convey the meaning, you might do it in an entirely improper way.

English is entirely unpredictable as words have been absorbed from many languages, and transformed – again using rules from multiple languages. You have no chance of predicting how a word will be pronounced, even if you can pronounce very similar words. Years of practice or being a native speaker are needed, and even then you need to be wary of regional variations. But the grammar is not too bad.
The main upside of English is, in my personal opinion, that due to cultural dominance a lot of people are at least passingly familiar with it, and that even in broken English you can manage to convey a meaning – which is often not true in languages with more complex grammar.

TL;DR: English is a language with a multitude of exceptions and unpredictable pronunciation, and that is an entirely different thing than grammatical complexity. English is by no means simple.

This is actually one of the things I’ve told many co-workers who learned English as a second language. They were used to other languages where precise knowledge of the rules was needed to be understood, but in my words, “The rules in English are all fucked up, but they usually don’t matter, and people can understand you anyways!” I think the reason for that is that English has a lot of redundancy in spelling, sounds, grammar rules, word-order, etc., whereas other languages don’t have that redundancy, so if you mispronounce something, for example, it becomes completely unintelligible.

I can imagine your experience. I’m English and have had broad exposure to foreigners speaking that language having been in university and living in other countries that are non-English speaking.

First of all, I agree that there are many exceptions in English for spelling and pronunciation, though I do not think that these outweigh the special tenses and uses in Latin-based languages such as Italian, Spanish and French. It is much easier to memorise and/or force your way through a sentence in English than in any of those languages and for it to be comprehensible to the listener.

Furthermore, this complexity is counteracted by the single biggest point of the English language – we have no dialects, only accents. As a plus, this means that we (generally – I know of the stereotype of the English person who refuses to understand a foreigner) accept other non-perfect pronunciations and spellings in the ultimate aid of understanding. i.e. as I said above, English is easy to understand even when pronounced incorrectly and stated in incorrect grammar.

I saw an example of a person confusing “alcohol free” and “free alcohol” in this thread; however, this is an easy and simplistic mistake to make in any language. What is surprising about English is that you can operate without many verbs and adjectives and still be comprehended by the listener. Other languages do not have this flexibility and, when travelling to those countries, the leaning on dialects and strict pronunciation really detracts from the second language student being able to get their point across. i.e. If you do not specifically communicate in perfect pronunciation in French, very often the person will not even understand your sentence. Same in Spanish and, I suppose, the same in Italian. (Though when I speak French and Spanish, Italian speakers will often hear their own pronunciation of the common words and be able to understand me – even though I didn’t say those words specifically. My latest example: “encore” (French pronunciation) was heard as “encore” (Italian pronunciation) by Sicilian and Italian acquaintances of mine.)

English qua English has pretty clear rules, the problem is that we’ve swiped words from pretty much every other language in the world and in many cases those words use the spelling of the original language instead of being transliterated. The fairly large number of languages that use Latin lettering is likely to blame for this.

English is semi-difficult because it’s not one language, it’s three or four or more languages smoodged together. I mean, just the days of the week make this pretty obvious: Sunday (sun’s day, from Babylon by way of Rome), Monday (moon day, also from Babylonian via Rome), Tuesday (Tyr’s day, Anglo-Saxon/Norse), Wednesday (Woden’s Day, Anglo-Saxon/Norse), Thursday (Thor’s Day, Norse), Friday (Frigga’s day, Norse), Saturday (Saturn’s Day, Rome).

As a typical monolingual American, I’d love to argue there’s something special about English, but I don’t really think it’s true – the language is popular because the British empire dominated large swathes of the world for a few centuries, and when it contracted the US dominated popular culture. I think the shameless plasticity of English smooths its path to acceptance – if another language has a word that English really *doesn’t* have an equivalent of, English will happily assimilate the new word Borg-style – but Japanese is similarly flexible, and hasn’t had anywhere near the reach.

Esperanto was indeed created to be a universal language: easy to learn, easy to use (for people familiar with Western European languages, anyway) – Mark Rosenfelder described the intent as a “voiced Dewey Decimal system”. But it turns out people value things beyond precision and technical ease in their languages, and Esperanto joined Lojban and every other “universal language” in the bin of overthought curiosities. Meanwhile, in a cosmic middle finger to these projects, the most popular constructed language on the planet is a willful celebration of edge cases, exceptions, and general unruliness: Klingon.

Not. Even. REMOTELY. Close. How about read, read and red? And there, their, they’re? The vast majority of words are NOT spelt in a way that’s consistent with their pronunciation.

Personally I don’t know Spanish, but one of the main gripes I have with English is that every single rule has dozens if not hundreds of exceptions, which simply isn’t the case with the other languages I know, such as Finnish or Swedish or Russian.

Every word is spelled precisely the way it sounded… at the point in time when that word encountered writing, by the rules of writing at that point in time. The pronunciation then continued to drift while the spelling was set in stone/dictionary.

I still remember when I was young, running into ‘schedule’ in a comic book without having seen it before. And there was an LP I watched semi-recently featuring a pretty hilarious attempt to pronounce ‘brassiere’ purely from sight.

In Spanish, every word is spelled exactly how it’s pronounced. It’s very nice. It even lets you know where the accented syllable is via rules or explicit marking!

On the other hand, Spanish verbs have like 10 tenses while English ones mostly have, uh, 3ish? And way fewer conjugations based on the subject. But in Spanish you mostly have rules for how a given subject and tense conjugate a verb, while in English it’s totally arbitrary. Sure, you only have to learn “run”, “runs”, “ran”, and “running” and you’re totally done conjugating “to run”, but where the heck did “ran” come from? You either have a lot of rules but they’re applied (mostly) consistently, or you have almost no rules and entirely memorization.

Claiming that Japanese is “simple” compared to quote-unquote Chinese is gut-bustingly hilarious. Spoken Japanese is relatively straightforward for a language. But get into writing, and how it has affected the language, and it’s hilariously wild.

I say quote-unquote Chinese because there is a Chinese writing system but no single language – but rather several languages, most significantly Mandarin and Cantonese.

“The problem with defending the purity of the English language is that English is about as pure as a cribhouse whore. We don’t just borrow words; on occasion, English has pursued other languages down alleyways to beat them unconscious and rifle their pockets for new vocabulary.”

English has so many loanwords that all the ‘rules’ are laughable. It is the language where everything’s stolen and the points don’t matter.

When people complain about the “exceptions” or “inconsistency” in the English language, what they almost always mean are the exceptions and inconsistency in English orthography. Orthography is not language. Orthography is an attempt at codifying a language into visual symbols.

The English language is not particularly complicated, nor particularly prone to exceptions. Learning the English language involves learning quite a few rules, and quite a few exceptions, like any other language. One advantage is that English involves much less memorization of forms than basically any other common Indo-European language.

Some English nouns pluralize differently than others. Some English verbs have a different way of taking a past tense than others. But for the most part, every noun follows the same rules, and since there are only 2 noun forms, you only have to memorize a single extra form for irregular nouns. All verbs follow the same patterns for almost every combination of person, number, tense and aspect, with irregular verbs only differing in how 2 or, very rarely, 3 forms are derived.

That’s far more consistent than basically any other common Indo-European language. In those languages there are often dozens of forms for each verb, and quite a few different patterns for determining how to derive those dozens of forms. Plus there are a bunch of irregular verbs that don’t follow any of those patterns. If all you know is the base form of the verb, it may not be obvious which pattern to follow for determining the other forms.

These languages also often have a dozen forms for every noun, and several patterns for determining how to derive those dozen forms. Plus there are all the irregular nouns that don’t follow one of those patterns. If all you know is the base form of the noun, it may not be obvious which pattern to follow for determining the other forms.

Learning to speak English is relatively simple. Learning to read and write English is pointlessly complicated. Spelling is arbitrary. A single letter at the end of a word can change the pronunciation of every phoneme in the rest of the word (compare “though” and “thought”). It’s not as bad as learning to read and write Chinese. But it’s probably one of the worst possible implementations of an alphabet, sacrificing almost all of the benefits of such a system for very little gain.

The comparison with French is really interesting, especially in context of the wider discussion of programming, and the industrial revolution. The French government spent a lot more money on R&D to develop machinery and automation than the English did, yet the English pulled way ahead in the industrial revolution, in part because they adopted ad-hoc standards while all the French standards were government regulated. The flexible development process was able to work faster. The French, likewise, tend to care a lot more about the purity of their language, and even have three official bodies for regulating the varying degrees of formality of French usage. In a similar contrast, the English language has never been formally regulated and is composed almost entirely of words borrowed from other languages, and is more widely adopted. At the far end of the French example in regulation is Mandarin Chinese, which is a fiat language imposed over the entire empire, constructed to facilitate commerce under the static rule of a dynasty, and originally with only written standards, as the Chinese states did not (and many still do not) share a spoken language. And likewise, at the far end of the French example in terms of linguistic purity is Japanese, where foreign words are not even written with the same phonetic characters as “Real” Japanese.

As someone who speaks both English and French, most of French is spelled as it’s spoken… In French. You can’t take the expectations of one language into another in terms of how the pronunciation maps to spelling; all you really want is consistency, and, while French is pretty bad about that, it’s nowhere near as bad as English. Most of what look like inconsistencies in French to an outsider unfamiliar with the language, I find, come down to phonology, and unlike English, it mostly results in a unique spelling for different phones, even if they’re closely related and sound like allophones of a single phoneme to non-francophones.

This comment is really a response to multiple people. English is – precisely because of its history of beating up other languages and going through their pockets for loose grammar – a marvelously flexible means of expression. Attempts to simplify it or complaints about its arcanity miss the point. It is precisely because there are so many different ways to say things in English that the language renders us capable of expressing so many different shades of meaning, or, to put it another way, to express finer and finer distinctions. Without the language to express a fine distinction, you do not have the ability to conceive mentally of that distinction, which means your capacity for original thought is reduced. One might even say that to reduce complexity in a language is doubleplusungood.

Now, whether this description applies to C/C++, I can’t say.

(Side note: while “Darmok” is still my favorite TNG episode, for a spacefaring race like the Tamarians to speak only in literary and cultural allusions suggests that either they have a rich symbolic written language, their engineers are a separate caste with a separate language never seen by outsiders, or they’ve employed another race to do their technical work for them. I mean, when there’s a short circuit in the warp coil transfer circuit and all your chief engineer can say is “Galvani, his frog twitching,” how are you going to get anywhere?)

Inconsistent rules are not necessary, for a language to be expressive. Using the French spelling for some vowel-sounds, and Latin for others, only makes it more difficult to remember how to spell English. Save the history for the history textbooks, and refactor the language to suit its current (and likely) usages.

Let me respond to the second point first – concerning TNG. The aliens *were not* speaking gibberish to each other. They just had a clipped way of speaking heavy on literary references that the Universal Translator couldn’t manage, just as it doesn’t “translate” proper names or ideas as such. But they were using completely comprehensible words and grammar, even if the Enterprise crew were too thick to understand.

Now I chose to respond to the (very good) bit about television first so I can then go on to say that Language Does Not Limit thought. Language is a road that makes some thought more efficient, not a wall that locks the speaker inside. I plant my flag against the accursed Sapir and Worf (bonus Star Trek ref)!

The point of language is to easily communicate a concept, but you can understand the concept without having a word for it, and a word may be meaningless without an explanation of the concept. One can run, see people running, and even study the mechanics of it even if there was no word for it at all – and of course all human language had to be invented to describe the ideas that people came up with, things they saw or experienced, and so on.

You are correct that language does not *limit* thought (the strong Sapir-Whorf hypothesis). Anything you can think in one language you can think in another language. Any idea you have in one language can be communicated in another language, though it may be more difficult to do so.

But language does direct thought (the weak Sapir-Whorf hypothesis)… it’s not that you’re limited, but that your language will force you to think about certain things ALL THE TIME. The most famous examples are those few languages that have no word for “left” or “right”, but use the equivalent of “North”, “South”, “East,” and “West” instead. Native speakers of these languages ALWAYS know where north is, because their language forces them to think about it all the time.

I teach ESL to speakers of a language that uses neither articles nor plurals. I have to tell them they need to start thinking about numbers more in order to speak English. In their language, they can get away with simply not considering how many of something is there unless they want to talk about the number specifically. They can be ambiguous as to whether it’s one single strawberry or a whole bowlful. But in English, you simply can’t easily say a sentence without specifying if it’s one or many. I mean, you CAN avoid saying it… again, language does not restrict thought… but it becomes awkward and super cumbersome to try, and calls extreme attention to the fact that you are avoiding giving the number, which could be suspicious. (“I ate an unspecified number of strawberries which might have been limited to only one or might have been more today”). In the language of the people I’m teaching, it’s just normal speech.

The problem was that the meaning followed precisely from that specific cultural reference. If you didn’t know the reference, there was no way to determine the intended meaning, which is why the Universal Translator and the Enterprise crew couldn’t get it. It wasn’t that they were too thick to understand it, but instead that in order to understand it you had to have intimate knowledge of the culture which they didn’t have.

Yes, but any idiot should have been able to see that the problem wasn’t with the technical side (and fans keep missing it to this day) but the contextual one. It’s important in my context here, because I was discussing how language matters to thought or communication.

It is precisely because there are so many different ways to say things in English that the language renders us capable of expressing so many different shades of meaning, or, to put it another way, to express finer and finer distinctions. Without the language to express a fine distinction, you do not have the ability to conceive mentally of that distinction, which means your capacity for original thought is reduced.

I’m not so sure about that. First, I had thought, from some linguistics courses I took a while ago, that the idea that your language is that determinative of your concepts was at least partially debunked. For example, a lot of the examples were about colours or about words for snow, but then later studies showed that it wasn’t really true that they couldn’t see or understand or even note the concepts. It’s probably more about experience than it is about language (we don’t distinguish a colour, say, because we never have to in our personal experience, but the lack of the word distinction follows from that fact and not from the fact that we don’t have a word for it). Language can shape concepts and make some things harder to grasp, but it doesn’t prevent us from coming up with it if we need to. Second, for conceptualization (and to tie this back to discussions last time), philosophy has pretty much proven that we can indeed conceive of things that we don’t have a good word for. Across all languages, philosophers have come up with concepts that not only don’t have a good expression in English, but don’t have a good expression IN THEIR OWN LANGUAGE. Thus we end up with them using a word that’s “close” but often confusing and so ends up equivocating at times as the word has implications that don’t fit, turning them into “technical terms”. Further to that, philosophers end up either using metaphors or examples to explain their concept, or long chains of words to describe it so that we can understand it. If someone can come up with a fine distinction, they can use language to explain it or even just point to it in the world.

To be honest, this is how we get new words in the first place: someone comes up with something that they can’t easily express in the language, and so invents a new word to represent that thing that people understand.

(Side note: while “Darmok” is still my favorite TNG episode, for a spacefaring race like the Tamarians to speak only in literary and cultural allusions suggests that either they have a rich symbolic written language, their engineers are a separate caste with a separate language never seen by outsiders, or they’ve employed another race to do their technical work for them. I mean, when there’s a short circuit in the warp coil transfer circuit and all your chief engineer can say is “Galvani, his frog twitching,” how are you going to get anywhere?)

This is similar to my problems with the language, in line with what we’ve talked about: it’s hard to imagine how such a language could come about because unless you know the cultural reference in question you won’t understand what they’re talking about, but children won’t have those cultural references and so will have a very hard time picking up that vocabulary. It ends up being an unnecessarily complicated language without something like telepathy or genetic memory giving children a leg up in learning it.

I only mentioned it for looking at another issue, but this view is something of a pet peeve of mine. Remember that this is a(n alien) starship crew. So they work with each other a lot, probably have intense education, and are trying to sort out some weird issues in communication. They’re probably not using esoteric allusions themselves.

If the Enterprise is in a fight, and the captain suddenly shouts “Picard Maneuver hard to port”, we know he’s actually giving a specific instruction that has contextual meaning. But without that context, it comes off as nonsense. The episode wasn’t ever about weird aliens with an idiotic language, but a look at how hard even basic communication would be without magic tech.

This may be based on a false interpretation of how the Universal Translator is supposed to work, but that alien language seemed like a complete disaster to me. They seem to have to speak a complete sentence like “Temba, his arms wide!” to indicate “giving or receiving something”. A functioning language allows you to combine smaller bits and pieces to communicate a complex concept.

There’s a character in a Gene Wolfe book (predating the episode) who is from a culture where they are only allowed to speak in permitted sentences like, “As a good child is to its mother, so is the citizen to the Group of Seventeen.” But this is a dictatorship attempting to brainwash everyone. I found myself wondering if the Tamarians went through something similar.

I know that some or even many people interpret that as being the entirety of their language, and the fact that down on the planet or even on the ship they never switch to the more general and not culturally infused language works against your interpretation. If they had a more normal language and what we were seeing was just shorthand, surely they would have switched to it instead of sticking with the shorthand, in much the same way as on the Enterprise a character would quickly describe the Picard Maneuver in detail if they said that to someone who clearly didn’t know what the heck that was.

And the aliens having that sort of culturally infused language was precisely the reason why the magic technology couldn’t work, so either they have a stupid language or else THEY are stupid for not speaking more clearly.

Here is a sentence which makes sense in context: “The Universal Translator could never work without a LOT of human/alien micromanagement and textual care.”

Now translate it to another language. You might get “Cosmological communication will not labor absent much overseeing and literature love”. The point of the episode had precisely nothing to do with what the characters heard out of the Universal Translator. It was that the alien language didn’t have enough context to logically explain whole sentences.

I think we’re vigorously agreeing here, because I always thought that, yes, that was indeed the point and stated in the episode: the Universal Translator couldn’t work because the language relied very, very heavily on a cultural context that they simply didn’t have access to, which caused all the problems. However, that’s the point that I’m using to criticize the language itself: because it was so context-dependent it’s difficult for anyone who doesn’t have that shared cultural context to learn it … which includes their own children. For English, there is indeed a cultural context behind why we call a sandwich a sandwich, but children don’t need to know that to get what the word means. For the TNG language, either that context is required or it isn’t. If it is, then where do the children get it from when learning the words/phrases, as they don’t start with it? And if it isn’t, then all these people do is associate longer phrases with “point to” meanings, which removes how special that language was (and then is what the aliens should have done more often to try to teach it to other species, since they would have had to do that for their children, at least in some sense).

That’s why I posited that the species might be mildly telepathic, and so convey the meanings that way, at least to children. Other species couldn’t pick things up that way, and hence the frustration when they tried to teach the language to others, as they seemed dense for not getting things that just come for free amongst themselves.

Sure, speculate. But I’m trying to show that this is a plausible issue even without adding a new element. It’s just that the Universal Translator should *always* have this issue, but that would be boring. It should be constantly leaving out vital context, skipping words or paraphrases, or shortening or lengthening even if we had most of the meaning. It’s just that in this episode we see a slightly more realistic result and everyone assumes it’s a stupid alien problem and not a missing-knowledge problem.

Well, sure, it is something that always was handwaved over “How does the Universal Translator get the vocabulary right all the time?” when the specific vocabulary is always language specific and something that you had to learn. Note that Deep Space 9 did that with the Skreeans, but from looking it up on Memory Alpha the main issue was the syntax and grammar that made it hard to tell what each word in the sentence was grammatically. But, sure, learning the specific vocabulary that quickly is kinda magical. Still, though, the problem in the episode really was one that was carefully constructed to be about the language itself. The UT managed to figure out the grammar and syntax, but no one could get the vocabulary right because of how particular it was to that specific language. So, yes, it was a “stupid alien problem”, as my own criticisms highlighted by pointing out that such a language would be quite difficult for their own CHILDREN to learn because of that.

Darmok was indeed the best episode, but I disagree that their language would be inefficient for engineering. If we accept that as a species they have a much better time learning anecdotes than we do, it could be very handy to explain complex procedures. “GargamelLeNoir, his API exported” would refer to a very specific work I did that the listener could then emulate, instead of having the entire procedure explained as we would do.

In fact, during my Good Robot series you can see me advocating a linked list without giving any thought to how every single entry is likely to trigger a cache miss. That’s silly, and blunders like that would get me bounced out the door of a serious AAA studio’s engine team. [Footnote: Unless it was Bethesda Softworks, where they’d probably put me in charge of engine optimization for Gamebryo.]

Good joke, but funnily enough, this seems to actually be something Bethesda was fairly careful about. In one or two places, they even use an “array of pointers” type that has an optimization for when there’s only one pointer in the array: they set a flag bit in the array length, and then instead of actually having an array of pointers allocated somewhere (i.e. Foo**) they just have the single bare pointer (Foo*).
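To make the trick concrete, here is a minimal sketch of what such a type could look like. It is not Bethesda’s actual code: the names `SmallPtrArray`, `Foo`, and `kInlineFlag` are made up for illustration, using the top bit of a 32-bit length as the flag is an assumption, and this version does not own the memory it points at.

```cpp
#include <cassert>
#include <cstdint>

struct Foo { int value; };  // stand-in element type (hypothetical)

// Non-owning "array of pointers" that stores a lone element inline.
// Assumption: the top bit of the length is the "single pointer" flag.
class SmallPtrArray {
public:
    SmallPtrArray() : size_and_flag_(0), many_(nullptr) {}

    // One element: keep the bare Foo* and set the flag bit.
    explicit SmallPtrArray(Foo* only)
        : size_and_flag_(1u | kInlineFlag), single_(only) {}

    // Several elements: point at an external Foo*[] as usual.
    SmallPtrArray(Foo** items, std::uint32_t count)
        : size_and_flag_(count), many_(items) {
        assert((count & kInlineFlag) == 0);  // length must not collide with the flag
    }

    bool is_inline() const { return (size_and_flag_ & kInlineFlag) != 0; }
    std::uint32_t size() const { return size_and_flag_ & ~kInlineFlag; }

    Foo* operator[](std::uint32_t i) const {
        assert(i < size());
        // Inline case: no trip through a separate pointer table.
        return is_inline() ? single_ : many_[i];
    }

private:
    static constexpr std::uint32_t kInlineFlag = 0x80000000u;
    std::uint32_t size_and_flag_;
    union {
        Foo*  single_;  // valid when the flag bit is set
        Foo** many_;    // valid otherwise
    };
};
```

In the one-element case a lookup goes straight from the container to the object, skipping the separately allocated pointer table. That is one fewer dependent memory access and one fewer allocation, so it plausibly saves a cache miss, though whether that was the goal is a guess.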

(Then after optimizing that stuff so much, they turn around and write code that uses GetPrivateProfileString a few hundred times on startup, lol. IIRC mod authors have cut Skyrim Special startup times significantly just by injecting code to manually read the INI files. Anyway…)

That optimized pointer array isn’t used in a lot of places, so I don’t know the exact goal. IIRC it’s used for the Active Effects lists on characters. (Or maybe it was the list of spells a character knows…) I wonder: Is that array trick the kind of thing that can reduce cache misses?

I’m not the only one averse to thinking about the actual physical limitations of the hardware.

Small String Optimisation is basically the same idea.
Either they had a team whose job was to make ‘useful gadgets’ that were as efficient as possible, or somebody was playing code golf.

A lot of large organisations had teams to do that until maybe ten years ago?
The C++ STL was created in 1994, and many compilers didn’t have decent implementations until quite recently.
It wasn’t even formally standardised until 1998.

These days such work is largely pointless because your compiler vendor already did it better than you.

Even this year the Microsoft compiler got several significant go-faster stripes in some of the STL implementation.

TBH, C++11 is almost an entirely different language to C++98.
Keywords have been removed, repurposed and even the behaviour of “create new object” [operator new()] has completely changed.
Some of the codebases I work with predate C++98. It’s actually quite astounding that they still compile, let alone that they still work as intended!

Yeah this. Early CPUs had no cache. If they didn’t have registers, then they just wouldn’t have worked.

As an expansion for those curious: let’s say you want to multiply two numbers together. The way that works is the CPU first loads variable 1 from memory into one of its registers. Then it goes and loads variable 2 from memory into a second register. Once that has been done, it multiplies them together and writes the result out to memory.

Ultimately modern CPUs still need to load data into registers before processing them, the difference being that they can:
1) Simultaneously load multiple values nowadays.
2) Have lots of little memory locations (cache) closer to the registers because it’s not 1970 anymore and memory loads take longer than 1 cycle. To get around that we have the caches, which give us performance of 3-4 cycles L1, 10-20 cycles L2, 40-70 cycles L3, and 120-400 cycles for main memory.

Java is good for some tasks, Python is good for others, but none of the challengers works as a broad general-purpose language.

What is Python’s specialty? I am attempting to learn it for basic scripting tasks such as automated file data modification/extraction, and my general understanding was that Python is a broad, general-purpose language, developed with ease of use and learning in mind.

Python is really good for tasks where you want to write small amounts of code, fast, and you don’t need to worry about performance or future maintainability.

So isolated scripts, small command-line programs, etc. Once the project grows in scale, it’s still a useable language, but it suffers from a lack of static checking and an API type system that is kinda tacked on, which means projects often rely on heavy unit testing to perform checks that would be the compiler’s job in other languages.

Python is really good for tasks where you want to write small amounts of code, fast, and you don’t need to worry about performance or future maintainability.

I do not fully agree. Python specializes in human-readable code that is easy to understand. This actually makes it much more maintainable. Also if you read “the zen of python” https://www.python.org/dev/peps/pep-0020/ you will find that a lot of things promote maintainability:
Special cases aren’t special enough to break the rules, there should be one– and preferably only one –obvious way to do it etc.

Python’s specialty is maintainability. However I grant that it is not fast. BUT some libraries can be fast because python has a C-api, and libraries can be written in C.

Because of this python is used a lot for data science. It is easy to understand, easy to write, and fast libraries such as numpy exist.

Also because of the maintainability lots of companies use it. From wikipedia:

Large organizations that use Python include Wikipedia, Google, Yahoo!, CERN, NASA, Facebook, Amazon, Instagram, Spotify and some smaller entities like ILM and ITA. The social news networking site Reddit is written entirely in Python

Well, yeah, if everyone in your team abides by coding standards and uses a linter and documents their code and writes the performance-sensitive parts in C, then *any* language can be said to be highly performant and maintainable, even bloody JS.

Still, I’ve worked on high-technical-debt Python projects and high-technical-debt C++ projects, and Python was often way worse.

In cases where the team didn’t bother to write unit tests for their code, there was a constant possibility any changes you made might lead to crashes in production 6 months down the line, so…

(on the flip side, bad C++ has the constant threat of memory corruption, so it’s not super maintainable either)

Point is, while any language is mostly as good as any other if you have the right tools, as far as Python has drawbacks, it’s the performance and weak type system.

Still, I’ve worked on high-technical-debt Python projects and high-technical-debt C++ projects, and Python was often way worse.

In cases where the team didn’t bother to write unit tests for their code, there was a constant possibility any changes you made might lead to crashes in production 6 months down the line, so…

I can see where you are coming from. There is a lot of horrible python code in the world. Also the dynamic typing system can make debugging frustrating because you constantly have to worry what that frigging variable actually is (numeric, string, boolean, some arbitrary class? aargh!). So yes, I totally see why that would be frustrating.

On the other hand, python does offer a type annotation syntax which makes code more readable and maintainable and it does have lots of tools for unit testing. Is it fair to blame python for the fact that users do not use these? There are probably also C++ and C-programmers who do not properly test their code.

Well, yeah, if everyone in your team abides by coding standards and uses a linter and documents their code and writes the performance-sensitive parts in C, then *any* language can be said to be highly performant and maintainable, even bloody JS.

Comparing python to JS is not fair. Python does have a C-API and JS does not. This is a feature of python which a lot of other languages do not have. So I think it is fair to mention this as a python advantage for writing performant code.
Furthermore, like I mentioned, you do not have to write C-code to get this performance. You can use libraries such as numpy and pandas to get C-like performance without ever having to write C or C++ and have to worry about memory management. This makes python a lot faster in development time. This is a huge advantage and is why python is used a lot in data science.

I am not trying to get in a “which language is better” argument here. Every language has its uses. When Supah Ewok asked

What is Python’s specialty?

, the answer

Python is really good for tasks where you want to write small amounts of code, fast, and you don’t need to worry about performance or future maintainability.

is not complete. You are completely right in that it works well for fast development and that it is not performant in general. However, it is also maintainable and can work great even with lots and lots of code. The performance issues are addressable. I think it would be only fair if I mentioned that aspect of python when somebody asks a question about it.

For basic scripting tasks Perl is usually a better choice, especially with how awesome MetaCPAN is. Although Perl6 is a better language if you can make do with a very limited amount of third party libraries.

The main thing that makes Python unsuitable for AAA game programming is that it is interpreted (which is slower than compiled) and memory managed with garbage collection (which does not allow granular memory layout control).
Other than that, it’s a great general purpose programming language.
Oh, also it’s duck-typed with no compile-time type checking, which some people love, but can make debugging way more difficult.

I think you meant, “Always makes debugging way more difficult, because you never have any guarantees on what another function is giving you, and the other programmer always makes you depend on implementation details (recursively, and branching), because they can’t be bothered to learn what the words interface, coupling, or cohesion mean.”

Languages like JS and Python have “JIT” (Just In Time) compilers. First, the language is compiled to bytecode, which is a language-specific, cross-platform equivalent to CPU assembly. Then, a profiler detects which parts of the code run the most often, and runs a more aggressive optimizer on them, which compiles them to CPU assembly (it’s a little more complicated than that, because the optimizer needs to know what type of data the function receives most often to be able to compile that function to as-good-as-C assembly).

In the above pipeline, only the parts of the code that run the least often are executed by what you’d think of as an interpreter (e.g. a loop that reads through instructions and goes “if instr == X do y”); performance is worse than C, but not terribly worse.
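For anyone who hasn’t seen one, that kind of loop can be sketched in a few lines. This is a toy, not how any real Python or JS engine is laid out: the opcodes `Push`, `Add`, `Mul`, and `Halt`, the `Instr` struct, and the `run` function are all invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy bytecode for a stack machine (hypothetical instruction set).
enum class Op : std::uint8_t { Push, Add, Mul, Halt };

struct Instr {
    Op           op;
    std::int64_t operand;  // only used by Push
};

// The classic interpreter loop: fetch an instruction, test what it is, act.
inline std::int64_t run(const std::vector<Instr>& program) {
    std::vector<std::int64_t> stack;
    for (std::size_t pc = 0; pc < program.size(); ++pc) {
        const Instr& in = program[pc];
        if (in.op == Op::Push) {
            stack.push_back(in.operand);
        } else if (in.op == Op::Add || in.op == Op::Mul) {
            assert(stack.size() >= 2);  // malformed bytecode otherwise
            const std::int64_t rhs = stack.back();
            stack.pop_back();
            stack.back() = (in.op == Op::Add) ? stack.back() + rhs
                                              : stack.back() * rhs;
        } else {  // Op::Halt
            break;
        }
    }
    return stack.empty() ? 0 : stack.back();
}
```

Every instruction pays for the fetch, the comparisons, and the stack traffic before any useful work happens, which is the overhead a JIT removes for hot code by emitting machine instructions directly.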

And that’s just the general case. If you’re building a game engine, there are “AOT” (Ahead Of Time) Python compilers you can use. Garbage collection is the real problem; and the fact that the type system is very loose, which means things that are essentially free in C++ require a lot of checks and generated code in Python.

There are games out there which use C++ for the engine code, and use Python (or some other interpreted language) for non-performance-critical scripting. (E.g.: the kind of code that says “when the player gets into this zone, spawn these enemies in this location, and do these other things to advance the story”). Other options to fulfil a similar role exist. Lua is one, and Unreal Engine blueprints are another, but it’s not unusual for large companies to create their own scripting language like this. Although if you make Python your scripting language, it makes things easier because you can hire people who already know it.

The advantages are: it’s more accessible than C/C++, so the level designers can play with it without having to become full-on programmers; but also being interpreted is itself an advantage. It means designers can iterate on it very quickly, without having to re-compile anything. This lets them tweak balance much faster – even on the fly. These scripts also allow the mod community something to work with.

So we need an air filled tunnel and up-stop wheels like a roller coaster. That should be simpler than building new engines, right? I mean, we have a whole fleet of steam engines and we don’t want to just throw them all out.

Nah, physics makes it simpler than that. We just need to build up to escape velocity on an appropriately-angled track, and we can shut down the engine and coast all the way to the Moon! And the vacuum means the water will boil at much lower temperatures (easily attainable from the incident solar radiation), so it’s like free power! We won’t even need any additional coal!

At this stage, I feel obligated to point out that if you enjoy playing around with ideas like this, a colleague of mine wrote a book about problems of motion. Honestly, we could probably set him to work on this and have a journal article in a year or so.

The thing is C is still pretty damn good at the core concept of “Take a bunch of bytes from memory, give them a name, and then do things with them”.

The reason we still use C is the same reason we still use gas to power cars even though we’ve had batteries and nuclear power for a while: even though it has major drawbacks, it’s really good at the one thing we need it to do, and the alternatives have benefits that are comparatively secondary.

Don’t get me wrong, I code in C++ for my day job and there isn’t a week that goes by where I don’t find a problem that makes me wish we were using Rust or D instead.

But still, there’s nothing I could code in those languages I couldn’t code roughly as fast in C++.

“advocating a linked list without giving any thought to how every single entry is likely to trigger a cache miss”

Well, that depends … Let me put on my accountant hat.

For example, a Haswell has a 64-byte L1 data cache line size, holding 8 words per cache line. If you allocate a simple linked list, you get 4 list cells per cache line. An L1 cache hit costs 4-5 cycles. If you just run down the list, you get worst case 1 miss per 4 hits. That miss goes to L2, which responds in tens of cycles, roughly speaking (Haswell: 12 cycles). If you miss there too, you may then either go to another level of caching or to DRAM. (Haswell uses an L3 cache with latency 36 cycles while DRAM appears to have a 200-300 cycle latency.)

If you allocated an array and did the equivalent, you would get worst case 1 miss per 8 hits. Walking a list is a bit slower than incrementing a pointer, but if you hit the cache, it’s still just a handful of clock cycles per element. In practice things will be more difficult to predict of course, and you won’t miss whenever you go to a new cache line. It all depends on data layout and the hardware particulars. Haswell’s data cache is 8-way associative, meaning it can handle up to 8 different blocks that map to the same cache line without missing.

Perhaps worse is that the link fields are really unnecessary since they just point to the next address and so just pollute the cache (4 cells vs 8 array elements per line). You could also argue that the linked list pointer chasing limits instruction-level parallelism.
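The accounting above can be written down as a back-of-the-envelope calculation. This is a rough sketch, not a performance model: it assumes 8-byte words, a two-word list cell (`ListCell` is a made-up layout), tightly packed elements, and flat costs of about 4 cycles per L1 hit and 12 cycles per miss served from L2, using the figures quoted in the comment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed figure from the discussion: 64-byte L1 data cache lines.
const std::size_t kCacheLineBytes = 64;

// Hypothetical list cell: one payload word plus one link word.
// On a typical 64-bit target this is 16 bytes.
struct ListCell {
    std::uint64_t value;
    ListCell*     next;
};

// How many tightly packed elements of a given size share one cache line.
inline std::size_t elements_per_line(std::size_t element_bytes) {
    assert(element_bytes > 0 && element_bytes <= kCacheLineBytes);
    return kCacheLineBytes / element_bytes;
}

// Worst case for one sequential pass: the first touch of each line
// misses, and every other element on that line hits.
inline double avg_cycles_per_element(std::size_t per_line,
                                     double hit_cycles,
                                     double miss_cycles) {
    assert(per_line > 0);
    return (miss_cycles + static_cast<double>(per_line - 1) * hit_cycles)
           / static_cast<double>(per_line);
}
```

With those assumed costs, `avg_cycles_per_element(8, 4, 12)` comes to 5 cycles per element for the array and `avg_cycles_per_element(4, 4, 12)` comes to 6 for the packed list. The gap is small only because the cells are assumed to sit next to each other. It widens quickly if the miss goes to L3 or DRAM, or if each cell lands on its own line.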

Finally, some more hardware notes: the cost of a cache miss can be literally thousands of cycles today in some cases when you have a multiprocessor system with coherent caches that have to send around data (esp. off-chip, etc). TLB misses (basically another cache for mapping virtual memory) are also quite expensive. And reading from hard disk takes about 10 ms, which corresponds to tens of millions of cycles at gigahertz clock rates (10 ms at 3 GHz is 30 million cycles).

I think you may have made a lot of mistakes I would have made up until recently, if I understand your post correctly.

1) I understand that you think that accessing data from the L1 cache takes 4-5 cycles, and technically it does. I used to think it was just that simple myself. However, this is actually free for the CPU, since the CPU will start fetching this data before it actually needs it. The pipelines on CPUs are deep enough that this cost is totally amortized.
* Note, for the very first instruction of the program, you would indeed need to wait the full L1 cache hit. Of course, the data wouldn’t be in cache in the first place, but that’s an aside.

2) I think you made a bit of an error WRT Linked List cache penalties. The entire purpose of a Linked List is to allocate memory from the heap, which is to say, almost randomly. If you have an int that takes 4 bytes, and then a 64 bit pointer that takes 8 bytes, sure that’s not great. However, the larger problem is that when we load in a cache line we’re not loading in the next item in the list because it’s in a random spot in memory.

You’re correct that things get very complicated with X-way associative caches on modern CPUs, but there’s extreme reason to doubt that the items we need will still be somewhere in our cache unless all our program ever does is just walk up and down the same list. Even then, we’re going to get a lot more cache misses due to contention and the large size of our data (with the pointer(s)).

// Explaining what I mean: let’s pretend that this is some simplified program loop.
// Code that walks through linked list, items are getting stored in cache here one by one.
// Rest of program runs, over time completely evicts all the stored items despite multi-way associativity.
// Code that walks through linked list runs again, 99%+ of the data has been evicted from cache by the rest of the program.

Sure, it’s more complex than what I described. Many details to consider when optimizing. I hinted a bit at the first problem you raised by mentioning instruction-level parallelism.

The entire purpose of a Linked List is to allocate memory from the heap, which is to say, almost randomly. If you have an int that takes 4 bytes, and then a 64 bit pointer that takes 8 bytes, sure that’s not great. However, the larger problem is that when we load in a cache line we’re not loading in the next item in the list because it’s in a random spot in memory.

Depends on your memory allocator. If you use, say, a nice and fast bump pointer allocator, the list cells will in the normal case line up like I described. Using malloc() or something has a certain overhead of its own. Truly getting a random 48b/64b-address for each new cell would be hell on the memory system, especially the virtual memory.
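A bump pointer allocator is simple enough to sketch here. This is an illustration under assumptions, not anyone’s production allocator: `BumpArena`, `Node`, and `push_front` are hypothetical names, the arena has a fixed size, and nothing is ever freed individually.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

struct Node {
    std::uint64_t value;
    Node*         next;
};

// Fixed-size arena that hands out memory by advancing an offset.
class BumpArena {
public:
    explicit BumpArena(std::size_t bytes) : buffer_(bytes), offset_(0) {}
    BumpArena(const BumpArena&) = delete;
    BumpArena& operator=(const BumpArena&) = delete;

    // Returns `bytes` of storage aligned to `align` (a power of two),
    // or nullptr once the arena is exhausted.
    void* allocate(std::size_t bytes, std::size_t align) {
        assert(align != 0 && (align & (align - 1)) == 0);
        const std::uintptr_t base =
            reinterpret_cast<std::uintptr_t>(buffer_.data());
        const std::uintptr_t aligned =
            (base + offset_ + align - 1) &
            ~(static_cast<std::uintptr_t>(align) - 1);
        const std::size_t end =
            static_cast<std::size_t>(aligned - base) + bytes;
        if (end > buffer_.size()) return nullptr;
        offset_ = end;  // "bump" past the block just handed out
        return buffer_.data() + (aligned - base);
    }

private:
    std::vector<unsigned char> buffer_;
    std::size_t                offset_;
};

// Prepend a node whose storage comes from the arena.
inline Node* push_front(BumpArena& arena, Node* head, std::uint64_t value) {
    void* mem = arena.allocate(sizeof(Node), alignof(Node));
    if (mem == nullptr) throw std::bad_alloc();
    return new (mem) Node{value, head};
}
```

Nodes allocated back to back end up exactly `sizeof(Node)` bytes apart, so a walk over the list touches consecutive cache lines (here from high addresses to low, since each new node is pushed at the front). That neat layout only lasts as long as nothing else allocates from the same arena in between.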

Regarding cache hit rates: ultimately you have to measure. In practice, it might not be as bad as you think. In my mind, ‘junk’ next-pointer fields might be the real reason to avoid linked lists (when you don’t really need them).

If you allocate the entire list entirely at once, then sure, your cache misses will happen more often than for an array, but that still wouldn’t be all that bad. Of course, if you’re allocating the entire list all at once, then some might question the utility of a linked list in the first place. If not, once we start inserting random other data in that sequence, then we’re going to get correspondingly worse performance. For a reasonably realistic use case for a linked list, there’s just no way to get the same performance from a linked list versus an array.

Not that I think you don’t already know this, and yes, benchmarking is necessary. I do believe that Shamus’s Good Robot implementation was just calling new naively. Personally, now I’m curious myself to see how the OS actually gives us memory, and how sequential that is on average.

That’s true if you allocate the whole linked list at once, but that’s rarely what a linked list is used for and it’s not how Shamus was using it in his linked post. After all, if you could allocate the list all at once, why not just use an array?

Linked list allocations and deletions are usually spaced far apart in time. If you use a linked list, and you allocate a cache-sized chunk of unrelated memory in between list allocations, you will have a cache-sized gap between each element. That means that walking the list causes a cache miss at each element, which is the problem Shamus was alluding to in this post.

You seem to be talking about a more difficult scenario than just replacing the list with an array in that case.

Note that it’s not always a problem even when addresses happen to map to the same cache line. Our example processor Haswell uses an 8-way associative L1 data cache to reduce or remove such collision issues. You have to work to trigger such problems.

(It’s my impression that you actually can run into these particular issues more easily with arrays, at least with scientific algorithms. If the array sizes or loop counts are unfortunate, for example. I’ll leave that discussion for another time though. It might be less relevant for games.)
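Here is a small sketch of how addresses map onto cache sets, to show what “map to the same cache line” means in practice. The 64-byte line and 8-way associativity come from the comments above. The 32 KiB total size for Haswell’s L1 data cache is an added assumption, and `CacheGeometry`, `set_count`, and `set_index` are names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Minimal description of a set-associative cache.
struct CacheGeometry {
    std::size_t total_bytes;
    std::size_t line_bytes;
    std::size_t ways;
};

// Assumed Haswell L1D parameters: 32 KiB, 64-byte lines, 8 ways.
const CacheGeometry kHaswellL1D = {32 * 1024, 64, 8};

// Number of sets: each set holds `ways` lines.
inline std::size_t set_count(const CacheGeometry& c) {
    assert(c.line_bytes != 0 && c.ways != 0);
    assert(c.total_bytes % (c.line_bytes * c.ways) == 0);
    return c.total_bytes / (c.line_bytes * c.ways);
}

// Which set an address falls into: line number modulo the set count.
inline std::size_t set_index(const CacheGeometry& c, std::uintptr_t addr) {
    return (addr / c.line_bytes) % set_count(c);
}
```

With that geometry there are 64 sets, so two addresses exactly 4096 bytes apart land in the same set. Up to 8 such lines can coexist, and a ninth pushes one out. That is why a loop striding through a big array in multiples of 4 KiB can thrash a single set while the rest of the cache sits idle, which is the “unfortunate array size” case.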

I saw this comment after I posted my reply above. Anyway Sardonic and I agree that for some artificial example, a linked list could be only moderately worse in terms of performance. However, I’m not sure why anyone should care about such theoretical examples.

Also, multi-way associativity does not solve the problem of things getting evicted from cache in the long term. The point is to get around a really bad situation where two things happen to map to the same spot, and the CPU keeps evicting one to load the other, again and again, just to execute an instruction that requires both. It does not solve the more general problem of:
1) Your program did a whole bunch of other things for a long time. Now none of that list remains in the cache because it’s all been evicted over time.

With that in mind, yes, you solve the short term problem of serious conflicts with the cache, but so what? That would help you if you kept going back and forth over a linked list, but not when you traverse down it for the first time, because the data is just not in the cache in the first place.

The above assumes that the memory has been semi-randomly allocated, but if it’s been sequentially allocated, then what’s the point of a linked list? On top of that, an initially sequentially allocated list is going to over time become very “spotty”, as things get deleted or added to the list at different times in the program execution. Again, if we are going back and forth over the linked list and there are clusters of data, it’s not that bad, but it’s totally a miss going forward that first time.

1) Your program did a whole bunch of other things for a long time. Now none of that list remains in the cache because it’s all been evicted over time.

It’s possible, but the same goes for an array. In general, you need to have locality of reference to exploit caches.

The above assumes that the memory has been semi-randomly allocated, but if it’s been sequentially allocated, then what’s the point of a linked list?

It could be allocated that way due to how the program is written. Perhaps you have some library calls or constructors or something. But read on.

Recall that this discussion began with a (quite possibly exaggerated) claim that every time you follow a list pointer you get a cache miss. I then showed with a simple example of a single cache line that this is not necessarily true. In a real environment with a cache hierarchy of more cache lines and associativity, you will also be likely to see that pointer references do not necessarily miss in ‘the’ cache when running a more realistic program. Indeed, I personally would guess the hit rate usually is pretty good unless you run into a pathological case (like arrays or pointer data that excessively evict each other due to cache capacity constraints or cache conflicts).

However, I think such an investigation is somewhat beyond the scope of this comment section. (I’ll also reiterate that pointer data may still waste cache space, which you might prefer to avoid.)

“It’s possible, but the same goes for an array. In general, you need to have locality of reference to exploit caches.”

It’s not the same. If the array has been completely evicted from cache then you get a cache miss on the first element, but the CPU loads in 64 bytes of data. Additionally, this is the entire purpose of data prefetching from the CPU. The CPU will be loading in the rest of the array into its L2 cache, so you will only ever get the penalty of an L2 cache hit for the rest of the array. If you got a complete cache miss every 64 bytes of an array, your computer would run a whole lot slower.

In contrast, the first element of the linked list misses the cache just like the array. However, we now have zero idea what’s going to happen for any later element, because the rest of the list could be sequentially allocated, completely randomly allocated, or somewhere in between. If it’s randomly allocated, no amount of cache associativity is going to help us, it’s not in the cache, and it won’t be prefetched, since the CPU is not magical, and would need to resolve the address before fetching.
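The difference shows up directly in the two traversal loops. This is only a sketch to show where the dependency sits, with made-up names (`ListNode`, `link_in_order`, `sum_array`, `sum_list`). It says nothing about actual timings, which have to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ListNode {
    std::int64_t    value;
    const ListNode* next;
};

// Array walk: the address of element i is base + i * sizeof(element),
// so the hardware can see (and prefetch) upcoming addresses in advance.
inline std::int64_t sum_array(const std::vector<std::int64_t>& xs) {
    std::int64_t total = 0;
    for (std::size_t i = 0; i < xs.size(); ++i) total += xs[i];
    return total;
}

// List walk: the next address is only known once the current node
// has been loaded, so each step waits on the previous load.
inline std::int64_t sum_list(const ListNode* head) {
    std::int64_t total = 0;
    for (const ListNode* n = head; n != nullptr; n = n->next) total += n->value;
    return total;
}

// Helper for the example: chain the nodes of `storage` in index order.
inline const ListNode* link_in_order(std::vector<ListNode>& storage) {
    assert(!storage.empty());
    for (std::size_t i = 0; i + 1 < storage.size(); ++i) {
        storage[i].next = &storage[i + 1];
    }
    storage.back().next = nullptr;
    return &storage.front();
}
```

Both functions compute the same sum. In `sum_list` the load of `n->next` sits on the critical path of every iteration, so if the nodes are scattered, each miss has to finish before the next one can even start.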

Even with a modern CPU with a monstrous amount of L3 cache, if your program runs long enough, the data will get completely evicted. Even if we have only a partial eviction, cache misses are really painful, and L3 cache latencies are far from L1. I would personally guess that the L1 hit rate would be awful for any serious program that didn’t just go back and forth between two elements of the same list.

Cool quote. I hadn’t heard that before. Still, it doesn’t exactly invalidate Shamus. A programmer is a device for turning coffee into code. Coffee, in this case, being the raw material for both. Like Petroleum is a raw material for both asphalt and fuel for the truck that drives on it. So does a computer program ride on the solid foundation of mathematical theorems.

I’ll be honest: I was initially worried a mathematician had stolen the phrase from computer scientists. After that, well . . . if you get a chance to mention Erdos, you do it. Even if people misattributed something to him.

The old PDP-11 model of C might not be appropriate anymore, but I think the ‘zero overhead’ model of C is still difficult to beat. (C++ adds some complications to that statement of course.) If you’re seeking ultimate performance, that’s attractive.

So let me take a guess: a C-like language that allows you to reason effectively about a collection of memory spaces, how to map data onto those spaces, how to optimize copying of data between them (lazily or not). Perhaps a little bit about how you program them too.

What excites me the most about Jai isn’t something that most people talk about. It’s just the fact that it compiles at, trusting demos by Jon Blow, >100,000 lines of code per second. That’s even better than you think, because there are no header files in Jai, which means that the equivalent C++ program would be even longer. Jon Blow has even stated that he believes the compile times can be made to go much lower, possibly even 10x faster. I think that’s plausible, if unlikely, but I’m just happy that there’s someone who cares about actual real-world productivity destroyers and views something so crucial as actually being crucial.

Well, if you watch one of his Twitch streams, he compiles all the time in the natural course of just doing work. You can find highlights on his YouTube channel. I mean, he could be lying about the size of the codebase, although if so he’s certainly gone to really great lengths to do it, but you can see his game compile in < 1 second, tons of times, and his game has around 100,000 lines of code already.

Secondly, I have no idea where this "crowdfunding" argument comes from. Jai has never been open to crowdfunding. Jon Blow is not being paid to develop the language, he's doing it for free. I think whoever said that to you, frankly, didn't know what they were talking about.

That said, I’d still be highly skeptical of any claim about performance where the compiler isn’t publicly available for benchmarking. (even in cases where the developer is being perfectly transparent and honest and showing stuff on youtube)

First, there’s a huge difference between compiling and re-compiling a project. Even in a language like C++, which is terrible at incremental compilation (change a single comment in a frequently-used header and watch half your project be recompiled for no reason), compiling your project after making minor changes is (usually) going to be pretty fast, because the build system knows that most of your code hasn’t changed.

Second, compile-time speed will be extremely dependent on your machine’s specs, especially RAM size and whether you have an HDD or an SSD.

Third, not all code is made equal. There are some language features (notably templates and compile-time expression evaluation) that are extremely useful, and also extremely expensive for compilers (mostly because most compilers kind of add them as an afterthought, not because they’re inherently complex). A true benchmark needs to include lots of different, realistic use cases and see how different languages compare.

If JB isn’t lying outright (which seems to be your main suspicion at this point, which is a fair point) then Jai doesn’t do incremental compilation. It’s a full compile from a cold start every time. The speed-up in subsequent compiles that he shows in the streams is from the source code still being in RAM, so the OS doesn’t have to pull the data from the SSD. Or, that’s my understanding anyway.
But in basically every other language, the disk delay is a negligible factor in compile time, so the fact that it’s becoming noticeable is a really good sign!

I’m not accusing the developer of lying (like Timothy said, there wouldn’t really be a point). Just saying it’s really hard to get an accurate idea of how efficient compilation is from a youtube video, even if the person is being honest.

He talks about it whenever he’s demoing the compiler speed. I’m not certain that it’s impossible to do incremental in Jai (no doubt one could set it up that way if one wanted) but Jon hasn’t demonstrated it. All of the compile times are full compiles from source code.

“Debug” mode means that the compiler doesn’t do much – if any – optimisation at all.
Heck, some compilers don’t even inline functions marked ‘inline’ unless forced, and many don’t do the >C++11 constexpr evaluation.

In general, the two phases that take the most time in ‘large’ projects are template evaluation (because template metaprogramming is an entire Turing-complete language of its own) and doing link-time-optimisations.
Preprocessor abuse can also blow up your compile times; however, as using it for anything beyond “exclude this block of code entirely” is very much Poor Form these days, it doesn’t really happen anymore.
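For readers who haven’t seen template metaprogramming, a small sketch of what “the compiler runs a program” means here. `Fib` is a made-up name for illustration; every `Fib<N>` below is a separate type the compiler has to instantiate before it can produce a single constant.

```cpp
// Compile-time Fibonacci via recursive template instantiation.
// Asking for Fib<20> forces the compiler to instantiate Fib<19>, Fib<18>, ...
template <unsigned N>
struct Fib {
    static constexpr unsigned long long value =
        Fib<N - 1>::value + Fib<N - 2>::value;
};

// Base cases stop the recursion.
template <>
struct Fib<0> {
    static constexpr unsigned long long value = 0;
};

template <>
struct Fib<1> {
    static constexpr unsigned long long value = 1;
};

// Checked entirely at compile time; nothing is computed when the program runs.
static_assert(Fib<10>::value == 55, "Fib<10> is evaluated by the compiler");
```

This toy case is cheap, but the same mechanism scaled up across a large codebase is where the compile time goes.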

If a language doesn’t have either of those then it should be much faster to compile than C++, but it is likely to produce larger, slower binaries.

Which might not matter at all, depending on where it’s larger and slower.

That said: I find “no header files” a massive downside.
Header files are a ‘table of contents’, and they speed up compilation and more importantly comprehension.

They tell me what the associated code is going to do, and hide away all the implementation details.
The header for a very large object often fits on one screen, so I can see the whole functionality at a glance.

Yes, originally headers were because compilers couldn’t fit an entire program in memory, but these days they’re the Executive Summary for an API.

“Header files are a ‘table of contents’, and they speed up compilation and more importantly comprehension.”

It’s debatable whether header files increase comprehension. Personally, I find they’re a real pain when it comes to refactoring, which destroys my productivity more than I would get from any increased comprehension.

However, they absolutely do not speed up compilation. In fact, they are the main reason why Jai compiles so much faster than C/C++. Header files need to be re-loaded and re-compiled for every source file that includes them, because the pre-processor might make some changes to them, so we aren’t guaranteed that they will be the same. This “let’s actually not compile some things hundreds or thousands of times more than we need to” IS the speed increase of Jai.

If there’s no benefit to the preprocessor changing the logic in the header, why do it? How is it slower to reload and recompile the header than to load and compile the code contained within the header for everything?

Is this just a case of “Normal practice is to include a lot of unnecessary things in the header that the compiler has to deal with”?

“How is it slower to reload and recompile the header than to load and compile the code contained within the header for everything?”

Well that wouldn’t be faster. The point is, that doesn’t happen in Jai. You only compile things once. The way the compiler knows if FunctionNameRando() is valid is because it goes through all the code first and generates a list of valid functions. So you can write functions in any order you want, wherever you want, because we already know if it’s valid or not.

In contrast, C and C++ only know if a function is valid if it’s in the header files included for the .cpp file being compiled. The only way it knows how to do this is by re-loading and re-compiling all the header files included in the .cpp file. And again, we can’t just do this once for each file because of the pre-processor.

That’s why C/C++ take so goddamn long to compile. They just flat out do way more work than they should.
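A minimal C++ sketch of the declare-before-use rule being described, with made-up names (`function_name_rando`, `caller`). The forward declaration is the job a header normally does for code in another file.

```cpp
#include <cassert>

// Declaration. Remove this line and the call inside caller() fails to
// compile, because the compiler has not seen the name yet at that point.
int function_name_rando(int x);

// Uses the function before its definition appears in the file.
int caller(int x) {
    return function_name_rando(x) + 1;
}

// Definition, later in the file (or, in a real project, in another .cpp).
int function_name_rando(int x) {
    return x * 2;
}

// Example use.
void check_caller() {
    assert(caller(20) == 41);
}
```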

“If there’s no benefit to the preprocessor changing the logic in the header, why do it?”

Because you want to do things like #ifdef DEBUG blah blah #endif? I mean, macros exist for a reason. The point is that the compiler itself cannot make assumptions about your header files compiling the same way every time. Because, while this is a stupid example I made in Visual Studio:
File names are: Source.cpp, MuhHeader.h, MuhOtherHeader.h, MuhSource.cpp.

Alright, what is the output of our program? It's:
2.1
9.86
Because defining YOLO in our Source.cpp file changes the value that GRAVITY gets compiled to. The point is not "this is an awesome program that's very well written". The point is, the compiler has no way of just assuming that header files will always be compiled the same way, so it just needs to recompile them for every compilation unit. This is how C/C++ compilers actually work, and if you have header files included in x number of places, it gets compiled x number of times. I believe they did this in order to save on memory, since C came out in 1972, they had no virtual memory, and they barely had any hardware memory anyway, so it made a lot of sense to compile this way. It makes absolutely no sense nowadays.
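The listing for that example isn’t shown above, so what follows is not the original code. It is a minimal single-file sketch of the same idea, assuming a header that picks the value of `GRAVITY` depending on whether `YOLO` is defined. Since `#include` is a textual paste, the header text is simply written out twice to stand in for two translation units; the function names are made up.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// ---- stand-in for a .cpp file that defines YOLO before including the header
#define YOLO
// ---- pasted text of the hypothetical header ----
#ifdef YOLO
#define GRAVITY 2.1
#else
#define GRAVITY 9.86
#endif
// ---- end of header text ----
double gravity_with_yolo() { return GRAVITY; }

// Reset the preprocessor state, as if starting a fresh translation unit.
#undef YOLO
#undef GRAVITY

// ---- stand-in for a second .cpp file that does not define YOLO
// ---- pasted text of the same hypothetical header ----
#ifdef YOLO
#define GRAVITY 2.1
#else
#define GRAVITY 9.86
#endif
// ---- end of header text ----
double gravity_without_yolo() { return GRAVITY; }

// Formats both values the way the program output above lists them.
std::string report() {
    std::ostringstream out;
    out << gravity_with_yolo() << '\n' << gravity_without_yolo() << '\n';
    return out.str();
}

// Example use: identical header text, two different compiled results.
void check_report() {
    assert(report() == "2.1\n9.86\n");
}
```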

EDIT: Oh that’s funny. I guess this website won’t display “”
If you still see a blank between the quotes, I was writing “//iostream//” where I’m using the greater/less than signs instead of double brackets.

The only part of refactoring that’s even vaguely affected by headers is the ‘search-and-replace’ part.
If your IDE doesn’t do that automatically, then get a better IDE.

Headers speed up compilation by defining an interface to precompiled sections – usually called “libraries”.

Any language claiming to have libraries must have an equivalent, or those libraries must be recompiled every time.

I don’t want to recompile the whole game engine every time I create a new monster.

Yes, if you misuse the preprocessor or include every header into everything then you throw away that advantage.
Much like it’ll take you a long time to drive anywhere if you take the engine apart every time you park the car.

“The only part of refactoring that’s even vaguely affected by headers is the ‘search-and-replace’ part.
If your IDE doesn’t do that automatically, then get a better IDE.”

I have never had Visual Studio do this for me. Perhaps there’s some feature that I accidentally disabled.

“Headers speed up compilation by defining an interface to precompiled sections – usually called “libraries”.

Any language claiming to have libraries must have an equivalent, or those libraries must be recompiled every time.”

You’re confusing two different things here. Yes, all languages need a way of creating pre-compiled libraries and then interfacing with them. Not only would re-compiling someone else’s library be wasteful, but they might not want to give you the source code in the first place. However, we also need a way of, for programmer sanity, writing things in different files and having our program on its own compile and execute correctly. The former is trivial, the latter is what takes C/C++ forever to compile.

In Jai, you use an #import command in the main.jai file to import whatever libraries you want. When it comes to your own code, if you create a function or variable that is global, then it just exists and can be referenced in all other files. There is a #scope_file (I think that’s what it’s called) keyword if you want the compiler to not let other files see what you have made, but by default they can.

This means when compiling, Jai can sweep through all the files in your program and generate identifiers for everything you have created. Then it can compile the rest of your program in whatever order it feels like. In fact, this step can even be multithreaded. In contrast, as I have detailed above, C/C++ need to compile all the header files included in each .cpp file, every time. So, no, header files do not speed up compilation time.

“I don’t want to recompile the whole game engine every time I create a new monster.”

The game engine should be a separate library, and your gameplay code should be separate, just like it is with Unity and Unreal. You will have to recompile your gameplay code when you change a monster, but that will still be many multiple times faster than with C++, because of the compilation model.

“Yes, if you misuse the preprocessor or include every header into everything then you throw away that advantage.”

Bizarre and irrelevant strawman. Misuse of the preprocessor has nothing to do with the nature of C compilation. Over inclusion of header files also merely exacerbates the already existing problem of C header compilation. There’s no magical way of never including header files other than writing all your code in a single file.

“Much like it’ll take you a long time to drive anywhere if you take the engine apart every time you park the car.

But you could, you know, not do that?”

The analogy is so bizarre that I’m almost at a loss here. You can’t not do that, that’s how C compilation works. There’s not really a car analogy that works here; building a car and generating code are just different.

The problem with optimizing close to the metal is that it’s hard to understand things that close to the metal.

I’ve done a bit of work in 6502 assembly (because that’s what the Nintendo Entertainment System is), and that just addresses memory directly by address. With a conceptually simple change, you can implement a level of multitasking on that hardware, by using offsets. But you can’t get to virtualization of memory from there without implementing an entire abstraction layer to memory access, and now all the programs that used to ask for specific memory addresses must instead ask for things in the form that the memory abstraction layer wants them.

(Oh, and these ‘programs’ are still in assembly instructions; the hypothetical 6502 ‘memory manager’ would have to be hardware.)
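As a rough illustration of the “using offsets” step, here is a toy model in C++ rather than 6502 assembly. Everything in it is an assumption made for the sketch: the `Ram` and `Task` types, the `load`/`store` helpers, and the idea that each task only ever addresses memory relative to its own base.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model of a small flat memory, addressed directly by index.
using Ram = std::array<std::uint8_t, 2048>;

// Hypothetical task record: the only per-task state is where its memory starts.
struct Task {
    std::uint16_t base;
};

// Every access goes through base + offset instead of an absolute address.
// .at() throws std::out_of_range if the sum falls outside the array.
void store(Ram& ram, const Task& task, std::uint16_t offset, std::uint8_t value) {
    ram.at(static_cast<std::size_t>(task.base) + offset) = value;
}

std::uint8_t load(const Ram& ram, const Task& task, std::uint16_t offset) {
    return ram.at(static_cast<std::size_t>(task.base) + offset);
}

// Example use: two tasks use the same offset without clobbering each other.
void check_tasks_are_separate() {
    Ram ram{};
    const Task a{0x0200};
    const Task b{0x0400};
    store(ram, a, 0x10, 7);
    store(ram, b, 0x10, 9);
    assert(load(ram, a, 0x10) == 7);
    assert(load(ram, b, 0x10) == 9);
}
```

Note that this only relocates tasks. Nothing stops a task from using an offset that reaches into another task’s region, which is the gap between “offsets” and real memory virtualization.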

Hey, Shamus, so I didn’t know where to ask this, so I’m doing it here. The blog seems to be unavailable in Russia for the last few days, but works fine via VPN. Any idea why? I kinda doubt that Russian authorities would be blocking your site, so I don’t really have a theory.

I’ve noticed this as well. If you check open databases of blocked pages, you’ll see that RKN blocked some page that has the same IP address as this blog (and a couple more pages). So, Shamus is just collateral damage in this case.