Blazing Browsers —

Mozilla beefing up JavaScript performance with new JIT compiler

The IonMonkey JavaScript JIT will debut with Firefox 18 in early 2013.

In an effort to keep Firefox competitive with commercial browsers and handle the ever-heavier burden that interactive webpages put on browsers’ scripting engines, Mozilla is working on a new JavaScript just-in-time (JIT) compiler architecture for Firefox’s SpiderMonkey script engine that will significantly boost the browser’s performance. Called IonMonkey, the JIT is now part of the nightly Firefox test builds, and is set for wide release as part of Firefox 18.

In a post on Mozilla’s JavaScript blog, Mozilla developer David Anderson said that the architecture of IonMonkey uses a three-step compilation process for JavaScript that mirrors how production compilers for languages such as C++ and Java work, performing analysis and optimization of the intermediate representation (IR) of the script code before turning it into the machine code run by SpiderMonkey. In the current JägerMonkey JIT, there is no optimization step. IonMonkey is targeted at long-running JavaScript applications; for shorter ones, Firefox will continue to use the current JägerMonkey JIT.
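
To make "optimizing the IR" concrete, here is a minimal, self-contained sketch of what an optimization pass over an intermediate representation looks like. It is not IonMonkey code; the tiny expression IR and the constant-folding pass below are invented purely to illustrate the kind of step that happens between parsing and machine-code generation.

```cpp
// Toy illustration of an IR optimization pass, in the spirit of what an
// optimizing JIT does between parsing and code generation.
// This is NOT IonMonkey code; the IR below is invented for this example.
#include <iostream>
#include <memory>

struct Node {
    enum Kind { Const, Add, Mul } kind;
    int value = 0;                       // used when kind == Const
    std::unique_ptr<Node> lhs, rhs;      // used for Add / Mul
};

std::unique_ptr<Node> constant(int v) {
    auto n = std::make_unique<Node>();
    n->kind = Node::Const;
    n->value = v;
    return n;
}

std::unique_ptr<Node> binop(Node::Kind k, std::unique_ptr<Node> a, std::unique_ptr<Node> b) {
    auto n = std::make_unique<Node>();
    n->kind = k;
    n->lhs = std::move(a);
    n->rhs = std::move(b);
    return n;
}

// One optimization pass: fold operations whose operands are both constants.
void foldConstants(std::unique_ptr<Node>& n) {
    if (!n || n->kind == Node::Const) return;
    foldConstants(n->lhs);
    foldConstants(n->rhs);
    if (n->lhs->kind == Node::Const && n->rhs->kind == Node::Const) {
        int result = (n->kind == Node::Add) ? n->lhs->value + n->rhs->value
                                            : n->lhs->value * n->rhs->value;
        n = constant(result);            // replace the whole subtree with one constant
    }
}

int main() {
    // Represents the expression (2 + 3) * 4; after folding it becomes the constant 20.
    auto expr = binop(Node::Mul, binop(Node::Add, constant(2), constant(3)), constant(4));
    foldConstants(expr);
    std::cout << "folded to: " << expr->value << "\n";   // prints 20
}
```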

The initial performance results for IonMonkey are substantially better than previous versions of the JIT. On Google’s V8 benchmark, the test version of Firefox 18 scored a 7 percent improvement in performance over Firefox 17; with Mozilla’s own Kraken benchmark, IonMonkey resulted in about a 26 percent performance boost over Firefox 17. “We’re excited about this not just for performance and maintainability,” Anderson wrote, “but also for making future JavaScript compiler research much easier. It’s now possible to write an optimization algorithm, plug it into the pipeline, and see what it does.”

Firefox 18 is expected to enter beta in November. The release version is slotted for arrival early in 2013.

I'm getting confused by all the monkeys. SpiderMonkey is the "script engine". JagerMonkey is the current JIT for SpiderMonkey, and IonMonkey is an improvement over JagerMonkey. ???

The naming needs to be consistent, so that anything specifically JIT-related gets the -Monkey appellation, whereas things not specifically JIT-related get some other appellation. IOW, the JavaScript engine should be called e.g. Jungle, which is, in turn, full of monkeys.

Is there ever going to be process-per-tab support? It seems like it was put on hold last year and I haven't heard anything since ...

They put that on hold to go after the lower-hanging fruit: memory usage and leaks, speeding up the GUI, and minimizing the impact of poorly-coded addons.

They'll get back to it eventually, but for the moment they're trying to maximize performance by fixing a number of smaller issues. Which is probably a good idea, considering process-per-tab would just amplify any of the smaller problems.

Is there ever going to be process-per-tab support? It seems like it was put on hold last year and I haven't heard anything since ...

Never, but they are making every possible part of the browser async. In the near future the rendering will be on a different thread (already completed on some platforms), and then they'll separate the content and the browser GUI. It's lighter than having a different process per tab. And they're already working on many little things.

Is there ever going to be process-per-tab support? It seems like it was put on hold last year and I haven't heard anything since ...

Never, but they are making every possible part of the browser async. In the near future the rendering will be on a different thread (already completed on some platforms), and then they'll separate the content and the browser GUI. It's lighter than having a different process per tab. And they're already working on many little things.

Right, but the primary reason to do process-per-tab is security, with performance being a very pleasant side effect. Chrome is already on the verge of being uneconomical to exploit. Firefox could follow its lead, but has barely tried so far.

Is there ever going to be process-per-tab support? It seems like it was put on hold last year and I haven't heard anything since ...

Never, but they are making every possible part of the browser async. In the near future the rendering will be on a different thread (already completed on some platforms), and then they'll separate the content and the browser GUI. It's lighter than having a different process per tab. And they're already working on many little things.

In my uninformed opinion, per-process tabs are what makes Chrome so good. I have Firefox also, but Firefox doesn't handle more than one tab nearly as well. If I have Pandora in one tab, the other tabs seem to jitter in their content handling. The page still scrolls smoothly, but it is like the functions on the page literally stop loading / working for a fraction of a second. It gets really bad if there is some bad JS on a page, and then Pandora's sound will just randomly cut out.

Is there ever going to be process-per-tab support? It seems like it was put on hold last year and I haven't heard anything since ...

Never, but they are making every possible part of the browser async. In the near future the rendering will be on a different thread (already completed on some platforms), and then they'll separate the content and the browser GUI. It's lighter than having a different process per tab. And they're already working on many little things.

In my uninformed opinion, per-process tabs are what makes Chrome so good. I have Firefox also, but Firefox doesn't handle more than one tab nearly as well. If I have Pandora in one tab, the other tabs seem to jitter in their content handling. The page still scrolls smoothly, but it is like the functions on the page literally stop loading / working for a fraction of a second. It gets really bad if there is some bad JS on a page, and then Pandora's sound will just randomly cut out.

Chrome is just faster, too.

-- FBlue

I love Chrome as well, and it's all I use, but it uses an ungodly amount of memory. I'm hoping they steal Firefox's lazy loading of tabs on startup, which would help immensely.

Even with all the tabs loaded, Chrome still uses massively more memory than Firefox does.

Something is borked with Firefox's updating though. A clean reinstall with the same extensions is super snappy even when loading dozens of tabs whereas I get stalls and stuff with the updated one (I did the reinstall after getting annoyed).

It won't be long at all before JavaScript is faster than C++. We've been studying static optimizing compilers for 40 years, and we've pretty much hit a wall. There's only so much we can glean from static code analysis. But we've barely scratched the surface of dynamic code optimization at runtime. There's so much more information available at runtime.

A static compiler has to make educated guesses about unrolling loops and inlining function calls. A dynamic compiler can unroll as we loop and inline as we call. Just these two optimizations alone have improved the performance of virtual machine runtimes by factors of 20x or more, and these are just dynamic equivalents of venerable static optimizations. There are undoubtedly many more types of optimizations that can only be performed at runtime.
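
To make the "inline as we call, unroll as we loop" idea concrete, here is a hand-written C++ analogue of what a JIT might emit after observing at runtime that a loop almost always runs with one particular trip count: a guarded, specialized fast path plus a generic fallback. A real JIT derives this from profiling data; the specialization below is written by hand purely for illustration.

```cpp
// Hand-written analogue of runtime specialization: a JIT that observes
// n == 4 on nearly every call could emit the unrolled fast path below,
// guarded by a cheap check, with the generic loop kept as a fallback.
#include <iostream>
#include <vector>

int sum_generic(const int* data, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) total += data[i];
    return total;
}

int sum_specialized(const int* data, int n) {
    if (n == 4) {                                       // guard: the observed common case
        return data[0] + data[1] + data[2] + data[3];   // fully unrolled
    }
    return sum_generic(data, n);                        // uncommon case: fall back
}

int main() {
    std::vector<int> v{1, 2, 3, 4};
    std::cout << sum_specialized(v.data(), (int)v.size()) << "\n";  // 10 via the fast path
    std::vector<int> w{5, 6, 7};
    std::cout << sum_specialized(w.data(), (int)w.size()) << "\n";  // 18 via the fallback
}
```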

As a web developer I'm really glad Firefox exists, for one reason. Firebug. The built-in developer tools in Chrom(e/ium) are ok, but no match for Firebug. Those in IE (version 9 at least, haven't tried 10 yet) are a joke. Seriously, if you do web development even as a semi-professional hobby and you haven't tried Firebug yet, you're missing a lot IMHO.

It won't be long at all before JavaScript is faster than C++. We've been studying static optimizing compilers for 40 years, and we've pretty much hit a wall. There's only so much we can glean from static code analysis. But we've barely scratched the surface of dynamic code optimization at runtime. There's so much more information available at runtime.

A static compiler has to make educated guesses about unrolling loops and inlining function calls. A dynamic compiler can unroll as we loop and inline as we call. Just these two optimizations alone have improved the performance of virtual machine runtimes by factors of 20x or more, and these are just dynamic equivalents of venerable static optimizations. There are undoubtedly many more types of optimizations that can only be performed at runtime.

Isn't PGO exactly for this? Seeing the code run in order to make further optimizations? Also, if dynamic optimizations are so good, can't they always mix a little runtime into the C++, or not?

Is there ever going to be process-per-tab support? It seems like it was put on hold last year and I haven't heard anything since ...

Never, but they are making every possible part of the browser async. In the near future the rendering will be on a different thread (already completed on some platforms), and then they'll separate the content and the browser GUI. It's lighter than having a different process per tab. And they're already working on many little things.

Right, but the primary reason to do process-per-tab is security, with performance being a very pleasant side effect. Chrome is already on the verge of being uneconomical to exploit. Firefox could follow its lead, but has barely tried so far.

It's not clear Chrome is "on the verge of being uneconomical to exploit". While exploiting Chrome is hard, it has not prevented hacks so far. It's hard to estimate how hard it is since most hacks are secret. But we do know that by most calculations Chrome is the #1 most popular browser - so economics-wise, even if it's hard to exploit, there is more incentive.

Note also that performance does not always result from process-per-tab. What you do get is that a slow tab does not stop the rest of the browser. That can improve performance if multiple tabs that would compete for CPU in a single process can now use multiple separate cores. It also gives you good responsiveness. But it does cost you in terms of higher memory usage, and it does decrease performance in some cases - for example, just to register a mouse click and render a result, you need to capture the event in one process, send it to another, render, and send the output back, so the cross-process communication is additional work that adds latency.
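
As a rough illustration of those extra hops, here is a minimal POSIX sketch (fork plus two pipes; not Firefox or Chrome code) in which one process plays the UI side and captures an event, a second process plays the content side and "renders" it, and the result travels back. Every such round trip is work a single-process browser simply would not do.

```cpp
// Minimal POSIX sketch of a cross-process round trip: the parent plays the
// UI process (captures an event), the child plays the content process
// (does the "rendering"), and the result travels back over a pipe.
// Not browser code; just an illustration of the extra hops per event.
#include <cstdio>
#include <cstring>
#include <unistd.h>
#include <sys/wait.h>

int main() {
    int to_child[2], to_parent[2];
    if (pipe(to_child) != 0 || pipe(to_parent) != 0) return 1;

    pid_t pid = fork();
    if (pid == 0) {                                      // child: "content process"
        char event[32] = {0};
        read(to_child[0], event, sizeof(event) - 1);     // receive the event
        char reply[64];
        snprintf(reply, sizeof(reply), "rendered result for '%s'", event);
        write(to_parent[1], reply, strlen(reply) + 1);   // send the result back
        _exit(0);
    }

    // parent: "UI process" captures a click and forwards it
    const char* event = "mouse click";
    write(to_child[1], event, strlen(event) + 1);
    char reply[64] = {0};
    read(to_parent[0], reply, sizeof(reply) - 1);        // wait for the round trip
    printf("UI process got: %s\n", reply);
    waitpid(pid, nullptr, 0);
}
```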

Gecko (the rendering engine in Firefox) supports multiprocess, and mobile Firefox did use multiple processes in the past. This gave responsiveness and stability, but the memory and latency costs were enough to switch to a different model (current mobile Firefox uses multiple threads in a single process).

I see how two development versions could make sense: one that you're polishing up for stability, another that you get to play with more freely. But three?

Nightly - development
Aurora - polishing and bugfixing
Beta - wider testing, and for addon-devs to tinker with a stable codebase without wide distribution of the program
Release - showtime.

And I guess the reason it's Fx18 rather than Fx17 is that the devs waited for the merge window to happen and then merged the new JIT compiler into Nightly.

Google deserves credit for pioneering this approach. Basically, you have a nightly (or canary, in Chrome terms) period where changes just land. Things can break at any time. Then you migrate it to aurora (dev in Chrome terms), where it no longer gets constant updates and you stabilize it. But it is not ready for a beta audience yet; especially at the beginning of aurora/dev there can be plenty of issues. Then it goes to beta, where you hope to not change anything at all but just verify, through a large userbase, that things are ok. Then you release it. So removing any of these stages is likely a bad idea.

This is good news for me as my clients all use Firefox because IE9 doesn't support web workers or web sockets. They all use IE as a corporate choice though some are still stuck on IE6! Firefox is much slower than Chrome at panning/zooming our custom SVG map but it's more stable than Chrome. Google seem to break their SVG support every now and then so I can't recommend my clients use it as it would make me look bad. *sigh*

It won't be long at all before JavaScript is faster than C++. We've been studying static optimizing compilers for 40 years, and we've pretty much hit a wall. There's only so much we can glean from static code analysis. But we've barely scratched the surface of dynamic code optimization at runtime. There's so much more information available at runtime.

A static compiler has to make educated guesses about unrolling loops and inlining function calls. A dynamic compiler can unroll as we loop and inline as we call. Just these two optimizations alone have improved the performance of virtual machine runtimes by factors of 20x or more, and these are just dynamic equivalents of venerable static optimizations. There are undoubtedly many more types of optimizations that can only be performed at runtime.

Isn't PGO exactly for this? Seeing the code run in order to make further optimizations? Also, if dynamic optimizations are so good, can't they always mix a little runtime into the C++, or not?

You're exactly correct. A dynamically typed language is never going to be faster than a statically typed language, all else being equal, because more information (i.e. types) can't possibly hurt. If it did, you could always just throw out the typing information.

The grandparent poster is apparently conflating compiled vs. interpreted/JIT-compiled with typing systems. Unfortunately, this type of mistake is somewhat common among fans of javascript, ruby and the like who don't know much about languages except that they luv their blub.

It won't be long at all before JavaScript is faster than C++. We've been studying static optimizing compilers for 40 years, and we've pretty much hit a wall. There's only so much we can glean from static code analysis. But we've barely scratched the surface of dynamic code optimization at runtime. There's so much more information available at runtime.

A static compiler has to make educated guesses about unrolling loops and inlining function calls. A dynamic compiler can unroll as we loop and inline as we call. Just these two optimizations alone have improved the performance of virtual machine runtimes by factors of 20x or more, and these are just dynamic equivalents of venerable static optimizations. There are undoubtedly many more types of optimizations that can only be performed at runtime.

Isn't PGO exactly for this? Seeing the code run in order to make further optimizations? Also, if dynamic optimizations are so good, can't they always mix a little runtime into the C++, or not?

You're exactly correct. A dynamically typed language is never going to be faster than a statically typed language, all else being equal, because more information (i.e. types) can't possibly hurt. If it did, you could always just throw out the typing information.

More information indeed cannot hurt. But dynamically typed languages do have sources of information statically typed ones do not. Yes, statically typed ones could in theory completely change how they work, and ship runtimes with JITs and so forth, but then they would suffer from the same things that make languages like Java and C# slower than C and C++.

The favorite example of this is that a dynamic language can inline at runtime a function call to an external library, and that inlining can lead to a lot of optimizations on top of it. A static language can't do that, again, not unless it becomes much more like Java and C# with all the downsides (and upsides) that entails.

It won't be long at all before JavaScript is faster than C++. We've been studying static optimizing compilers for 40 years, and we've pretty much hit a wall. There's only so much we can glean from static code analysis. But we've barely scratched the surface of dynamic code optimization at runtime. There's so much more information available at runtime.

A static compiler has to make educated guesses about unrolling loops and inlining function calls. A dynamic compiler can unroll as we loop and inline as we call. Just these two optimizations alone have improved the performance of virtual machine runtimes by factors of 20x or more, and these are just dynamic equivalents of venerable static optimizations. There are undoubtedly many more types of optimizations that can only be performed at runtime.

Isn't PGO exactly for this? Seeing the code run in order to make further optimizations? Also, if dynamic optimizations are so good, can't they always mix a little runtime into the C++, or not?

You're exactly correct. A dynamically typed language is never going to be faster than a statically typed language, all else being equal, because more information (i.e. types) can't possibly hurt. If it did, you could always just throw out the typing information.

More information indeed cannot hurt. But dynamically typed languages do have sources of information statically typed ones do not. Yes, statically typed ones could in theory completely change how they work, and ship runtimes with JITs and so forth, but then they would suffer from the same things that make languages like Java and C# slower than C and C++.

The favorite example of this is that a dynamic language can inline at runtime a function call to an external library, and that inlining can lead to a lot of optimizations on top of it. A static language can't do that, again, not unless it becomes much more like Java and C# with all the downsides (and upsides) that entails.

Why not? I mean, why can't the compiler inline the function from the library? Assuming it can access the source.

It won't be long at all before JavaScript is faster than C++. We've been studying static optimizing compilers for 40 years, and we've pretty much hit a wall. There's only so much we can glean from static code analysis. But we've barely scratched the surface of dynamic code optimization at runtime. There's so much more information available at runtime.

A static compiler has to make educated guesses about unrolling loops and inlining function calls. A dynamic compiler can unroll as we loop and inline as we call. Just these two optimizations alone have improved the performance of virtual machine runtimes by factors of 20x or more, and these are just dynamic equivalents of venerable static optimizations. There are undoubtedly many more types of optimizations that can only be performed at runtime.

Isn't PGO exactly for this? Seeing the code run in order to make further optimizations? Also, if dynamic optimizations are so good, can't they always mix a little runtime into the C++, or not?

You're exactly correct. A dynamically typed language is never going to be faster than a statically typed language, all else being equal, because more information (i.e. types) can't possibly hurt. If it did, you could always just throw out the typing information.

The grandparent poster is apparently conflating compiled vs. interpreted/JIT-compiled with typing systems. Unfortunately, this type of mistake is somewhat common among fans of javascript, ruby and the like who don't know much about languages except that they luv their blub.

No, you and he are *technically* correct, but wrong in practice. Definitely not "exactly correct". Yes, in principle, a C++ implementation etc could use a virtual machine with some kind of dynamic compilation/execution, so that it could do all the kinds of optimizations Java/JavaScript etc can do, but in practice that is just not the case, and therefore many dynamic optimizations are off the table for C++ etc. In addition, the non-pointer safety of C++ (i.e. inability to unambiguously know where all pointers are, and how they are encoded), prevents some classes of optimization, regardless.

In the real world, C++ etc are pre-compiled languages that do *not* include a runtime that is anywhere near sophisticated enough to do the many kinds of dynamic optimizations that are now done routinely for languages like JavaScript and Java. For example, inlining of the majority of virtual calls, which are statically polymorphic but dynamically monomorphic. That is the single biggest optimization done by those kinds of VMs (look up "type feedback"), and it is not possible using the kind of runtime architecture that C++ invariably uses.
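
A rough, hand-written C++ rendering of what type feedback buys. The call site below is statically polymorphic (any Shape could arrive), but if the runtime has only ever seen Circle there, it can inline the Circle path behind a cheap guard. A real VM tests a hidden class or vtable pointer and can deoptimize when the guess fails; the dynamic_cast used here is just the closest portable stand-in.

```cpp
// Sketch of speculative devirtualization via type feedback: the call site is
// statically polymorphic, but if the runtime has only ever seen Circle here,
// it can inline Circle::area() behind a cheap guard.
#include <iostream>
#include <memory>
#include <vector>

struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Circle : Shape {
    double r;
    explicit Circle(double r) : r(r) {}
    double area() const override { return 3.14159265 * r * r; }
};

struct Square : Shape {
    double s;
    explicit Square(double s) : s(s) {}
    double area() const override { return s * s; }
};

// What the source says: an ordinary virtual call, one indirect call per element.
double total_area_generic(const std::vector<std::unique_ptr<Shape>>& shapes) {
    double total = 0;
    for (const auto& sh : shapes) total += sh->area();
    return total;
}

// What a type-feedback JIT might effectively emit after observing only Circles here.
double total_area_speculative(const std::vector<std::unique_ptr<Shape>>& shapes) {
    double total = 0;
    for (const auto& sh : shapes) {
        if (auto* c = dynamic_cast<const Circle*>(sh.get())) {
            total += 3.14159265 * c->r * c->r;   // inlined fast path
        } else {
            total += sh->area();                 // uncommon path (a real VM would deoptimize)
        }
    }
    return total;
}

int main() {
    std::vector<std::unique_ptr<Shape>> shapes;
    shapes.push_back(std::make_unique<Circle>(1.0));
    shapes.push_back(std::make_unique<Circle>(2.0));
    std::cout << total_area_generic(shapes) << " == " << total_area_speculative(shapes) << "\n";
}
```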

There is no such thing as dynamically monomorphic: if the interpreter sees the same types a few times, it can guess that the type won't change and compile a fast path that expects that type, but it still needs type-checking guards to ensure that its assumption remains correct.

It won't be long at all before JavaScript is faster than C++. We've been studying static optimizing compilers for 40 years, and we've pretty much hit a wall. There's only so much we can glean from static code analysis. But we've barely scratched the surface of dynamic code optimization at runtime. There's so much more information available at runtime.

A static compiler has to make educated guesses about unrolling loops and inlining function calls. A dynamic compiler can unroll as we loop and inline as we call. Just these two optimizations alone have improved the performance of virtual machine runtimes by factors of 20x or more, and these are just dynamic equivalents of venerable static optimizations. There are undoubtedly many more types of optimizations that can only be performed at runtime.

Isn't PGO exactly for this? Seeing the code run in order to make further optimizations? Also, if dynamic optimizations are so good, can't they always mix a little runtime into the C++, or not?

You're exactly correct. A dynamically typed language is never going to be faster than a statically typed language, all else being equal, because more information (i.e. types) can't possibly hurt. If it did, you could always just throw out the typing information.

More information indeed cannot hurt. But dynamically typed languages do have sources of information statically typed ones do not. Yes, statically typed ones could in theory completely change how they work, and ship runtimes with JITs and so forth, but then they would suffer from the same things that make languages like Java and C# slower than C and C++.

The favorite example of this is that a dynamic language can inline at runtime a function call to an external library, and that inlining can lead to a lot of optimizations on top of it. A static language can't do that, again, not unless it becomes much more like Java and C# with all the downsides (and upsides) that entails.

Why not? I mean, why can't the compiler inline the function from the library? Assuming it can access the source.

Sorry, I should have been more clear. By 'external library' I meant something not available during compilation (so LTO etc. won't help). Imagine a library loaded at runtime dynamically, like a system library that might be different on different OS versions (but your program works with all of them through a stable interface).
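
For the curious, here is that situation in POSIX terms (a sketch; libm and cos are just convenient stand-ins for "some library resolved at runtime"): the compiler that builds this program never sees the body behind the dlsym'd pointer, so it cannot inline the call or optimize across it, whereas a JIT compiling at runtime has the concrete target in hand. The exact library name varies by platform, and on older glibc this builds with -ldl.

```cpp
// Sketch of a call through a library loaded at runtime (POSIX dlopen/dlsym).
// The compiler that builds this program never sees the body of the function
// behind `fn`, so it cannot inline it or optimize across the call; a JIT
// compiling at runtime knows the concrete target and can.
// libm's cos() is just a stand-in for "some system library".
#include <cstdio>
#include <dlfcn.h>

int main() {
    void* lib = dlopen("libm.so.6", RTLD_NOW);       // resolved only at runtime (Linux name)
    if (!lib) { std::fprintf(stderr, "dlopen failed: %s\n", dlerror()); return 1; }

    using cos_fn = double (*)(double);
    auto fn = reinterpret_cast<cos_fn>(dlsym(lib, "cos"));
    if (!fn) { std::fprintf(stderr, "dlsym failed\n"); dlclose(lib); return 1; }

    double total = 0;
    for (int i = 0; i < 1000; ++i)
        total += fn(i * 0.001);                      // opaque indirect call, 1000 times
    std::printf("total = %f\n", total);

    dlclose(lib);
}
```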

There is no such thing as dynamically monomorphic: if the interpreter sees the same types a few times, it can guess that the type won't change and compile a fast path that expects that type, but it still needs type-checking guards to ensure that its assumption remains correct.

No, you are wrong. There is definitely such a thing as dynamically monomorphic, and not only is it very common, it is *the* most common case in unified object languages, where everything is an object. It's a virtual call that *could* call more than one method (from that call site), but hasn't, up to the current moment. It is very, very, very common.

That's much different than a statically monomorphic call, which is one where the compiler can *prove* there cannot be any other target methods. This is where there might be (you can't prove there can't be), but haven't been (yet), any other targets.

You can do incredible optimizations in that case, but the best optimizations require subsequent steps being able to assume the result is the common case result, and you can't do that if you just compile a type test and a fork around an optimized common case and unoptimized uncommon case, because then that assumption can be incorrect. You have to be able to do vastly more difficult things like eliminating the uncommon case altogether, and reconstructing it as-needed by de-optimizing the current activation (which might actually be many, many source level activations inlined together), reoptimizing for the new situation, and restarting from the exact next instruction.

And in fact, what you say about type checking guards still being required becomes wrong when you have the above kind of technology, because the exact kind of optimization described above can eliminate a huge number, if not most, of the checks, by hoisting. But only if you don't allow the uncommon path.

For example, inlining of the majority of virtual calls, which are statically polymorphic but dynamically monomorphic.

Actually, MSVC with PGO will do this. If the profile shows that the virtual call is dynamically monomorphic, it will emit a guard on the vtable pointer and then inline the virtual call.

But yes, most C++ compilers don't do this, and most people using MSVC don't use PGO with it.

One other note: C++ compilers can statically prove all sorts of stuff that a JS JIT is hard-pressed to prove statically. The JIT can assume it to be true and insert guards, but checking the guards takes time. Also, too many guards leads to either code bloat and memory pressure (if stack stores are sunk into the side exits) or slower execution (if the stack is synced before a guard). There are some tricks to minimize those problems too, of course, but the upshot is that right now the really good JS JITs are still somewhat slower than a C compiler.

What will be interesting to watch with JS is what happens as browsers move to off-main-thread compilation more and start adding more time-consuming optimization passes as a result.

For example, inlining of the majority of virtual calls, which are statically polymorphic but dynamically monomorphic.

Actually, MSVC with PGO will do this. If the profile shows that the virtual call is dynamically monomorphic, it will emit a guard on the vtable pointer and then inline the virtual call.

But yes, most C++ compilers don't do this, and most people using MSVC don't use PGO with it.

One other note: C++ compilers can statically prove all sorts of stuff that a JS JIT is hard-pressed to prove statically. The JIT can assume it to be true and insert guards, but checking the guards takes time. Also, too many guards leads to either code bloat and memory pressure (if stack stores are sunk into the side exits) or slower execution (if the stack is synced before a guard). There are some tricks to minimize those problems too, of course, but the upshot is that right now the really good JS JITs are still somewhat slower than a C compiler.

What will be interesting to watch with JS is what happens as browsers move to off-main-thread compilation more and start adding more time-consuming optimization passes as a result.

If you read the rest of my last comment, other than just that sentence, I explained in detail why that isn't at all the same thing, and isn't even close to as good. Just inlining a call with a guard around with a fallback dispatch in the uncommon case doesn't give any information to the subsequent code about which case occurred, and that defeats many, many optimizations, *especially* the elimination of subsequent checks.

[edit: oh, I see you read only my first comment. Please read my second comment, which explained why profile-driven inlining with a static compiler isn't doing the same thing at all]

You can do incredible optimizations in that case, but the best optimizations require subsequent steps being able to assume the result is the common case result, and you can't do that if you just compile a type test and a fork around an optimized common case and unoptimized uncommon case, because then that assumption can be incorrect. You have to be able to do vastly more difficult things like eliminating the uncommon case altogether, and reconstructing it as-needed by de-optimizing the current activation (which might actually be many, many source level activations inlined together), reoptimizing for the new situation, and restarting from the exact next instruction.

It sounds like you are describing deeply optimizing the fast path at a cost of making a change in assumptions a lot more expensive. I'm sure that trade-off is worthwhile a lot of times because most code is written as if it were statically typed even in dynamic languages. Nonetheless I fail to see how that kind of optimization could ever be equal to, much less faster than, the computer *knowing* what the type will be because the programmer took a few extra seconds to specify it.

You can do incredible optimizations in that case, but the best optimizations require subsequent steps being able to assume the result is the common case result, and you can't do that if you just compile a type test and a fork around an optimized common case and unoptimized uncommon case, because then that assumption can be incorrect. You have to be able to do vastly more difficult things like eliminating the uncommon case altogether, and reconstructing it as-needed by de-optimizing the current activation (which might actually be many, many source level activations inlined together), reoptimizing for the new situation, and restarting from the exact next instruction.

It sounds like you are describing deeply optimizing the fast path at a cost of making a change in assumptions a lot more expensive. I'm sure that trade-off is worthwhile a lot of times because most code is written as if it were statically typed even in dynamic languages. Nonetheless I fail to see how that kind of optimization could ever be equal to, much less faster than, the computer *knowing* what the type will be because the programmer took a few extra seconds to specify it.

The problem is that you are confusing interface types and implementation types. One of the major benefits of an OO type system is to allow you to make your code vastly more general by using interface types rather than specifying actual classes, etc. I.E. don't use types to lock-in implementations. Yes, it is true that there are some common types tied to implementations, like a C++ "int", which gives you optimization information, but those aren't objects, and code that uses them isn't polymorphic anyway. For actual OO code, i.e. objects upon which you can do virtual calls, well-written code uses interface types by default, not implementation types, which are almost always just poor design, except perhaps in a critical inner loop or something. And an interface type doesn't help you do much optimization, which isn't surprising given that one of the main purposes of an interface is to hide the implementation.

So 1) your statically-typed languages have type systems that don't help much for optimizing real-world OO code, and 2) my main point, which is that languages with a runtime that can do the kind of dynamic optimization I've described, have some important kinds of optimizations they can do that no static compiler can do, period, ever, because a dynamic VM can do a more aggressive class of optimizations. The more OO your code is, the more those optimizations matter. The less OO, the less they matter.

At the very least, both the Java HotSpot VM and Chrome's V8 use these kind of techniques, and eventually everyone will.

You can do incredible optimizations in that case, but the best optimizations require subsequent steps being able to assume the result is the common case result, and you can't do that if you just compile a type test and a fork around an optimized common case and unoptimized uncommon case, because then that assumption can be incorrect. You have to be able to do vastly more difficult things like eliminating the uncommon case altogether, and reconstructing it as-needed by de-optimizing the current activation (which might actually be many, many source level activations inlined together), reoptimizing for the new situation, and restarting from the exact next instruction.

It sounds like you are describing deeply optimizing the fast path at a cost of making a change in assumptions a lot more expensive. I'm sure that trade-off is worthwhile a lot of times because most code is written as if it were statically typed even in dynamic languages. Nonetheless I fail to see how that kind of optimization could ever be equal to, much less faster than, the computer *knowing* what the type will be because the programmer took a few extra seconds to specify it.

You're basically correct here. The key distinction to note here is not dynamically typed vs. statically typed, it's between strongly-typed type-safe languages which are runtime-introspectable (which includes javascript, java, C#, python, perl, etc.) and weakly-typed type-unsafe languages (the C family of languages such as C, C++, and Objective-C).

The former languages allow for advanced runtime optimizations because their semantics allow the runtime to definitely infer, and keep track of, various assumptions about the system - monomorphism/polymorphism at a particular call site, type constraints, shape models, etc. - as well as track changes and invalidations of those assumptions.

The latter languages preclude such optimizations because their semantics allow the programmer to break those kinds of inferences without there being a reasonable way for the engine to figure out that they've been broken. Unsafe pointer casting, unions, pointers-to-stack-objects, and other language features make it effectively impossible to optimize heavily without breaking the language semantics.

Given that, it is conceivable, at some point down the line, that javascript can be optimized to the point where it's faster than equivalent C code. JS runtimes could do advanced runtime range analysis, inlining, and DCE to produce code that is actually better than possible for the equivalent C or C++ code which cannot be effectively runtime optimized.

For languages such as Java, C#, etc., however, any optimization technique that could be used on JS or other dynamic languages would apply to these languages as well.

You can do incredible optimizations in that case, but the best optimizations require subsequent steps being able to assume the result is the common case result, and you can't do that if you just compile a type test and a fork around an optimized common case and unoptimized uncommon case, because then that assumption can be incorrect. You have to be able to do vastly more difficult things like eliminating the uncommon case altogether, and reconstructing it as-needed by de-optimizing the current activation (which might actually be many, many source level activations inlined together), reoptimizing for the new situation, and restarting from the exact next instruction.

What you say doesn't make sense - the dynamic VM still needs to check whether its assumption holds, just as a static compiler would need to insert that check. Exactly the case you describe - where you just go with the assumption all the way through - is possible in a statically compiled case too, particularly if profile-guided.

Now, a type check - i.e. a very predictable branch - is extremely cheap, much cheaper than the cost (largely implicit, due to knock-on missed optimizations) of a virtual method call, so performance-wise it probably doesn't matter. But the distinction isn't a fundamentally new type of optimization either.

You're basically correct here. The key distinction to note here is not dynamically typed vs. statically typed, it's between strongly-typed type-safe languages which are runtime-introspectable (which includes javascript, java, C#, python, perl, etc.) and weakly-typed type-unsafe languages (the C family of languages such as C, C++, and Objective-C).

The former languages allow for advanced runtime optimizations because their semantics allow the runtime to definitely infer, and keep track of, various assumptions about the system - monomorphism/polymorphism at a particular call site, type constraints, shape models, etc. - as well as track changes and invalidations of those assumptions.

This distinction isn't as clear cut as you make it out to be.

Most of the introspectable languages have ways of escaping that safety; e.g. CPython's widely used native interop, java's JNI, or C#'s DllImport - or unsafe.

The unsafe keyword here is somewhat interesting, because it wraps unverifiable code in an explicitly marked block. But that marking doesn't need to be explicit. Indeed, exceptions are another kind of optimization nightmare and you'll find that implicitly, compilers (including C++) treat functions with exception-related complexity quite differently from those without exceptions.

Similarly, although in principle it's possible to cast and reinterpret various bits of memory in C++, making optimization harder, in practice people don't actually do that - at least very rarely. Thus for instance G++ can (and will!) assume that various pointers are not aliased because they have incompatible type annotations, and will use this assumption to optimize programs (this matters because it breaks programs that assume a pointer is just a bunch of bits). So while C++ requires that pointers are fixed values you can do arithmetic with, it doesn't require that any pointer can be converted to any other (though there are exceptions).
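
A concrete example of the assumption being described is the classic strict-aliasing case below. Because int and float are incompatible types, an optimizer is allowed to assume the store through f cannot change *i; whether you actually observe the surprising result depends on the compiler and flags (e.g. gcc -O2 vs -O2 -fno-strict-aliasing), so treat it as an illustration of the licence the compiler has, not a guaranteed outcome.

```cpp
// Classic strict-aliasing illustration: because int and float are
// incompatible types, an optimizer may assume the write through `f`
// cannot modify `*i`, and may keep the previously stored value of *i.
// Calling it with aliased pointers is undefined behaviour; it is shown
// here only to illustrate the optimization assumption.
#include <cstdio>

int type_punned_store(int* i, float* f) {
    *i = 1;
    *f = 2.0f;       // if f actually aliases i, this overwrites *i ...
    return *i;       // ... but the compiler may assume it didn't and return 1
}

int main() {
    int x = 0;
    // Deliberately pass the same memory as both int* and float* (UB).
    int result = type_punned_store(&x, reinterpret_cast<float*>(&x));
    std::printf("result = %d, x = %d\n", result, x);
}
```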

Similarly, even though pointer arithmetic is allowed, most code doesn't actually use it in any tricky way, and a compiler could statically verify that code is safe with respect to some criterion and use that for optimizations.

And of course, runtime type information *is* available to C++ programs and has been for a long time. Sure, it's not as practical as in more dynamically-oriented systems, and it only tracks some objects - but it tracks exactly those where dynamic polymorphism is possible, so the missing information is in principle still there.

Quote:

The latter languages preclude such optimizations because their semantics allow the programmer to break those kinds of inferences without there being a reasonable way for the engine to figure out that they've been broken. Unsafe pointer casting, unions, pointers-to-stack-objects, and other language features make it effectively impossible to optimize heavily without breaking the language semantics.

Given that, it is conceivable, at some point down the line, that javascript can be optimized to the point where it's faster than equivalent C code. JS runtimes could do advanced runtime range analysis, inlining, and DCE to produce code that is actually better than possible for the equivalent C or C++ code which cannot be effectively runtime optimized.

To paraphrase: C++ precludes some optimizations because it's too flexible.

While this is true, the same is true of javascript. The fact that a variable can change type (but almost never does) is exactly one of those things holding javascript performance back in principle - but a smart VM can mitigate that greatly. And in the same way, a smart compiler doesn't need to let C++'s theoretical flexibility slow down practical programs that don't use all of that flexibility. Some of the slowdown can be mitigated.

Quote:

For languages such as Java, C#, etc., however, any optimization technique that could be used on JS or other dynamic languages would apply to these languages as well.

I'm pretty sure the single biggest performance advantage C++ has over java/C# is real templates. They make a night-and-day difference to performance. C++ is one of the few languages I know where abstract algorithms can be (practically) written to an "interface" yet still inlined and instantiated for a specific scenario and fully optimized as if you'd hardcoded the types (and constants, etc.) everywhere. And of course, C++ isn't the end of the story; there's D too, which may or may not take off.

(Aside: another contender for biggest perf advantage is proper support for values - references are expensive!)

Think of it this way: whereas fancy smart VMs might someday be able to specialize your generic (dynamic language) code so that it runs tuned just for your scenario, C++ lets you write most similar cases to an abstract interface with literally zero abstraction penalty. Indeed, it sometimes has a negative abstraction penalty (no joke) because the template authors can spend time doing some really finicky optimizations that you in practice won't do if you hand-tune the code (e.g. controlling memory prefetching instructions, choosing between alternative algorithms, etc.).

In practice: in JavaScript, your custom implementation of map/reduce functions will almost certainly run slower than an explicit for loop implementing the same process. In C#, LINQ to Objects is much slower than for loops. But in C++, STL algorithms are just as fast as explicitly coded versions - because they basically are the same thing.
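
A small sketch of that "same thing" claim (actual codegen of course varies by compiler and flags): std::accumulate is a template instantiated for exactly this iterator and element type, so after inlining a modern optimizer typically produces the same loop as the hand-written version.

```cpp
// The generic std::accumulate call is a template instantiated for exactly
// these types, so after inlining it typically compiles to the same loop as
// the hand-written version (actual codegen depends on compiler and flags).
#include <iostream>
#include <numeric>
#include <vector>

int sum_by_hand(const std::vector<int>& v) {
    int total = 0;
    for (int x : v) total += x;
    return total;
}

int sum_with_algorithm(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0);   // abstract interface, concrete instantiation
}

int main() {
    std::vector<int> v{1, 2, 3, 4, 5};
    std::cout << sum_by_hand(v) << " == " << sum_with_algorithm(v) << "\n";   // both 15
}
```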

In the real world, C++ etc are pre-compiled languages that do *not* include a runtime that is anywhere near sophisticated enough to do the many kinds of dynamic optimizations that are now done routinely for languages like JavaScript and Java. For example, inlining of the majority of virtual calls, which are statically polymorphic but dynamically monomorphic. That is the single biggest optimization done by those kinds of VMs (look up "type feedback"), and it is not possible using the kind of runtime architecture that C++ invariably uses.

In the real world dynamic languages like JS are nowhere near as fast as C++.

And in the real world there are a lot of guys who predict that interpreted languages will very soon outperform native languages because of powerful runtime optimizations, blah-blah-blah. These prophecies are repeated almost as often as end-of-the-world predictions, and are as precise as the end-of-the-world predictions.