…and so forth. The separation into classes and helper classes are to commonify code, when possible, and to let the magic of template specialization select appropriate definitions of things at compile time.

The problem is that this worked a little too well. nsTArray_CopyElements<uint32_t> is a different class from nsTArray_CopyElements<int32_t>, even though both of them share the same base class and neither one adds extra functionality. This means that nsTArray_base must be instantiated separately for each element type, even though the behavior of nsTArray_base is completely independent of the element type. And so methods were being unnecessarily duplicated, which impacted compile times, download size, startup and runtime performance, and so on.

[A sufficiently smart toolchain could make this problem go away: the linker can recognize duplicated methods and functions at the assembly level, discard all but one instance, and then redirect all the calls to the lone instance. (Bonus question: why does it have to be done by the linker, and not the compiler? It’s certainly more effective, but there is a correctness issue as well.) MSVC calls this “identical COMDAT folding” and the gold linker on Linux implements a similar optimization called “identical code folding”. Indeed, we enable this optimization in the linker when it’s available on our release builds, precisely because it delivers a significant code size improvement. But in a cross-platform project, you can’t necessarily rely on the linker to fix up these sorts of issues.]

In our case, however, fixing the problem is straightforward. Instead of creating new classes to describe copying behavior, we’ll use template specialization to pick the appropriate class at compile time (the class that would have been the subclass of nsTArray_CopyElements in the above scheme) and use that class directly. Then we’ll have either nsTArray_base<Alloc, nsTArray_CopyWithMemutils> (the overwhelmingly common case), or some other specialization when array elements need special treatment:

Where did I say MSVC/clang are doing this automatically? MSVC and clang have the same problem with duplicating the methods (unless their inliners are even more aggressive than GCC’s, which I doubt, especially MSVC); MSVC just has a linker to cover up for it.

It’s also possible their inliners are doing something to be clever: not unifying the functions, because the functions have to compare different as function pointers, but making them near-unified: put one function’s code at an address A, then put the second function’s code at address A – 1, with a nop at the very start of the function. (This generalizes beyond two functions, of course.) Then you get code size improvements even if the linker’s not quite as smart, and the normal linking mechanisms don’t need any change at all, and you probably get icache improvements due to better locality and all.

That would be sufficiently clever, yes. I know GCC wants to do something similar for virtual function thunks, but there’ve been complications (“you want to point two names into the same function body? hahaha!”).