ARM’s Eagle has landed: meet the A15

ARM adds another rung to the Cortex family's performance/complexity/power …

Just as products based on ARM's much anticipated Cortex A9 are finally poised to hit the market, the company has announced yet another, even higher-end core design: the A15. Codenamed "Eagle," the A15 architecture is ostensibly aimed at netbooks and tablets, but a look at the spec sheet leaves no doubt that ARM is absolutely gunning for the server market that Intel and AMD currently dominate. Indeed, even going by what little ARM has revealed about the A15, it's very hard to imagine this thing in a smartphone when it launches at 32nm in 2012 or 2013. This is a laptop and server part, and ARM will use it to take the fight to x86.

The overall position of the A15 in ARM's lineup is as the next logical step up the ladder from the A9, so that in order of core size, complexity, performance, features, capability, and power consumption, the Cortex family goes by the numbers: A8 at the bottom, then A9, then A15 at the top. The A8 is an in-order part with a very simple architecture that's comparable to Intel's Atom in some key anatomical respects, but is much lower-power and more efficient than the latter. The Cortex A9 is an out-of-order part that brings the ARM line into Atom's performance territory, and also closer to Atom in power draw (though the A9 is allegedly still much more efficient than Atom).

The A15 takes things to the next level by pairing the out-of-order nature of the A9 with the expansive feature list (virtualization support, double-precision floating point, ECC cache) that characterizes the Atom line, and that costs Atom in terms of power. In this respect, the A15 looks somewhat like AMD's recently unveiled Bobcat core, to the point that I would be very curious to see a comparison of the transistor counts of the two cores.

ARM has yet to release a real block diagram of A15, so there are many crucial aspects of the design that are very unclear. Specifically, the capability and robustness of the A15's branch prediction hardware, the depth of its pipeline, and the size and configuration of its instruction window will all have a direct impact on both raw performance and the all-important performance/watt metric. If ARM has beefed up the size of the instruction window and/or increased the design's branch prediction resources (possibly to mitigate the effects of a longer pipeline), then that will be performance positive but power negative.

Not a successor to A9, but a big brother

The fact that A15 is such a significant step up the performance/complexity/power ladder from A9 makes it clear that A15 isn't really a successor to A9, any more than A9 is a successor to A8. Rather, the three designs will coexist in ARM's lineup for many years, enabling the Cortex family to span a range of power/performance/feature points. So the difference between the A8, A9, and A15 parts, and, say, Bobcat and Bulldozer, is that ARM, being a much smaller company than AMD with very limited resources and a different historical user base, has trickled the three designs out slowly over the course of a few years, rather than putting them all out at once.

The leisurely pace of ARM's roadmap will come as a surprise to Intel and AMD watchers, but as with all embedded designs, ARM architectures historically have very long lifecycles. (This is true of MIPS, as well.) Older ARM cores that aren't part of the Cortex family are still in production in a range of devices, from feature phones to appliances—the same will be true of the Cortex family in a decade.

Forget netbooks: this is a laptop and server part

Let's run down some of the important distinguishing features of A15:

Clockspeed of up to 2.5 GHz

1, 2, 4 or 8 cores

Support for 1TB main memory

ECC L1 and L2

Fully cache coherent bus protocol (for multisocket systems)

Support for virtualization

When you add the above to some of the other features, like double-precision floating-point, vector extensions, cache snooping, and so on, it's very clear that this is at the very least a laptop part—not phones, or netbooks, but laptops. And not just laptops, but servers. And then again not just servers, but high-density multisocket servers.

So with the A15, ARM finally makes the jump all the way into niches currently occupied by Intel and AMD. With respect to netbooks—or "smartbooks" as they're called when they have an ARM part in them—the line between a netbook and a laptop isn't super clear anymore. But wherever the line is, anything based on A15 will be on the laptop side of it. And in the datacenter, the part will also strengthen ARM's burgeoning presence in the nascent market for high-density, low-power cloud servers. Over the next two years, the A9 will establish ARM's presence in this new niche, so that when the A15 drops the company will (ideally) use it to build on its existing presence.

All told, the A15 looks like a promising complement to the A9 and A8, but it's too early to say much more than that. The part is well over a year from the market, if not two or three. And when it debuts at 32nm, Intel will be at 22nm, with no telling what kind of successor to today's Atom it will have by then. AMD will still be working with the basic Bobcat technology, probably at 32nm. This means that all we can do is make vague guesses as we wait for more A15 details to trickle out. But despite the paucity of detail about A15's real prospects, one thing is certain: the processor scene just got a lot more interesting.

What about the ARM 64-bit extensions? Isn't the A8 32-bit only? What about the A9? I assume that 1TB of memory means it has at least some higher-than-32-bit addressing capability. Nehalem/Westmere is 48-bit addressable, I think? I wonder what ARM opted for in the A15.

Oh, hello! So that's 40-bit physical addressing, which is where they get the 1TB of memory from. Interesting.
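The 1TB figure is just powers of two. Each extra address bit doubles the reachable space, so 40 bits reach 2^40 bytes, which is 1TB (strictly, 1 TiB). The minimal Python sketch below also shows the 4GB ceiling of a plain 32-bit address and the 48-bit width guessed at above for Nehalem/Westmere. The helper names are illustrative only.

```python
# Sketch: how much memory an N-bit byte address can reach.
# Widths used: 32-bit (classic ARM), 40-bit (the A15's extended physical
# addressing), 48-bit (the Nehalem/Westmere figure guessed at in the thread).


def addressable_bytes(bits: int) -> int:
    """Number of distinct byte addresses reachable with a `bits`-bit address."""
    return 1 << bits


def human(n: int) -> str:
    """Format a byte count in binary (power-of-1024) units, exact values only."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    i = 0
    while n >= 1024 and n % 1024 == 0 and i < len(units) - 1:
        n //= 1024
        i += 1
    return f"{n} {units[i]}"


if __name__ == "__main__":
    for bits in (32, 40, 48):
        print(f"{bits}-bit: {human(addressable_bytes(bits))}")
```

Each 8 extra bits multiply the space by 256. That is why the jump from 32 to 40 bits takes you from 4GB to 1TB.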

So now I wonder how the ARM code generator in GCC compares to the x86_64 code generator.

The address extensions resemble PAE. However, they are mainly a virtualization extension. That is, individual processes will still see a 32-bit virtual address space, but the hypervisor can see the entire 40-bit space and arbitrate the VA-to-PA translation so that the guests don't overlap each other.
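Here is a deliberately simplified Python model of that idea. It assumes each guest is handed one contiguous 4GB window of physical memory. Real hardware does this with a second stage of page-table translation rather than a single offset, and the class and method names are hypothetical. Every guest still addresses only 32 bits, but the hypervisor places the guests side by side in the 40-bit physical space so that they can never overlap.

```python
# Toy model (hypothetical; not ARM's real page-table format): each guest sees a
# 32-bit address space, and the hypervisor gives every guest its own
# non-overlapping 4 GiB window inside a 40-bit physical address space.

GUEST_BITS = 32
PHYS_BITS = 40
WINDOW = 1 << GUEST_BITS  # 4 GiB per guest


class ToyHypervisor:
    def __init__(self) -> None:
        self.base: dict[str, int] = {}  # guest name -> physical base address

    def add_guest(self, name: str) -> int:
        """Assign the next free 4 GiB window of physical memory to a new guest."""
        base = len(self.base) * WINDOW
        if base + WINDOW > (1 << PHYS_BITS):
            raise MemoryError("40-bit physical address space exhausted")
        self.base[name] = base
        return base

    def translate(self, name: str, guest_addr: int) -> int:
        """Map a guest's 32-bit address to a 40-bit physical address."""
        if not 0 <= guest_addr < WINDOW:
            raise ValueError("guest address does not fit in 32 bits")
        return self.base[name] + guest_addr


if __name__ == "__main__":
    hv = ToyHypervisor()
    hv.add_guest("linux")
    hv.add_guest("bsd")
    # The same guest address lands at different physical addresses.
    print(hex(hv.translate("linux", 0x1000)), hex(hv.translate("bsd", 0x1000)))
```

With 4GB windows, 2^40 / 2^32 = 256 guests fit before the 40-bit space runs out, and the model refuses a 257th.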

The overall position of the A15 in ARM's lineup is as the next logical step up the ladder from the A9, so that in order of core size, complexity, performance, features, capability, and power consumption, the Cortex family goes by the numbers: A8 at the bottom, then A9, then A15 at the top. The A8 is an in-order part with a very simple architecture that's comparable to Intel's Atom in some key anatomical respects, but is much lower-power and more efficient than the latter.

If the A8 is at the bottom, where is the Cortex-A5? Or the Cortex-M series? IMO the A8 is square in the middle of a very large family of processors designed to cover almost all conceivable niches except high end single core systems.

The address extensions resemble PAE. However, they are mainly a virtualization extension. That is, individual processes will still see a 32-bit virtual address space, but the hypervisor can see the entire 40-bit space and arbitrate the VA-to-PA translation so that the guests don't overlap each other.

So that kills off the possibility of these chips making inroads in HPC. Sure, 4GB of addressable memory is plenty for most applications, but there's no way that will fly in HPC applications, where simulation memory is a constantly growing requirement.

Only if Microsoft did an ARM port of Windows could ARM maybe be competitive with Intel. Until Apple or Microsoft adopts the architecture, ARM will always remain on smartphones and embedded devices.

That said, I would love a laptop with ARM as the CPU, Tegra as the GPU, and Ubuntu or Fedora as the OS.

The overall position of the A15 in ARM's lineup is as the next logical step up the ladder from the A9, so that in order of core size, complexity, performance, features, capability, and power consumption, the Cortex family goes by the numbers: A8 at the bottom, then A9, then A15 at the top. The A8 is an in-order part with a very simple architecture that's comparable to Intel's Atom in some key anatomical respects, but is much lower-power and more efficient than the latter.

If the A8 is at the bottom, where is the Cortex-A5? Or the Cortex-M series? IMO the A8 is square in the middle of a very large family of processors designed to cover almost all conceivable niches except high end single core systems.

Yeah, pretty much: the A9 was squarely aimed at replacing the A8. And with scalability from single-core to 8-core, the A15 will really replace the A9. Power consumption numbers for the A9 at its targeted process are actually below those of the A8 when normalized to performance. So really, it is meant as a solid replacement.

From what I've heard, the A5 even has a considerably better FPU than the A8.

One of the huge and often overlooked pitfalls with the A8 is that the standard VFP instructions are nonpipelined. Using them is worse than using x87 instructions on a P4 or an Atom. There's a workaround by using only half the VFP/NEON register space and issuing 2-way NEON vector floating-point instructions, which *are* pipelined, but GCC doesn't know how to do that yet - and LLVM (which does) isn't yet mature enough for production code on ARM.
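A back-of-the-envelope model shows why a non-pipelined unit hurts so much. This Python sketch assumes fully independent operations and uses a made-up 10-cycle latency rather than real Cortex-A8 timings, so it only illustrates the shape of the difference.

```python
# Back-of-the-envelope model: cycles needed to finish n independent
# floating-point operations on different kinds of FP unit.
# The latency used below is a placeholder, not a measured Cortex-A8 figure.
import math


def cycles_nonpipelined(n_ops: int, latency: int) -> int:
    """A non-pipelined unit cannot start op k+1 until op k has finished."""
    return n_ops * latency


def cycles_pipelined(n_ops: int, latency: int, lanes: int = 1) -> int:
    """A pipelined unit accepts one instruction per cycle. Each instruction
    handles `lanes` values (e.g. 2 for the 2-way NEON trick described above),
    and the last instruction still needs the full latency to finish."""
    if n_ops == 0:
        return 0
    instructions = math.ceil(n_ops / lanes)
    return (instructions - 1) + latency


if __name__ == "__main__":
    n, lat = 1000, 10  # hypothetical: 1000 independent ops, 10-cycle latency
    print("non-pipelined:         ", cycles_nonpipelined(n, lat))
    print("pipelined, scalar:     ", cycles_pipelined(n, lat))
    print("pipelined, 2-way NEON: ", cycles_pipelined(n, lat, lanes=2))
```

With those placeholder numbers, 1,000 independent operations take 10,000 cycles on the non-pipelined unit. They take roughly 1,000 when pipelined and roughly 500 with 2-wide vector instructions. Dependent chains, where each operation needs the previous result, don't benefit from pipelining in the same way.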

The A5 and the A9 have brand-new VFP units which are properly pipelined.

Only if Microsoft did an ARM port of Windows could ARM maybe be competitive with Intel. Until Apple or Microsoft adopts the architecture, ARM will always remain on smartphones and embedded devices.

That said, I would love a laptop with ARM as the CPU, Tegra as the GPU, and Ubuntu or Fedora as the OS.

The embedded market is a lot larger than the fairly small PC market.

Apple, like with iOS on the iPhone, iPad, and iPods (which all use ARM-based chips)? Or Microsoft with the Zune?

Tegra/Tegra 2 is NVIDIA's all-in-one ARM-based chip, with the CPU and GPU in one package.

So that kills off the possibility of these chips making inroads in HPC.

I'm pretty sure nobody ever talked about using ARM cores in HPC environments. The whole idea behind HPC is, you know, high performance. ARM cores are high efficiency, low power parts. There has never been anything high performance about them, other than relative to the particular application in which they are used. I guess I'm a little confused by your comment because it seems out of the blue to me.

You are right about the physicalization, though. It will be interesting to see the systems designed around these chips. Innovation and competition make everything more interesting!

No, indeed HPC customers are always worried about performance/watt. Remember, they have to pay the power bill for 3+ years for these systems, so it's a very real concern. Plenty of people have tried to capitalize on this. For example, there was SiCortex, which tried to play up its many-core MIPS architecture (it's now belly-up), and don't forget that NVIDIA is still playing up power-efficiency numbers to promote its Tesla product line.

So it *could* be a market ARM plays in, especially since almost all of the academic HPC code out there is just a recompile away from running on a new architecture (there's very little binary-only code). But yeah, that's not going to happen with the A15, clearly.
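The power-bill argument is simple arithmetic: energy is watts times hours, and over a multi-year service life it adds up. Below is a minimal Python sketch. The node count, wattage, and electricity price in the example are made-up placeholders. The optional `pue` multiplier for facility overhead such as cooling defaults to 1.0, which means that overhead is ignored.

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap days


def power_bill(watts_per_node: float, nodes: int, years: float,
               price_per_kwh: float, pue: float = 1.0) -> float:
    """Electricity cost of running `nodes` machines continuously for `years` years.

    `pue` scales the IT load to account for facility overhead (cooling, power
    delivery); 1.0 means that overhead is ignored.
    """
    kwh = watts_per_node * nodes * HOURS_PER_YEAR * years * pue / 1000.0
    return kwh * price_per_kwh


if __name__ == "__main__":
    # Hypothetical cluster: 1000 nodes running for 3 years at $0.10 per kWh.
    for watts in (300, 150):
        cost = power_bill(watts, nodes=1000, years=3, price_per_kwh=0.10)
        print(f"{watts} W/node: ${cost:,.0f}")
```

The bill scales linearly with per-node power, so halving the watts per node halves the bill for the same node count. That is why performance/watt matters so much at this scale.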

Given that an ARM processor is obviously capable of doing UI stuff, I wonder if it could scale to doing UI stuff at, say, a 1900x1200 resolution? I imagine a situation where a Mac comes with both an Intel chip and an ARM chip, and all UI threads are handled by the latter, while all computationally intensive stuff is handled by the former. Given that the Intel CPU can sleep and wake at a moment's notice, you could have a machine where mousing about and doing web stuff would be ULTRA low power draw. Given that most users probably spend 50% of their time on that, the Intel chip would be asleep a lot. Even on an iMac the green cred would be nice, but on a laptop the impact on battery life might be substantial.

You wouldn't want to use two different instruction sets. It might be possible with fat binaries and the like, but scheduling would be very difficult, the performance hit would be very large, and it's extremely unlikely to be done these days.

Only if Microsoft did an ARM port of Windows could ARM maybe be competitive with Intel. Until Apple or Microsoft adopts the architecture, ARM will always remain on smartphones and embedded devices.

That said, I would love a laptop with ARM as the CPU, Tegra as the GPU, and Ubuntu or Fedora as the OS.

I don't know about this; I think there's definitely a role for Arm in the mainstream server market, where Windows isn't necessarily a shoo-in, if they can provide an increased performance-per-watt ratio over the available x86 offerings. And, in fact, Ars itself has made the point that there's not necessarily anything particularly desirable about an Arm port of Windows anyway:

I do think ARM is shooting itself in the foot by not going directly to 64-bit. The PAE-like extension is nice for memory capacity, but I have a feeling that it'll haunt the architecture down the road. Virtualization is nice, but I hope they implemented it in a forward-thinking way.

To what extent does such a powerful ARM chip offer heat/power consumption advantages over x86 processors? Would I be correct in understanding that ARM is inherently a cooler, more economical architecture?

Is anybody else thinking that in 3 years' time this will be too low-powered for the server, as in computing power? Cellphones maybe, but 3 years is a long time in tech. If it performs like a medium-high-end laptop today, I doubt it will be much more than a medium-end netbook-class processor come three years.

3 years from now, server processors aren't going to be much more powerful than they are now. They'll just be physically smaller, so you'll be able to fit more of them on one die.

Yeah, this is a great idea and all, especially if we can get up to 16 cores (4 groups of 4) per chip, but if it takes 3-5 years to come to market like the Cortex-A8/9, then what's the point? By that time Intel will have the next die shrink after 22nm out, with who knows what performance. I'd love to get my hands on a 16-core portable computing device with virtualization and a decent amount of RAM, along with nice coprocessors for AES, MD5, and zlib and/or LZMA compression. Also, given the GPUs they put with these devices, outputting to multiple HD monitors typically isn't a problem.

The second I can buy a (micro/mini/regular) ATX mainboard with a socket for an A15, and an A15 with an integrated heat spreader to go with it, I'm going to be very tempted. I mean, hey, if it runs Ubuntu, why not?

I have mixed thoughts about a Windows port to ARM. As mohaine wrote, Windows would be of limited usefulness without apps; it's of limited usefulness now, with apps. I could see some of the bigger software houses developing and maintaining products for both x86 and ARM, but I don't see smaller houses getting excited about it. I could be wrong, though.

I guess there's a possibility that, if ARM hits the laptop scene, some software developers could say to hell with it, develop software for ARM only, and be done with Microsoft's rules and taxation.

I don't know, it hurts my head to think about it right now, so I think I shall stop.

Cortex-A for a server? Ehhh. I'm not going to be coding that OS, and I don't think anyone else is giddy enough about the idea to do it either. I'm not on board with the idea of running a server in a JVM. I think it's safe to say that the server market is strictly closed to the A15 until ARM writes software competitive with the established players.

No, indeed HPC customers are always worried about performance/watt. Remember, they have to pay the power bill for 3+ years for these systems, so it's a very real concern. Plenty of people have tried to capitalize on this. For example, there was SiCortex, which tried to play up its many-core MIPS architecture (it's now belly-up), and don't forget that NVIDIA is still playing up power-efficiency numbers to promote its Tesla product line.

So it *could* be a market ARM plays in, especially since almost all of the academic HPC code out there is just a recompile away from running on a new architecture (there's very little binary-only code). But yeah, that's not going to happen with the A15, clearly.

IIRC both SiCortex and the IBM Blue Gene are 32-bit. So if someone wants to try to redo SiCortex but with ARM instead of MIPS, the A15 doesn't seem that bad. For the time being, 32-bit is quite fine for MPI apps on low-power cores. The problem is that with increasing numbers of cores per node, people are starting to look into hybrid OpenMP/MPI schemes, and at some point you'll run out of 32-bit address space.
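Here is a rough Python sketch of that squeeze. It assumes each process is limited to a 3GB user address space, reflecting the common 3G/1G split on 32-bit Linux; treat that as a parameter. It also assumes node memory is divided evenly among MPI ranks. The function name is hypothetical.

```python
GiB = 1 << 30
USER_SPACE = 3 * GiB  # assumption: 3G/1G user/kernel split on 32-bit Linux


def per_worker_budget(node_mem: int, cores: int, threads_per_rank: int,
                      user_space: int = USER_SPACE) -> int:
    """Bytes each worker can use when `threads_per_rank` OpenMP threads share
    one MPI rank's 32-bit address space on a node with `cores` cores."""
    if threads_per_rank <= 0 or cores % threads_per_rank != 0:
        raise ValueError("threads_per_rank must evenly divide cores")
    ranks = cores // threads_per_rank
    share_of_node = node_mem // ranks  # physical memory available to each rank
    # A rank can use no more than its share of node memory, and no more than
    # its virtual address space can map.
    per_rank = min(share_of_node, user_space)
    return per_rank // threads_per_rank


if __name__ == "__main__":
    node_mem, cores = 8 * GiB, 4  # hypothetical low-power node
    for t in (1, 2, 4):
        budget = per_worker_budget(node_mem, cores, t)
        print(f"{t} thread(s)/rank: {budget / GiB:.2f} GiB per thread")
```

Take a hypothetical 4-core node with 8GB. Pure MPI (one thread per rank) gives each core 2GB. A single 4-thread rank per node caps each thread at 0.75GB and leaves 5GB of the node's memory out of that process's reach.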