I suspect Apple has more tricks up its sleeve than that, however. Swift and Cyclone were two tocks in a row by Intel's definition; a third in 3 years would be unusual but not impossible (Intel sort of committed to doing the same with Saltwell/Silvermont/Airmont in 2012-2014).

Looking at Cyclone makes one thing very clear: the rest of the players in the ultra mobile CPU space didn't aim high enough. I wonder what happens next round.

This is one area where Apple really took everyone by surprise recently. When people talk about Apple losing its taste for disruption, they usually disregard the things they do not understand - such as hardcore processor design.

Unless they can make it shiny or paint it gold, most of their users are not likely to care.

There's truth in it. The fact that Apple didn't increase the amount of RAM is actually a problem: people are experiencing app crashes at a significantly increased rate, and this is almost exclusively due to out-of-memory errors.

But, hey, 64-bit is a heckuva selling point if you don't look beyond it.

All this negativity towards the 64-bit processor reminds me of the same sort of thing happening 22 years ago when DEC came out with the Alpha series of CPUs. Sure, there were 64-bit CPUs available, but some of the competitors took to deriding 64 bits while they frantically cobbled together their own 64-bit designs.

Nowadays pretty well every x86 CPU sold into the consumer market is 64-bit.

Those who keep repeating their mantra 'Apple don't innovate any longer' really should take the blinkers off. In my mind, the A7 is a big step forward. It answers many of the questions asked about 64-bit CPUs in a mobile device. It isn't perfect, but nothing is, really.

Well, 64-bit does come at a cost of more memory consumption. Unless it is packaged together with instruction set improvements, as x64 and AArch64 are, it will always be slower than 32-bit. So the only reason iOS has ANY benefit from 64-bit is not due to the 64-bit, but despite it.
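To put a rough number on the memory cost of wider pointers, here is a minimal sketch that computes the size of a C-style struct under natural alignment. The node layout (one "next" pointer plus a 32-bit payload) and the helper name `struct_size` are made up for illustration; real ABIs can differ in their alignment rules.

```python
def struct_size(field_sizes):
    """Size in bytes of a C-style struct whose fields are naturally aligned.

    Assumes every field size is a power of two and that each field is
    aligned to its own size (a simplification of real ABI rules).
    """
    offset = 0
    for size in field_sizes:
        # Pad up to this field's alignment, then place the field.
        offset = (offset + size - 1) // size * size
        offset += size
    # Tail padding so that arrays of the struct stay aligned.
    alignment = max(field_sizes)
    return (offset + alignment - 1) // alignment * alignment


# Hypothetical linked-list node: a "next" pointer followed by a 32-bit payload.
node_32 = struct_size([4, 4])  # 4-byte pointer, 4-byte int
node_64 = struct_size([8, 4])  # 8-byte pointer, 4-byte int

print(node_32, node_64)
```

Under these assumptions the same node doubles in size when pointers go from 4 to 8 bytes, which is the kind of growth that hurts pointer-heavy code when RAM stays the same.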

Note that even though the transition from a 32-bit to a 64-bit address space was quite boring in x86 and ARM, it doesn't *have* to be this way: the Mill project uses the 64-bit address space to provide a unified address space (still with memory protection), which allows a more efficient memory subsystem ( http://millcomputing.com/docs/memory/ ). And yes, not being restricted to a tiny TLB is very important (think about the advantages for multiple processes).

I also think it would be great if CPUs let you use the higher bits of a 64-bit pointer efficiently as a tag field (which has an advantage over using the lower bits, as it is compatible with packed, unaligned data), for tagging pointers (efficient GCs), tagging integers (efficient big-int implementations), and so on.
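A minimal sketch of the high-bit tagging idea, done in software with plain integers standing in for 64-bit machine words. The 48-bit address width, the tag value and the function names are assumptions for illustration only; the point of the comment above is that hardware support would make the masking step unnecessary.

```python
ADDRESS_BITS = 48                   # assumption: only the low 48 bits carry the address
TAG_BITS = 64 - ADDRESS_BITS        # leaves 16 high bits free for a tag
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def tag_pointer(address, tag):
    """Pack a tag into the unused high bits of a 64-bit pointer word."""
    if address < 0 or address >> ADDRESS_BITS:
        raise ValueError("address does not fit in the assumed address width")
    if not 0 <= tag < (1 << TAG_BITS):
        raise ValueError("tag does not fit in the spare high bits")
    return (tag << ADDRESS_BITS) | address


def untag_pointer(word):
    """Split a tagged word back into (tag, address)."""
    return word >> ADDRESS_BITS, word & ADDRESS_MASK


TAG_BIGINT = 0x2  # hypothetical tag meaning "points at a big-int"

# The address is odd on purpose: high-bit tags do not need aligned pointers,
# whereas low-bit tags rely on the bottom bits being zero.
word = tag_pointer(0x7FFF_1234_5677, TAG_BIGINT)
tag, address = untag_pointer(word)
print(hex(word), tag, hex(address))
```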

OK, the MIPS R4000 (64-bit single-chip uP) based SGI Indigo was publicly available several months before I had to sign an NDA to join a 3-day workshop on DEC Alpha internals. That workshop was held in November 1991.

Alpha is perhaps not a very good example, since it got little traction in the marketplace (overall) and its associated development costs are one of the principal issues eventually leading to DEC's demise.

Luckily for Apple they mainly "innovate" with regards to marketing. Which in the end is the dept responsible for bringing home the bacon...

Actually, it was costs that sank DEC. Right before their acquisition, for each $ of revenue DEC required over 300% more overhead than Compaq.

Even after Compaq took over, each generation of Alpha was getting more and more expensive to design and manufacture, while its market share never grew fast enough to keep up with the rising production costs.

I don't think many people are acquainted with the economic realities of semiconductor/processor design. The tech sector is a business at the end of the day.

Very true. A "good enough" cheap commodity architecture will usually win out over an even better architecture lacking economies of scale. That's x86 in a nutshell: not the best processor design, but good enough.

If it weren't for the performance arms race the x86 processors found themselves in on the desktop side, I suspect x86 would have been "good enough" for mobile platforms too. However the notorious power inefficiencies opened up a large window for ARM to take hold of the mobile market, which to me is a good thing. I'd even like to see some desktop computers running on ARM processors, but that's a whole other barrier due to most commercial software being tethered to "wintel".

I wonder if the move to 64-bit was more about the wider design than other design considerations. ARMv7 already had 16 general purpose registers. While ARMv8 has roughly double that amount, 16 is already plenty, and all else being equal, the extra 15 registers would have a minimal impact on performance (unlike the move from i686 to AMD64, which doubled the GPR count from 8 to 16; i686 was horribly starved for registers). The iPhone 5s doesn't add any memory, and 32-bit integer math is rarely a limitation for the type of stuff run on a phone.

However, all else is not equal: the A7 can issue about twice as many instructions per cycle as the A6. The extra registers would be a boon for enabling the extra ILP, and that seems to be where all the A7's performance enhancements come from.

I always wondered why a 32-bit variant of x64 wasn't released with the same register advances, but without 64-bit capability. Perhaps that was benevolent forward thinking in the case of x86, although it's hard to see a use case for that in mobile. Ultimately I don't think that aspect really matters one way or another.

It's kind of sad that 64bit is seen as an Apple innovation though, that's just ARM's latest standard design. The innovation in the CPU has to deal with other details besides that aspect.

The ARM architecture is a description of registers, instructions and the memory model. You can download the architecture manual freely from ARM's site.
The 32-bit ARM architecture was created by Roger Wilson back in the 80s.

What Apple did in the A7 is totally amazing. Even X-Gene (the first 64-bit ARM processor on paper) is not ambitious enough. The A7 is wider than the latest Intel processors and has comparable internal resources, like a 192-entry ROB and massive buffers; memory bandwidth is also good (a problem for most ARM application processors).

Well, the instruction format would have to be changed to accommodate extra registers: the x86 instruction format uses 3 bits to encode either the source or the destination register, which isn't enough to select from additional registers.
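For the curious, the 3-bit limit and the way AMD64 works around it can be sketched in a few lines. The sketch below hand-encodes only the register-to-register form of MOV (opcode 0x89) and is an illustration, not an assembler; the function and table names are made up. The ModRM byte has two 3-bit register fields, and the REX prefix (0100WRXB) supplies a fourth bit for each.

```python
# Register numbers 0-15; numbers 8 and up need a fourth bit that the
# 3-bit ModRM fields cannot hold.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def encode_mov_reg_reg(dest, src, wide=True):
    """Encode 'MOV dest, src' between registers using opcode 0x89 (MOV r/m, r).

    wide=True selects 64-bit operands (REX.W); wide=False the 32-bit form,
    where the names refer to the low halves (e.g. "rax" means eax).
    """
    d, s = REGS.index(dest), REGS.index(src)
    rex = 0x40                      # fixed 0100 pattern of the REX prefix
    if wide:
        rex |= 0x08                 # REX.W: 64-bit operand size
    rex |= (s >> 3) << 2            # REX.R: high bit of the ModRM.reg field
    rex |= d >> 3                   # REX.B: high bit of the ModRM.rm field
    modrm = 0xC0 | ((s & 7) << 3) | (d & 7)  # mod=11 (register-direct)
    prefix = [rex] if rex != 0x40 else []    # plain 32-bit code needs no prefix
    return bytes(prefix + [0x89, modrm])


print(encode_mov_reg_reg("rax", "rcx", wide=False).hex())  # legacy 32-bit form
print(encode_mov_reg_reg("rax", "r8").hex())               # needs REX for r8
```

The first encoding is what legacy 32-bit code already uses; the second only exists because of the extra prefix byte, which is why a hypothetical 32-bit mode with 16 registers would still have needed a new instruction encoding.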

To make a 32-bit chip with the other architectural enhancements of 64-bit, you'd still have to enter a different processor mode (say, x86+) to execute software that takes advantage of the extra registers and the flat x87/SSE register file (x87 uses a stack), and executing older 32-bit code would still require a mode change.

IIRC, adding AMD64 capability to the Pentium 4 (well, technically Intel 64, the purposely slightly incompatible knock-off) only increased the die space by ~5% anyways, and by the time x86 was being dropped into ultra-mobile designs, well, the 4GB address limitation was already looming close in those designs. An x86+ design would probably have had only one generation of use...

Could anyone explain why Apple went with a design with many more pipeline stages? I mean, standard ARMv8 Cortex designs have 8 pipeline stages and the Apple A7 has 14. Would this not mean that when a mispredicted branch clears the pipeline it takes longer to refill, causing performance decreases?

This enters very "opinionated" territory.

There's obviously a trade-off between growing the pipeline to increase parallelism, and increasing the risk and cost of branch misprediction. The article suggests this architecture increases the misprediction penalty by anywhere from 0 to 19% and says nothing about the misprediction frequency (which depends a lot on the software in use). The idea is for these negatives to be offset by having additional parallelism.
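The trade-off can be put into a back-of-the-envelope formula: effective cycles per instruction = base CPI + (fraction of instructions that are branches) x (misprediction rate) x (penalty in cycles). A minimal sketch follows; every number in it is an illustrative assumption, not a measurement of any real core, and using the stage counts quoted above as the penalty is only a rough stand-in.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction once misprediction stalls are included."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Illustrative assumptions: 20% of instructions are branches and 5% of those
# are mispredicted. The base CPI is also made up.
BRANCH_FRACTION = 0.20
MISPREDICT_RATE = 0.05

short_pipe = effective_cpi(0.50, BRANCH_FRACTION, MISPREDICT_RATE, penalty_cycles=8)
long_pipe = effective_cpi(0.50, BRANCH_FRACTION, MISPREDICT_RATE, penalty_cycles=14)

# With the same base CPI the longer pipeline loses. It only comes out ahead
# if the extra stages buy a lower base CPI or a higher clock.
long_pipe_wider = effective_cpi(0.40, BRANCH_FRACTION, MISPREDICT_RATE, penalty_cycles=14)

print(short_pipe, long_pipe, long_pipe_wider)
```

The sketch also shows why the misprediction frequency matters as much as the penalty: halve the misprediction rate and the stall term halves with it, whatever the pipeline length.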

I think the compiler could probably do a better job at scheduling execution units even beyond the CPU's pipeline and with fewer mispredictions since the CPU is forced to do it on the fly. The compiler is far less constrained and should be able to do a more comprehensive analysis. The transistor savings by removing this complexity would result in less electricity or more parallel execution units depending on the way you want to look at it. Either way it's a win! However a pretty big problem with this is the way we distribute software in practice: generically precompiled and expected to run unmodified on different versions of a CPU. It would leave very little room for future CPUs to add execution units and for existing code to take advantage of them since scheduling is specific to a CPU model. Having competing CPUs would be problematic since code would be optimized for one or another, but not both at the same time.

One way to get around this problem is to distribute all software in an intermediate form and produce code which is always compiled exactly for the target machine's execution units, using exactly the right schedule. But for better or worse, CPUs evolved to the long pipelines we have now.