An Update on Apple’s A7: It's Better Than I Thought

When I reviewed the iPhone 5s I didn’t have much time to do the sort of in-depth investigation into Cyclone (Apple’s 64-bit custom ARMv8 core) that I did with Swift (Apple’s custom ARMv7 core from A6) the year before. I had heard rumors that Cyclone was substantially wider than its predecessor, but I didn’t have any proof other than hearsay, so I left it out of the article. Instead I surmised in the 5s review that the A7 was likely an evolved Swift core rather than a brand new design. After all, what sense would it make to design a new CPU core and then do it all over again for the next one? It turns out I was quite wrong.

Armed with a bit of custom code and a bunch of low level tests I think I have a far better idea of what Apple’s A7 and Cyclone cores look like now than I did a month ago. I’m still toying with the idea of doing a much deeper investigation into A7, but I wanted to share some of my findings here.

The first task is to understand the width of the machine. With Swift I got lucky in that Apple had left a bunch of public LLVM documentation uncensored, referring to Swift’s 3-wide design. It turns out that although the design might be capable of decoding, issuing and retiring up to three instructions per clock, in most cases it behaved like a 2-wide machine. Mix FP and integer code and you’re looking at a machine that’s more like 1.5 instructions wide. Obviously Swift did very well in the market and its competitors at the time, including Qualcomm’s Krait 300, were similarly capable.

With Cyclone Apple is in a completely different league. As far as I can tell, peak issue width of Cyclone is 6 instructions. That’s at least 2x the width of Swift and Krait, and at best more than 3x the width depending on instruction mix. Limitations on co-issuing FP and integer math have also been lifted as you can run up to four integer adds and two FP adds in parallel. You can also perform up to two loads or stores per clock.
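To make those per-clock limits concrete, here is a rough throughput sketch. It treats the figures quoted above (6 total, 4 integer adds, 2 FP adds, 2 loads or stores) as independent caps and assumes every instruction is independent of the others. That is a simplification of my own for illustration, not a description of Cyclone's actual port mapping, and the function names are hypothetical.

```python
import math

# Per-clock limits quoted in the text above. Treating them as independent
# caps is an assumption; the real execution port layout is not known here.
PEAK_ISSUE = 6    # total instructions per clock
MAX_INT_ADD = 4   # integer adds per clock
MAX_FP_ADD = 2    # FP adds per clock
MAX_MEM = 2       # loads or stores per clock


def min_cycles(int_adds, fp_adds, mem_ops):
    """Lower bound on cycles for a block of fully independent instructions."""
    total = int_adds + fp_adds + mem_ops
    return max(
        math.ceil(total / PEAK_ISSUE),
        math.ceil(int_adds / MAX_INT_ADD),
        math.ceil(fp_adds / MAX_FP_ADD),
        math.ceil(mem_ops / MAX_MEM),
    )


def peak_ipc(int_adds, fp_adds, mem_ops):
    """Best-case instructions per clock for the given mix under this model."""
    cycles = min_cycles(int_adds, fp_adds, mem_ops)
    if cycles == 0:
        return 0.0
    return (int_adds + fp_adds + mem_ops) / cycles


# A 2:1 integer/FP mix can reach the full width in this model,
# while a pure integer-add stream is capped by the integer limit.
print(peak_ipc(400, 200, 0))  # 6.0
print(peak_ipc(600, 0, 0))    # 4.0
```

Real code has dependencies, branches and cache misses, so measured throughput will sit below these bounds.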

I don’t yet have a good understanding of the number of execution ports and how they’re mapped, but Cyclone appears to be the widest ARM architecture we’ve ever seen at this point. I’m talking wider than Qualcomm’s Krait 400 and even ARM’s Cortex A15.

I did have some low level analysis in the 5s review, where I pointed out the significantly reduced memory latency and increased bandwidth to the A7. It turns out that I was missing a big part of the story back then as well…

A Large System Wide Cache

In our iPhone 5s review I pointed out that the A7 now featured more computational GPU power than the 4th generation iPad. For a device running at less than 1/4 the resolution of the iPad (1136 x 640 vs. 2048 x 1536), the A7’s GPU either meant that Apple had an application that needed tons of GPU performance or it planned on using the A7 in other, higher resolution devices. I speculated it would be the latter, and it turns out that’s indeed the case. For the first time since the iPad 2, Apple once again shares common silicon between the iPhone 5s, iPad Air and iPad mini with Retina Display.

As Brian found out in his investigation after the iPad event last week, all three devices use the exact same silicon with the exact same internal model number: S5L8960X. There are no extra cores, no change in GPU configuration and, the biggest one, no increase in memory bandwidth.

Previously both the A5X and A6X featured a 128-bit wide memory interface, with half of it seemingly reserved for GPU use exclusively. The non-X parts by comparison only had a 64-bit wide memory interface. The assumption was that a move to such a high resolution display demanded a substantial increase in memory bandwidth. With the A7, Apple takes a step back in memory interface width - so is it enough to hamper the performance of the iPad Air with its 2048 x 1536 display?

The numbers alone tell us the answer is no. In all available graphics benchmarks the iPad Air delivers better performance at its native resolution than the outgoing 4th generation iPad (as you'll soon see). Now many of these benchmarks are bound more by GPU compute than by memory bandwidth, a side effect of the relative lack of memory bandwidth on modern day mobile platforms. Across the board though I couldn’t find a situation where anything was smoother on the iPad 4 than the iPad Air.
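To put the 64-bit vs. 128-bit interface widths in perspective, theoretical peak bandwidth is simply bytes per transfer times transfers per second. The sketch below assumes a 1600 MT/s data rate purely for illustration. That figure is my assumption and is not stated in this article, and the function name is hypothetical.

```python
def peak_bandwidth_gbs(bus_width_bits, transfers_per_sec):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_sec / 1e9


# Assumed data rate, for illustration only (not taken from the article).
ASSUMED_RATE = 1600e6  # transfers per second

narrow = peak_bandwidth_gbs(64, ASSUMED_RATE)
wide = peak_bandwidth_gbs(128, ASSUMED_RATE)

print(f"64-bit:  {narrow:.1f} GB/s")   # 64-bit:  12.8 GB/s
print(f"128-bit: {wide:.1f} GB/s")     # 128-bit: 25.6 GB/s
```

At the same data rate, halving the interface width halves peak bandwidth, which is why the narrower interface looked like a risk for a high resolution display.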

There’s another part of this story. Something I missed in my original A7 analysis. When Chipworks posted a shot of the A7 die many of you correctly identified what appeared to be a 4MB SRAM on the die itself. It's highlighted on the right in the floorplan diagram below:

While I originally assumed that this SRAM might be reserved for use by the ISP, it turns out that it can do a lot more than that. If we look at memory latency (from the perspective of a single CPU core) vs. transfer size on A7 we notice a very interesting phenomenon between 1MB and 4MB:

That SRAM is indeed some sort of a cache before you get to main memory. It’s not the fastest thing in the world, but it’s appreciably quicker than going all the way out to main memory. Available bandwidth is also pretty good:

We’re only looking at bandwidth seen by a single CPU core, but even then we’re talking about 10GB/s. Lookups in this third level cache don’t happen in parallel with main memory requests, so the impact on worst case memory latency is additive unfortunately (a tradeoff of speed vs. power).
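The additive cost of a serial lookup can be sketched with a simple model. On a miss, a serial design pays the cache lookup and then the DRAM access, while a hypothetical parallel design would pay only the longer of the two. The latency values below are placeholders I made up for illustration. They are not measurements of A7.

```python
def miss_latency(cache_ns, dram_ns, serial=True):
    """Latency of a request that misses the last-level cache."""
    if serial:
        # Cache is checked first, DRAM request starts only after the miss.
        return cache_ns + dram_ns
    # Hypothetical parallel lookup: DRAM request is issued alongside.
    return max(cache_ns, dram_ns)


def average_latency(hit_rate, cache_ns, dram_ns, serial=True):
    """Expected latency for a given hit rate in the last-level cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    miss = miss_latency(cache_ns, dram_ns, serial)
    return hit_rate * cache_ns + (1.0 - hit_rate) * miss


# Placeholder numbers, not measured values.
CACHE_NS = 40.0
DRAM_NS = 100.0

print(miss_latency(CACHE_NS, DRAM_NS, serial=True))           # 140.0
print(miss_latency(CACHE_NS, DRAM_NS, serial=False))          # 100.0
print(average_latency(0.5, CACHE_NS, DRAM_NS, serial=True))   # 90.0
```

The serial design loses on every miss but never spends DRAM power on a request the cache could have served, which is the speed vs. power tradeoff described above.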

I don’t yet have the tools needed to measure the impact of this on-die memory on GPU accesses, but in the worst case scenario it’ll help free up more of the memory interface for use by the GPU. It’s more likely that some graphics requests are cached here as well, with intelligent allocation of bandwidth depending on what type of application you’re running.

That’s the other aspect of what makes A7 so very interesting. This is the first Apple SoC that’s able to deliver good amounts of memory bandwidth to all consumers. A single CPU core can use up 8GB/s of bandwidth. I’m still vetting other SoCs, but so far I haven’t come across anyone in the ARM camp that can compete with what Apple has built here. Only Intel is competitive.

I'm sure if Apple implements NFC plenty of people will use it. So many people have Apple devices and you won't need to guess or remember if you can send things to each other. NFC is a bit of a mess right now. For contacts and URLs it's standard and should work between any devices. But the big use case --files-- is a mess. On Android you basically can only send files between devices from the same manufacturer. WP has file transfer standardized but it doesn't work with Android.

Plus Apple users love to show people they have the latest Apple device and will love to beam things back and forth.

You accuse this guy of bias (which he obviously has, alongside being totally ignorant of one of the best tech reviewers out there; MKB does incredible videos). But you have no idea what you're saying yourself in some areas, and you very clearly have a bias towards the iPad.

Touch ID is more of a gimmick than Haptic Feedback. I've used haptic feedback on the Nexus 10; it makes a BIG DIFFERENCE in the feel of the onscreen keyboard. NFC is very useful if you know others with Android devices; I've used it on various occasions. Wireless charging is extremely convenient, especially if it's on your desk and you use your phone on-and-off.

As for software: yes, the iPad hardware is considerably improved. But on the base level, the OS hasn't really been updated in years save for iOS7. That update added a lot of forward-facing changes, but not really too much functionality that hasn't been around already. iOS multitasking just got bumped up to be closer to Android, but still isn't nearly as flexible. Sharing is still difficult. You can't even Bluetooth a group of PDFs to a friend (which I do weekly on my Android device)!

Gesture Type. I CANNOT give this up. I can't stand typing on an iPad because the keyboard experience is so sub-par compared to my Nexus 10, and it will only improve with KitKat where you can swipe through the spacebar to combine multiple words in a single gesture.

You have apps, but so do Android tablets. There are many fantastic applications in every field that do their jobs admirably, and Google's set of tools is fantastic for writing, accessing information, sharing, and editing many formats of information. There aren't as many, but there are a lot of GOOD ones and even phone apps scale very well on a 10" display.

And the Nexus 10 2013 will very likely bring the most powerful non-Apple SoC to the table: Snapdragon 800. It matches or beats the Apple A7 in many areas, although it is defeated in others. Simply put, it is a VERY competitive chip with the A7 and, really, they are basically equal.

So please, stop bashing features that really DO matter to a lot of people, and I won't bash the iPad's lack of functionality (at a base level) when compared to Android devices out there. The iPad isn't the "perfect" device, neither is my Nexus 10. But we can't act like either is.

IMO, wireless charging is pointless. Is it really that hard to plug it in? How do you wireless charge while you're using it? It's added cost and size for trivial levels of added convenience.

USB3 sync would be fine, if it actually sped things up. I'm sure they'll get around to it. Who doesn't wireless sync these days though?

NFC would be another "me too" for a bandwagon that is already slowing down. Really, think about the few places where NFC is catching on, and tell me you'd really use a 10" tablet in those situations. It would be almost as bad as the iPad-as-a-tourist-camera people.

Here are some better ideas:

Relaxing the iron fist - Let us install our own apps from outside the app store. If Apple wants to sign them first and charge a nominal fee so that they can "prevent piracy," so be it. But I want to install my own stuff. I want iOS to be able to participate in things like the Humble Bundle. I want more iOS OSS.

Location Spoofing. Let us set location services to lie to apps temporarily. This is useful for a variety of reasons ranging from development to privacy.

Home screen icon sizes. No further explanation needed.

Put the good camera on the front. Nobody should be using the rear facing camera in most situations, but you want good low-light performance in FaceTime and Skype. If they could figure out how to center the camera in the middle of the screen through some optics magic, that would be incredible.

Front facing speakers....

Most of this boils down to just making the thing something I don't feel like I need to jailbreak. It's hard to improve a device that is almost perfect.

"Location Spoofing. Let us set location services to lie to apps temporarily. This is useful for a variety of reasons ranging from development to privacy."

Developers can already do this, so what you are really asking is for privacy reasons. And in that case, Apple allows users to turn off location services on a per app basis already. I can tell you already that Apple is not going to allow users to spoof location data to any apps -- either you give an app accurate location data or no location data. Anything else puts their relationships with developers at risk -- for example, MLB would almost certainly pull their app if users could say they were in a different location since they wouldn't be able to enforce the blackout rules (I personally hate the blackout rules, but since they are legal agreements, MLB has to abide by them).