@maysider Apple did create this processor. They have been involved in ARM since the 1980s; in 2008 Apple purchased P.A. Semi, one of the largest purchases Apple has ever made, and they now have thousands of engineers working full time to design their custom implementation of the ARM specification.

They actually did create ARM through their partnership with Acorn. It was their work to develop and deliver the Newton PDA that caused them to form a joint venture with Acorn and develop the ARM Architecture based on work that started at Acorn. It's the reason why ARM processors always worked so well for the handheld and embedded markets.

Specifically here: The British company Acorn completely designed and produced many generations of ARM chips for their Archimedes RISC Workstation. At the time, ARM stood for "Acorn RISC Machines." When Apple made the Newton, they worked with Acorn to spin off their chip division and set it up as its own company, ARM, changing the acronym to "Advanced RISC Machines." So Acorn made the ARM chip, and Apple made the ARM company. At this point, though, Apple COMPLETELY designs the internals of the chip, keeping just the ISA intact.

They've got to be aiming for more than just a good competitor to Qualcomm. Either they literally want to eventually push ARM into the low end of their laptop range, or (much more likely) they see the iPad being able to someday replace desktops for a lot of use cases and are building the chip for it. I'm a little surprised that Samsung so far seems content buying Qualcomm chips in the US and using stock ARM cores in their own chips; in the long run, having one's own chip IP seems valuable.

Apple is an investor in Imagination and so is Intel. Apple uses Rogue, and Intel does the same for BT and subsequent SoCs. So if Apple were to go with Nvidia and use the Maxwell mobile core with Cyclone V2, they could well be untouchable by Qualcomm or anyone else. This move is what I would call "friends and enemies culling": Apple would kick out Intel, QC and the other ARM vendors and amass leadership in the SoC business exclusively to themselves, then segment their market into high-end, medium and low SoCs to target price points for both tablets and phones. It is a mean move. But will NV agree to licensing K1/M1 exclusively for 5 years or so? Hey, NV has been expanding the IVI market and growing it. This can mutually benefit Apple as well, in the CarPlay interface etc, so there is synergy in business. The mutual interest matches!

To be honest I'll believe in a working AND useful Tegra chip when I see one.

So far Tegra (every single goddamn one) has been a flop. Tegras 1 and 2 don't bear mentioning; 3 was slow albeit cheap and thus somewhat popular (mostly due to the N7 2012). Tegra 4 barely got ANY devices due to bad design decisions (read: too power hungry and not particularly fast compared to the competition) and is thus a flop.

K1 looks nice on paper but then so did Tegra4 and they had to make their own tablet and their own handheld to get any sales out of that one. I sincerely doubt Nvidia's ability to produce anything useful with K1 as well.

That's just not true, and I would have expected better from the comment section of AnandTech. The only Tegra flop was Tegra 3; that was the year they messed up on the chip. Tegra 2 was powerful for the time and set the standard, Tegra 4 is still a beast to this day, and Tegra K1 appears to be continuing that while maximizing graphics potential and optimizing efficiency.

You got the Tegra history pretty close. Tegra 2 was king for a while in mobile, then Tegra 3 was eroded by Snapdragon, though it was good in tablets at one stage (remember the Asus Transformer series tablets). Tegra 4, being late, was a "no show" in the market, and when it did show up, the Snapdragon 600/800 ruled the mobile market and still does. K1, being a Kepler design, was full-featured as opposed to the ULP Tegra 4, so a radical departure in terms of feature support. Maxwell evolves that further and makes the cores much more granular. It supports all the features of the desktop parts, but is not as fast in most ops. So the roll-out to product form is vital for it to get into the mainstream. Since QC has a full lock on the major OEMs, it is a tough battle for NV. Their venture into the vehicle market is stronger than others', so that might be the new market territory NV wants. It becomes easier to be an established leader and maintain it.

Sorry mate but it is true :) And as for "expected better". Well I could say the same to you.

Tegra 2 was good clock for clock. Shame it never clocked anywhere near the competition. So it was sold on mid-low devices. You know the ones known for "android lag". But yeah given that that's where android sales ACTUALLY are it sold a lot.

I'm glad we agree on tegra 3. Worst soc of the generation. Yet sold like hot cakes for the same reason as above. There are a lot of cheap droid devices on shit socs. Nexus 7 2012 is probably the best example. Although all those asus fake-laptops are a good example as well. They ran like shit but showed pretty effects if you played those 3.5 games that supported tegra effects :)

Tegra 4 is a beast that hasn't sold, benchmarked below all of its direct competition (it's nice clock for clock again, but who cares about that?) and gives the worst battery life of the lot. Oh, and it's hot. I wonder why they were forced to make their own-ish tablet? To shift stock, most likely.

Going by the track record K1 is going to be a flop again. And going by the sales trend it will be present on like 2 devices. Both of which will be NV's own homebrew. But we shall see. Hopefully not - Qualcomm is getting boring.

Competitive on GFXBench: Yep, it hardly has any pixels to push. The Shield uses a ridiculously low resolution screen so it scores better. It wouldn't be so great on a 1080p screen.

Competitive on JavaScript: JavaScript is just a measure of the JavaScript engine. It doesn't mean anything when comparing CPUs unless they are both running the same JavaScript engine. Still, the CPU cores in T4 are competitive performance-wise.

Competitive on Geekbench: Yeah, the cores are competitive... IF you strap a big fat heatsink on them. Performance per watt? It is not good. In a phone the battery life would be extremely poor.

Resting the Surface RT's failure on the Tegra 3 is a bit of a stretch. Having slow storage, low screen resolution, abysmal ecosystem / product catalog, etc. might have had something to do with it. Geeks in forums like this will obsess over CPU/GPU, etc. but "the masses" do not.

The Tegra lines weren't terrible, but they didn't live up to their hype either. With nVidia's GPU pedigree, people expected the Tegra line of chips to be best in class, at least in that regard. Considering they are using ARM reference designs for the CPU, this was their place to distinguish their products. The first 3 Tegra generations were "flops" at least in that regard. Tegra 4 is respectable. K1 looks like they finally understand this equation. Still, when you pre-announce products so far in advance, you get set up for a letdown. Future tech from any company always looks good. How will the K1 Denver SoC compare to, say, Apple's A8? That's the chip it will be competing against, not the A7.

Tegra 2 was a dog and there is little other way to look at it. Think Xoom. The Tegra 2 lacked NEON, with Nvidia hoping developers would hand-code to their GPU. It hurt overall performance for graphics and computationally intensive apps.

If you look closer, the Tegra SoCs were and are some of the fastest and most efficient available for such a small die size. NV were the first to go quad core with a companion core. And the Tegra K1 is the first SoC to fully support OpenGL. So why doesn't it get used that much if it's so good? Because it lacks a modem! Requiring an external modem increases PCB size, part costs and, most importantly, power consumption. That's the one and only reason everyone goes with Qualcomm. Or why do you think some Galaxy smartphones get sold in two versions with either a Samsung or a Qualcomm SoC? And had Win RT been a success, NV would have benefited from it, too. NV is trying to address the modem issue (see Tegra 4i) but it seems to be not so simple, and it takes a lot of time, and probably money, too.

Maybe that's the reason NV is (at least it seems so to me) abandoning the smartphone market (just as TI did) and focusing on game consoles (Shield, Ouya), embedded market (Jetson TK1, Powering the Tesla infotainment system, and future Audi, GM, Honda, ... cars) and high performance computing. In all these areas NVidia is much more successful than Qualcomm.

Tegra 4 was not a flop on technical grounds. It is a very capable performer in tablets, with decent power consumption when reasonably clocked. The Tegra 4 can certainly compete with Apple's A7 in performance, and that is with a smaller die area.

It hasn't had too many design wins, mainly because it was a couple of months delayed, because the i500 modem was a year delayed, and because of bad experiences with the disappointing Tegra 3 and the NVIDIA hypefest in general.

Seems more like wishful thinking or paranoia. Apple is a consumer electronics company that sells premium computers in several form factors. They are a luxury brand. They have no real interest in segmentation at the low margin price competitive end. Why would you assume that they have completely changed the culture of the company so much? Would BMW or Land Rover hurt their brand by building cheap cars that make low margins?

Is there good reason to believe that Maxwell offers a better performance/power tradeoff than Imagination? Maxwell may well be a kickass architecture for 300W, but scaling it down to 3W doesn't mean it's now competitive with something designed to peak at 3W and to usually operate a whole lot lower.

Apple has always been a lot more interested in decent efficiency than in simply being able to boast that it can handle more pixels (or triangles, or MACs or whatever) than anyone else.

Are you reading the AnandTech article? The Maxwell article clearly states that it is designed "mobile first", so the TDP target when designing wasn't 300W but at least an order of magnitude below that.

If Apple is so concerned about efficiency, why are they making large, complex, dual-core, out-of-order, "desktop-class" CPUs for their smartphones?

Because they prioritise single-threaded performance higher, which happens to have the largest impact on user interface responsiveness. The most power-efficient design is the octa-core Cortex-A7. It has exceptional computational power per watt.

How many mobile programs (or even desktop ones, for that matter) do a good job of multithreading? Exceptional computational power per watt because of 8 cores doesn't mean a lot if everything you're running is single-threaded.
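That point is just Amdahl's law. A quick sketch (my illustration, not from the thread) of how little eight cores buy a mostly-serial workload:

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Amdahl's law: overall speedup on n cores when only a fraction
    p of the work can run in parallel."""
    return 1.0 / ((1.0 - p) + p / n)

# A UI-bound app that is only 30% parallelizable barely benefits
# from eight cores:
print(round(amdahl_speedup(0.3, 8), 2))   # ~1.36x

# Even a 95%-parallel workload falls well short of the ideal 8x:
print(round(amdahl_speedup(0.95, 8), 2))  # ~5.93x
```

Which is why one fast core often beats eight slow ones for interactive use.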

I'm pretty sure Bay Trail doesn't use a PowerVR GPU at all; they're using their in-house Intel HD Graphics system (the one that they put into their Core CPUs). It was their /old/ Atoms which used PowerVR.

Anything is possible, but it seems unlikely that Apple would switch to ARM for their Mac lineup. Having a device that can boot into Windows natively has been a marketing advantage and helpful for "switchers" that still rely on a given piece of Windows software.

More likely, I'd expect the iOS product line to become more "professional" (whatever that means) to the point where the capabilities between desktop and mobile product lines are blurred. I see iOS as being more of a forward looking approach to doing things, etc.

I think the iPad gradually sneaking into desktop use cases is more likely to be Apple's hope. Then Apple gets the ecosystem control and 30% share they have on iOS, doesn't have to get legacy software ported or meet desktop expectations, etc. But I, too, think anything is possible. :)

I hope this is a kick in the pants to Qualcomm, actually. The fact is there is no real competition for the 8974 at the top of Android products; A15s just don't seem to fare as well in practice. I hope that the successor to the 8974 is as aggressive as Cyclone, and that Android OEMs are willing to pay for a larger, more powerful SoC.

It's frustrating to me as someone who genuinely prefers Android to iOS--Cyclone actually lured me to the 5s for a few months, and I didn't even mind the small size so much; still, I found that I'm more comfortable in Android. Apple's vertical integration really does lead to good stuff on so many levels, but software flexibility is not one of them.

Except as much as we may think, Qualcomm does not compete with Apple for CPUs. There's no pitch meeting to Apple to use Snapdragon, certainly not after they've invested hundreds of millions of dollars to create their own custom CPU architectures. And the sad thing is that the companies Qualcomm does or did compete with (NVIDIA and TI, respectively), they've effectively shut out due to shipping a well integrated SoC with a very modern cellular baseband attached.

I would still say they aren't. Qualcomm needs to be profitable per chip, Apple only per phone. They can reliably depend on charging $100 for $8 worth of flash storage to pay for their R&D and larger silicon without worry. And keep in mind there are only so many high-end phones; otherwise a chip like the Tegra K1 becomes too expensive. Nvidia fell behind because they also don't offer the wide portfolio of updated chips and multiple performance/cost points that Qualcomm does.

Methinks that in terms of performance per watt or per dollar, Qualcomm only needs to compete with other shops that sell independently (mostly, to Android OEMs).

If Apple somehow tripled its benchmark performance while cutting power in half, they wouldn't sell THAT many more phones (unless they matched that feat with similarly unlikely price or feature-set changes).

Just geeking out, a bunch of truly irresponsible speculations about what they could do, besides freq scaling, with the extra room provided by 20nm:

- True L3. Earlier, Peter Greenhalgh referred to what Apple has now as a write buffer, not a true L3 cache (which would include clean data evicted from L2 and prefetched data, I guess). They could change that.
- RAM interface. 128-bit could come back. I guess LPDDR4 is coming someday, too.
- More special-purpose silicon. Add something for compression (which AMD advertises as part of their Seattle ARM SoC plan). They could use that for compressed RAM a la Mavericks, to get more out of the physical RAM they have.
- GPU, etc. Imagination's announced Wizard. Way out there, if Apple saw something cool they could do by getting into the GPU IP business themselves, they could: expensive, but probably less so than their ARM design odyssey.
- Little cores. Could cut the base CPU power draw with screen on. Could do it to increase battery life, or to help cancel out any power-draw increase from higher peak clocks or whatever. Sounds like a sane idea, but relatively few SoCs are doing it now, so hrm.

There would be many good reasons not to attack things like that next round, either to separate process and arch changes like Intel, or if their current big push is on something else entirely, little chips for watches or SoCs for TVs or whatever. But it seems plausible there are a lot of ways to improve the SoC other than a wider core.

If Apple doesn't already use Mavericks-style RAM compression in iOS, that is some *seriously* low-hanging fruit they need to grab, since they're so heavily opposed to including more RAM in their devices. The Linux kernel has had experimental RAM compression for ages now (zram, since at least 2009) and I wish Android were using it. It seems to be an option in Android 4.4, but I can't find any mention of it actually being used on a real-world device.

From a simply pragmatic viewpoint, Apple could just make the A8 a slightly-tweaked A7 that also happens to be quad-core, with 2GB+ of RAM. That would be more than enough to excite people, and with a proper hot-plugging algorithm, they could get all of the benefits of a dual-core chip and all of the benefits of a quad-core chip, depending on the software load it encounters.
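For what it's worth, here's a quick illustration (my own, not from the comment) of why zram-style page compression is attractive: typical heap pages are repetitive enough that even a fast general-purpose codec shrinks them dramatically.

```python
import zlib

PAGE = 4096  # 4 KiB, a typical memory page size

# Fabricate a page of fairly repetitive data, as application heaps
# often are (zeroed regions, repeated object headers, text).
page = (b"object-header\x00\x00\x00\x00" * 241)[:PAGE]

# Fast compression level, the kind of speed/ratio tradeoff zram targets.
compressed = zlib.compress(page, 1)
print(f"{PAGE} -> {len(compressed)} bytes")
# When the ratio is well under 1.0, several compressed pages fit in
# the physical space one uncompressed page used to occupy.
```

Real-world ratios vary by workload, of course; zram typically reports somewhere around 2-3x on mixed data rather than the extreme ratio this synthetic page gives.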

Especially because Apple insists on crippling their devices with insufficient RAM, which is something I've never understood. They're premium devices with premium prices, why cheap out on RAM? I get the feeling that many of the performance issues I suffer from on my iDevices are from lack of memory rather than lack of CPU performance.

Anyhow, because they're in that situation, they'd probably benefit from memory compression more than most. They're also in the unique situation of controlling the silicon, so they could implement a hardware memory compression engine if they wanted to...

Apple is very aggressive about power management. DRAM burns power, so Apple installs as little as they think they can get away with. We might quibble with where they've ended up drawing that line, but the principle of it is sound.

Regarding memory compression à la Mavericks, I'm not sure that would be very effective under iOS. iOS will suspend inactive apps if freeing up some RAM becomes necessary. When an app is suspended under iOS its RAM contents are written to 'disk.' Since that's actually quite fast Flash storage, from which there's not much practical penalty in re-fetching those contents, any gain from memory compression instead would seem to be negligible.

The issue is not so much performance as power. Which burns less power - compressing a page, or reloading it? (Writes to flash are very expensive in power, but iOS doesn't swap out to flash, it only swaps in, so that's less of an issue.) I could well believe that on current HW, Apple has done the measurements and concluded that compressed RAM is less power efficient overall, but that with custom HW added to the SoC this changes.
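To make that comparison concrete, here's a back-of-the-envelope sketch (mine, not from the thread). Every energy constant in it is a hypothetical placeholder for illustration; only the shape of the comparison is the point.

```python
# Compress-vs-reload tradeoff, per page. The per-byte energy costs
# below are ASSUMED placeholder values, not measurements.

PAGE = 4096  # bytes in one memory page

NJ_PER_BYTE_CPU_COMPRESS = 0.5   # assumed: software compress/decompress cost
NJ_PER_BYTE_FLASH_READ = 0.25    # assumed: cost to re-read a byte from flash

def compress_cost_nj() -> float:
    # Compressed-RAM path: pay CPU compression on suspend and
    # decompression on resume.
    return 2 * PAGE * NJ_PER_BYTE_CPU_COMPRESS

def reload_cost_nj() -> float:
    # Discard-and-reload path: the page is dropped and re-read from
    # flash on resume (swap-in only, as the comment describes).
    return PAGE * NJ_PER_BYTE_FLASH_READ

print(compress_cost_nj(), reload_cost_nj())
```

With these assumed constants the reload path wins, and a dedicated hardware compression engine (much cheaper per byte than the CPU) is exactly the kind of change that could flip the result.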

Obviously, more RAM is better, but how exactly do you come to the conclusion that Apple is crippling their devices with insufficient RAM? What job is running slowly because of this? What type of program has not been brought to market due to this type of limitation?

Comparing RAM requirements from different platforms like Android is missing the point. Apple doesn't have the overhead of a Dalvik virtual machine or the need for just-in-time compilers, and it doesn't need anti-malware services running. The point being, just because 1GB doesn't cut it on Android doesn't mean it's a problem for iOS.

As others have mentioned, more memory also consumes more power. It's very clear from Apple's designs that they are trying to optimize power consumption for their mobile devices. That's why you can get that much power in such a small and lightweight device. The alternative is a much larger battery that makes for a heavier device and likely requires a larger screen as well.

iOS doesn't use a swapfile, so there are no slowdowns that are RAM-related. iOS will kill apps before getting into an OOM situation. (The OOM "errors" you see are iOS logging apps that it has killed to free up space.)

Guspaz, your post demonstrates a fundamental lack of understanding of how memory management works in iOS. No, iOS doesn't suffer slowdowns from memory-hungry apps. iOS doesn't page memory out to disk, so there is no swapping. If it doesn't have enough memory, it kills other programs that may have been in memory to make room for the app you are trying to use. All iOS apps are designed to handle this gracefully. The point being, the OS was designed to be very responsive in low-memory conditions. Further, when you switch back to an app that was killed, the state was saved, so the end user never even sees the difference.

Um, have YOU used iOS, Icehawk? Over the years I've used iPhone 3G, 4 and 5, as well as iPad 1 and 3. Out of these, the only one that experienced regular out-of-memory issues was the 256MB iPad 1. iPhone 4's 512MB of RAM may feel cramped today, but it was plenty for 2010 purposes, and my current iPhone 5 and iPad 3 do not feel RAM starved at all on iOS 7.

Actually, they HAVE doubled the amount of RAM since those old models shipped. The iPhone 4 and 4S had only ½GB, while the 5, 5S and 5C have 1GB. The original iPad shipped with ¼GB of RAM, while the current model has 1GB.

It's not a HUGE problem NOW, but Apple is basically doing it to ensure you buy another phone 2-3 years down the line, because 1GB of RAM won't hold up for long. Their software stack dictates the performance of the hardware stack, in essence. If they code iOS 8 to use more RAM, which will happen (x64, duh), then 1GB RAM devices will suffer sooner than 2GB RAM devices.

I don't think Apple does that to ensure you buy another iPhone 2-3 years down the line; it seems too short-sighted. After all, if your experience with your current iPhone is bad, why would you get another one?

I think a more likely explanation is to maximize profits while also providing a quality user experience. From Apple's perspective, if there's no significant upside today to providing 2GB of RAM why would they?

Regarding the GPU... take a look at the number of job listings Apple currently has for graphics hardware design. It's quite clear that they're intending to develop their own GPU IP, the question is what the intended product for it is - are they intending to replace Imagination in their smartphone/tablet SoC or is it for something else? Note that the only reason to get into it is to be able to better define their own power and performance targets - it's unlikely that they'll actually come up with a better architecture than any of the other players for quite some time.

The way I expect the GPU will play out is basically the same as the ARM story. Apple didn't start from scratch; they were happy to license a bunch of IP from ARM. I expect they'd similarly license a bunch of IP from Imagination and modify that, though with less extensive modifications than with the CPU. Like you say, better matching their targets.

This could take the following forms:
- better integration with the CPU (a custom connection to a fast L3 shared with the CPU; a common address space shared with the CPU, with all that implies for the MMU)
- higher performance than Imagination ships (eg Imagination maxes out at 3 cores, Apple ships with 4)
- much more aggressive support for double precision, if Apple has grand plans for OpenCL that require DP to be as well supported as SP.

I'd wondered about that approach as well, but I'm not clear on at which levels Imagination licenses their graphics IP. Because I'd definitely agree that it makes sense to start out with that route and do comparatively minor customization of Imagination's design to better suit their targets at first. I'm just not certain if Imagination would actually let them or not - as said, I have no clue what their licensing model is like.

Regardless, Apple getting into the GPU design side as well is definitely an interesting development. Especially if they actually go fully custom sooner rather than later, since then Imagination will pretty much be out of high-end customers, no?

Khato, I agree with you in terms of the reason to get into the GPU business. However, I wouldn't bet against them in terms of coming up with a better architecture than the competition. Many said that when Apple entered the SoC / CPU business. Should be interesting to see how this plays out nonetheless.

Well, I'd disagree that Apple has designed a 'better' CPU than their competition. They designed it for a different target, but it's nothing special in the overall picture. The only point of surprise with respect to their CPU design is the pace of iterations thus far, though it's still unknown if such is just the result of a 'backlog' or if they'll actually maintain a one year cadence for major architecture changes.

I think this article makes it clear that Apple set the bar higher than others with the current round. Sure, other vendors can put together 8-core chips of lesser complexity that will be faster under some MP workloads. I'm not sure that makes for a "better" CPU either. To your point, it will be interesting to see if Apple can continue at the pace they have in the past. Logically, if history has demonstrated they can, then we should have no reason to doubt they will in the future. Given the percentage of revenue that comes from iOS-based devices, coupled with the vast resources Apple has, one would expect that they will continue this pace. Also, given the differences between the A6 and the A7, it would seem that Apple has multiple hardware teams working concurrently on chip designs. That of course is just speculation on my part though.

Ahhhh, but history thus far hasn't really demonstrated such. For all we know, Cyclone could have started development back when Apple acquired P.A. Semi 6 years ago and been the primary focus of the core design team, with Swift being a simpler version that was supposed to be out the door much earlier than it was. I don't really expect that such is the case, but simply having two major core iterations in a row doesn't tell anything about the future pattern.

Regardless it'll be quite interesting to see where all the other players fall in terms of performance. We can make a reasonable estimate for Intel and vanilla ARM cores, but NVIDIA and Qualcomm? No idea if they'll follow Apple's path of high IPC with low frequencies to keep power in check or continue with the status quo. Both are perfectly viable and can end up with the same efficiency.

The iPad replacing the low end laptop/desktops is more of a software issue than hardware at this point. On that note, for wide spread adoption into the corporate world Apple needs to reinvest into the server software market so that companies can better manage large iPad deployments remotely.

I've always assumed that the goal is that one day, your phone or tablet wirelessly tethers with your monitor, keyboard and mouse and runs the regular, desktop version of OSX. You don't need a separate computer unless you're a gamer or professional who needs CAD or 3D or something intense.

The A7 is the first chip that seems to fit in with that realization.

I've been under the same assumption, but I just thought of one possible side effect. If your phone is your only real computing device and you carry your phone with you everywhere, does that mean cloud storage becomes unnecessary?

The Qualcomm in US Samsung phones is about the millions it costs to get approved by the big carriers who won't let a phone attach unless it has gone through strict validation. It can cost a lot of money especially considering the $25-$50 patent license penalty added on the 3G bands.

I think we will start to see Samsung chips in 2015 when they are LTE only and the carriers have VoLTE up and running which will start later this year. If they only need to validate the LTE they won't need to spend the money on proving each new chip has a quality legacy stack.

Qualcomm has already done the legwork and testing needed so as long as the process is long and expensive it is almost certainly cheaper for Samsung to use their chips.

But at the same time, owning your own chip IP is extremely expensive. Your chip R&D department also has to be on top of their game. In the end, for many companies, it's more worthwhile to just buy chip designs from other companies.

All this has happened before. Read Sutherland on the Wheel of Reincarnation or Christensen on Disruption Theory.

There is a time when doing everything in house is optimal and, luckily for Apple, mobile is at that time. Basically, doing everything in house is a win when the gains from tight integration (for example a smaller device, lower power usage, or SW making optimal use of all the HW features) outweigh the costs of not being able to spread R&D expenses across the ENTIRE community of buyers, no matter what the company. With mobile we're at that point, and we're likely to stay there for a while, given that size will always be an issue and that power seems likely to be an issue for a while.

If I were Intel, I would be very scared. By 2016-2017 Apple will easily catch up to Haswell. And by 2020 Apple, and hopefully ARM, will match Intel's architecture. The only advantage Intel is left with is their fabs.

ARM's advantages over x86 decrease as core sizes increase. Simpler decoders don't help much when each core is over 200 million transistors. Meanwhile Intel's fab advantage is growing, so if all other things are equal, Intel will have a sizable lead in performance and power. Plus, who will port all of the Wintel software to ARM?

Intel will not die before the desktop and laptop PCs die, and I don't see that happening in the next decade.

Intel's fab advantage is growing every year. This means that others like Apple will have to compensate in architecture. I am pretty sure ARM/Apple will find a way around the problem you are pointing out. (I am very bad at understanding architectures)

Intel's fab advantage is in fact *shrinking* every year. If we talk about mobile SoCs, Intel will have its 14nm SoC in early 2015, while TSMC will have its 16nm in the second half of 2015. SoC design and tools matter a lot in power usage.

Desktops and laptops won't die. They will continue to ship, but at a much slower pace, because people are not upgrading as much. Gaming has now moved into the console and mobile space. Most of the PC upgrade cycle is now 3-5 years. So basically Intel has to figure out its fab utilization.

And no, there is no need to port to ARM, because at this rate we will likely have more software on ARM than there is on x86.

In fact, the 'fab advantage' is woefully underdescribed by transistor packing density alone, not least because TSMC and Intel use different metrics for their nanometer numbers. Depending on who you ask and what you are doing, the differences are larger or smaller. If anything, the Intel fab advantage has less to do with chip scale and more with the fact that Intel has a toolchain that is perfectly matched to their process.

And designing for low power, high performance, or some mix of the two is also a function of synergy between toolchains and processes. They use exactly the same silicon ingots, ASML machines and metallization for the HPm and LP processes; it's just a different set of design rules and some (surprisingly small) process tweaks, but with dramatically different outcomes in attainable performance and (leakage) power.

Intel has an awesome process, yes. But you make some unwarranted assumptions. Basically: is Intel a fab company that designs CPUs? Or a CPU company that owns fabs? Because Intel's current business model is in trouble.

Designing an x86 is VASTLY more expensive and slower than designing a kick-ass ARM CPU (AMD's CEO has given us numbers that confirm this intuitively obvious fact). To compete, Intel has to innovate at the pace of ARM while charging ARM prices. It can do one or the other, but not both simultaneously. Right now they can charge enough for high-end Xeons and mid-range Haswells to keep the system going, but every year they have to extract money by squeezing the high end a little harder - there is no give at the low end. We all know where this game ends (MIPS, SPARC, SGI, Apollo, Alpha, ...).

Which means that at some point Intel has to ask what their REAL core competence is. If they're better at fabbing than at designing processors matched to the modern world, then they'd be better off selling the fab expertise than the CPU design expertise.

Which is all a way of saying that, just because Apple (or almost anyone else) can't buy time in Intel's top of the line fab today, doesn't mean that will still be the case in five years.

I thought Intel's SoCs would be produced on the latest node, i.e. 14nm? They made the announcement a year ago. 14nm is beginning production in June and shipping August-September this year, and that's with the delays factored in.

They are. Airmont is Q3/4. Intel is actually not doing Broadwell for most desktops so they can allocate capacity to Airmont. TSMC won't be able to do 20nm FinFET (which Intel effectively had with Ivy Bridge a while ago) on a die the size of Ivy (which is fairly tiny) until late 2015.

Intel 14nm and TSMC 16FF both use dual-pattern immersion litho; the M1 metal pitch is 64nm for both processes. Intel has no significant transistor density advantage over TSMC. TSMC is also developing an improved version of 16FF called 16FF+. That process matches Intel 14nm in performance.

It is true that Intel got to FinFET first, and good for them. But one can turn that around and say that it's NOT a good sign that Intel's most kickass fabs produce something that is still not competitive in phones with what's produced by everyone else's fab.

Intel MAY have an advantage in materials. That's very much an area of randomness and luck. They seem to take materials research more seriously than their competitors (or at least they did in the past). But it's also possible that we've hit the wall on what we can do with materials (low-K dielectrics, better insulators, copper interconnect etc) beyond a few percent increases --- I don't know enough to have an opinion.

TSMC 20nm and 16 FinFET are based on the same BEOL; both use the same dual-pattern immersion lithography. The major change is the FinFET transistor device instead of a planar transistor device. 16 FinFET brings 20% higher performance (at the same power) and 35% lower power (at the same performance) relative to 20nm planar, and around 10% higher transistor density. (slide 19)

Here's a hint - Intel's fab advantage is always shrinking according to the foundry PR. When has TSMC or Global Foundries or all the rest not promised a narrowing of the gap? Why precisely is this latest round of reactionary roadmaps going to be any different?

Want a sampling of how TSMC's 20nm is actually going thus far? Because we've actually had the first indicator for a while now, namely that Xilinx announced shipment of their first 20nm FPGA back in November of 2013 - http://press.xilinx.com/2013-11-11-Xilinx-Ships-In... Now compare that to TSMC's 28nm process, where Xilinx announced shipment of their first 28nm FPGA in March of 2011 - http://press.xilinx.com/2011-03-18-Xilinx-Ships-Wo... - which was what, 9 months before consumer products started showing up? Either way it's a pretty clear indication that TSMC is on track for at least a 2.5 year gap between 28nm and 20nm... which is, at best, maintaining the gap with Intel.

Then, of course, there's still the matter of FinFET implementation. We'll have to wait and see whether or not they actually deliver on that promise. After all, recall when FinFET went on TSMC's roadmap? Not to mention everyone else's? Yeah, that's right, pretty much immediately after Intel's 22nm announcement. Same thing happened with TSMC's '16nm FF+' creation - another roadmap response to Intel with very little evidence that it's anything more than a PR move.

Given that the size of an atom is about 0.3 nm, it seems unlikely to me that anyone's fab advantage can last too much longer. Eventually you hit the wall, and other factors such as quality of software take over.
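A back-of-envelope sketch of that 'wall' argument: assuming each full node is roughly a 0.7x linear shrink (the historical rule of thumb) and taking ~0.3 nm as the size of a silicon atom, one can count how many shrinks remain from a 14nm starting point. All numbers here are illustrative.

```python
def shrinks_until(feature_nm, limit_nm, factor=0.7):
    """Count full-node shrinks (each ~0.7x linear) before the
    feature size would drop below the given physical limit."""
    n = 0
    while feature_nm * factor > limit_nm:
        feature_nm *= factor
        n += 1
    return n

# Roughly: start at a 14nm node, silicon atom ~0.3nm across.
print(shrinks_until(14.0, 0.3))  # -> 10
```

By this crude measure roughly ten full nodes remain, though real scaling runs into economic and physical trouble well before features reach single-atom dimensions.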

As of today, there are very few design wins where there was any question about whether an x86 or ARM architecture would be *considered*. Right now, the architectures have very separate ecosystems around them; with over half of all ARM chips going to low-power, extremely price-sensitive handset sales, Intel isn't even bothering to try. Soon, they may need to better protect the low end of their x86 business as ARM chips become competitive in Intel's current domain. But that'd inevitably mean lower profitability, which I don't think they're ready to cede.

Since this is Apple's own CPU, how is Windows support relevant? Apple could offer a version of OS X on ARM (I'm certain they have something like that in the lab), but I don't see any pressure for Apple to move their laptops to ARM CPUs. Right now, there is not that much pressure to move away from Intel CPUs (my new Retina MacBook Pro lasts a complete transatlantic flight and then some, or a whole workday, on battery).

Even though in principle you're right that ARM's efficiency advantage becomes smaller and smaller, two other advantages remain: (1) Apple could develop everything on its own schedule and (2) its own CPUs are cheaper (because the in-house CPU team doesn't have to make any money). And it seems that Intel has hit some snags with its CPU development. So I think this may be something Apple would consider doing in 5 years or so.

Intel's fab advantage has stalled, allowing others to catch up a bit. 14 nm has been delayed till the end of 2014, with part of the first wave arriving in 2015. The move to 10 nm looks to be brutal for all involved, and Intel itself isn't sure of 7 nm mass production viability (they do have prototypes).

In the midst of these last few shrinks, there is going to be a move to 450 mm wafers. This too has been delayed by Intel. Though no fab seemingly wants to be first here as it requires reworking of production lines with expensive new equipment that'd cost billions to upgrade.

The one manufacturing advantage Intel does have is a means of mixing analog and digital circuits. This is mainly for the analog side of a radio, though in the mobile market this is critical. Intel still has some interesting packaging technologies ( http://www.anandtech.com/show/834/5 ) that have yet to be used in mass production. Lastly, Intel has made several advancements in terms of silicon photonics. Silicon photonics is the next big thing (tm) in the server market as it'll enable greater socket scaling. Depending on power consumption, silicon photonics has also been suggested as a means to link high resolution displays directly to a GPU, and this would have mobile applications.

I am surprised that Intel hasn't used their process and packaging advantages to innovate past the competition. For example, they manufacture their own eDRAM for usage with Iris Pro graphics. Not only does this increase performance but it also saves power. So why not extend this to a phone SoC and use it to replace the low power DRAM? Reducing the complexity of the memory controller can be done as Intel would design both the controller and the memory, saving a bit of power here too.

Why would you need to, if you're talking about a Mac running OS X? Apple has shifted architectures numerous times in the past, and each time the transition was pretty smooth. In fact, they did their first architecture shift from 68k to PPC without even porting most of the OS, they just built kernel-level emulation support into Mac OS and let it handle all the 68k code until they finished that port. Later, when they did the shift from PPC to x86, they used a userspace emulator, and that transition didn't even take as long.

If Apple wanted to do another transition, from x86 to ARM (and I'm not convinced that they do), there's no reason why they couldn't do so again. They control the entire hardware and software stack, and are even in the position of making architectural tweaks to their processor to facilitate x86 emulation if they need to.

Except in those two earlier architecture changes, which I agree were preternaturally smooth, the new architecture was considerably faster than the previous one, so the penalty of using an emulator didn't hurt so much. I don't see ARM getting ahead of Intel like that.

Agreed. But the advantage this go 'round *would be* (nb: hypothetical) that a large fraction of their code goes through a single development environment that Apple controls, one that already spits out ARM/x86 variants.

Apple is *almost* in the position where developers wouldn't even have to know or care what the actual instruction set was for ANY of their devices; block diagrams such as Anand has guessed at above, plus the LLVM pseudo-assembly would be good enough. Already there, AFAICT, for iOS.

Get ahead of? Sure, not so much. But do well enough to make it work? Sure. They have the advantage of all the OS-level stuff running natively, leaving just the userland stuff. As WaltFrench pointed out, they also have the ability to push people towards easy recompilation (the whole fat binaries thing). And since they'd be designing the whole CPU from the ground up, they can afford to implement features intended to improve emulation. They can add extra registers and instructions, or they can actually implement a chunk of the emulation in hardware.

There's precedent for this too... ARM chips, for a while, supported Jazelle, which enabled them to run Java bytecode in hardware. They basically did binary translation on the Java instructions in one of the stages of the processor's pipeline. I'm not sure how far Apple could get without an x86 license, but Apple could probably use their patent portfolio to strongarm Intel. Also, their cash reserves are big enough that they have enough money to simply buy Intel, and while that's totally impractical (both because of the difficulty of a hostile takeover and because that's not really how cash reserves work), it gives you an idea of the resources they could dedicate to getting an x86 license.
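To make the binary-translation idea concrete, here is a minimal sketch of the translate-once, execute-many approach such a translator takes, using an invented three-register toy ISA rather than real x86 or ARM encodings:

```python
# Toy dynamic binary translator: a tiny made-up "guest" ISA is
# translated once into host closures, then executed repeatedly.
# The ISA and its encoding are invented purely for illustration.

def translate(program):
    """Turn guest instructions into a list of host closures."""
    ops = []
    for op, dst, src in program:
        if op == "mov":          # dst <- immediate
            ops.append(lambda r, d=dst, s=src: r.__setitem__(d, s))
        elif op == "add":        # dst <- dst + reg[src]
            ops.append(lambda r, d=dst, s=src: r.__setitem__(d, r[d] + r[s]))
        else:
            raise ValueError("unknown opcode: " + op)
    return ops

def run(translated, nregs=4):
    regs = [0] * nregs
    for host_op in translated:   # no per-instruction decode cost here
        host_op(regs)
    return regs

guest = [("mov", 0, 5), ("mov", 1, 7), ("add", 0, 1)]
print(run(translate(guest)))   # -> [12, 7, 0, 0]
```

Paying the decode cost once up front, rather than on every pass through the code, is what lets real translators stay within the moderate overhead range discussed elsewhere in this thread; those figures are for real systems, not this toy.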

Having been a Mac user through the 68k -> PowerPC transition, I can say that the first generation of PowerMacs were only faster than their 68k predecessors when running native PowerPC code. It wasn't until systems using the 604e that emulated 68k code became noticeably faster than running on a high-end 68k Mac.

The PowerPC to x86 transition was far faster. The NextStep roots of OS X enabled multiple-binary bundles, including both 32-bit and 64-bit implementations of the same architecture. Xcode could compile code for all architectures into one of these bundles for ease of distribution. This enabled a quick transition. It also helped that OS X had full x86 binaries by the time 10.5 rolled around and was x86-only when 10.6 shipped. (In comparison, 68k code wasn't completely removed from Macs until OS X arrived, 7 years after PowerPC hardware first shipped.)

»There is no inherent advantage in the ARMv8 instruction set when it comes to power consumption, nor is there any inherent Intel advantage when it comes to performance.« Sure, there is: on average, code runs significantly faster in 64-bit mode compared to 32-bit mode (about 10~30 %, for cryptographic workloads significantly more: http://anandtech.com/show/7335/the-iphone-5s-revie... ). This is free performance you get from just recompiling your code. (And yes, I'm aware it's not necessarily connected to 64 vs. 32 bit, but to stuff like having more registers at your disposal and a cleaner ISA.) With that increase in performance, your device can race to sleep faster, and that should help battery life.
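The 'race to sleep' effect can be sketched with a trivial energy model: over a fixed window, the device burns active power while working and a much lower sleep power afterwards, so finishing the same work ~25% faster cuts total energy. The power figures below are invented, not measurements of any real SoC.

```python
def energy_mj(active_mw, sleep_mw, work_ms, window_ms):
    """Energy over a fixed window: active while working, asleep after."""
    return (active_mw * work_ms + sleep_mw * (window_ms - work_ms)) / 1000.0

# Illustrative numbers only: 750 mW active, 30 mW asleep, 100 ms window.
slow = energy_mj(750, 30, work_ms=80, window_ms=100)          # 32-bit build
fast = energy_mj(750, 30, work_ms=80 / 1.25, window_ms=100)   # ~25% faster
print(slow, fast)   # -> 60.6 49.08
```

The faster build wins whenever sleep power is lower than active power; the bigger the gap, the more a 'free' recompile speedup is worth in battery terms.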

I agree. I have been saying "Intel, get an ARM license and build the best ARM SoC that ever existed!" That is the ONLY way for Intel to fight Apple or any potential ARM vendors rising (in unstoppable fashion). x86 has been a limiting factor as Intel cannot really squeeze much out of it, while ARM has plenty of potential to tap as yet. Besides, ARM has better multicore implementation compared to the loose coupling of x86 cores. ARM in HSA? ARM interconnects in HPC? Many variants have already been tried and tested in ARM, so making one or a few of those mainstream is going to do the industry a lot of good. x86 compatibility is a farce and FUD!

As much as I love ARM (and would love to get to know the MIPS Warrior core), Intel shattered all expectations when they released Haswell -- and then again with Bay Trail. No one would ever have believed x86 could be so energy efficient and still high performance. ARM has potential yet, but Intel secured their future for the near-term quite well. They shored up their defenses against ARM with good, if a little belated, timing.

No, they haven't, because it's as much an ecosystem argument as an architectural one: the main obstacle for Windows on ARM is the lack of applications, and the converse also holds in the non-Windows tablet and phone markets, where Intel hasn't made many inroads.

Intel has an ARM license (at the architectural level to boot), and they used to be a huge player in the ARM SoC space. They bought StrongARM from DEC in the 90s, and the Intel XScale line of ARM processors were used in most smartphones and PDAs back in the day.

They got out of the business, although Intel does still fab some ARM chips.

Intel got out of the ARM business so they could focus on Atom. Honestly, I am not sure if it was the right move. I could easily see a future where Atom goes away and they just use the Big cores scaled down, much like Apple has done here. Another interesting alternate universe would have Intel ditching Atom and building custom ARM IP...

Honestly, I think it would be cool to see a (desktop?) performance war between Intel and Apple on CPUs. I mean, just to have a performance war going on again at all would be cool.

Except that Apple has effectively given up on the traditional desktop. They're into all-in-ones and small form factor devices. Even the new Mac Pro falls into this category, as there is no expansion and in most configurations an upgrade means replacing an existing part.

While the Mac Pro isn't the machine for me, I can still recognize the genius in the design. The integrated thermal core is an interesting design that allows for incredible cpu/gpu power without noisy fans, etc. As for expansion, with 6 Thunderbolt 2 ports, it's a matter of external expansion vs. internal. Likewise, the expansion exists whether you recognize it as such or not.

Until Thunderbolt can officially accommodate a GPU without any bandwidth compromise, like an internal expansion card can, the Mac Pro is crippled over the long term. High speed PCIe based IO is great for other use-cases (especially in laptops) but GPU upgradability is important. That's the deal breaker for me.

The rest of the system isn't that impressive either. Only 4 memory slots offers less expandability than the previous generation Mac Pro. CPU upgradability is also hampered with the new Mac Pro (though it wasn't officially sanctioned in the previous Mac Pros, it was rather straightforward to do).

Agree that GPU would be an exception for Thunderbolt, though I believe Thunderbolt is viable for every other expansion use case. The teardowns of the product show that both the CPU and the GPU can be easily swapped out. The only real problem is that it doesn't take off-the-shelf parts. Perhaps that's an opportunity for third party vendors. This system was clearly designed for heavy use of OpenCL, etc.; Final Cut Pro X demonstrates that well enough.

Again, it's not the machine for me either, but I do still see genius in the overall design. I suspect this type of design is the future. I'm not sure how much of a future these large towers really have. As an example, something like the Razer modular gaming design is another direction away from the conventional towers. It may take a few years, but this type of thing will likely replace the conventional towers we use today.

Even though the CPU can be replaced with off-the-shelf parts, I'd call the procedure a bit more involved than simply 'easy'. The GPUs are a bit easier to remove, but without any upgrade kits even announced, what good does that do? Nothing.

For OpenCL usage, what prevents an owner of the previous generation of Mac Pro from plugging in two GPUs to get the same effect? The only thing that prevents the old Mac Pro surpassing the 2013 model in OpenCL is that it only has two internal 6-pin PCIe power connectors. If you are willing to deal with an external power supply for the video cards, then the 2013 model never surpassed the 2010 model's potential OpenCL capabilities. This will be particularly true when the next generation of AMD and nVidia GPUs hits the market later this year.

Thunderbolt can be used for bulk storage, but with NVMe drives around the corner, it won't be the fastest solution. Native PCIe based storage is going to be damn fast at the high end, which the old Mac Pro can use to its full potential.

I do like the engineering of the 2013 Mac Pro but it feels more like it'd be the hypothetical machine to sit between the Mac Mini and a new Mac Pro tower. In this regard, the only thing I'd change is have the video cards use MXM style video cards instead of the proprietary ones. This way there would be some potential avenue for upgrades for the warranty be damned DIY crowd and it'd re-use parts from the iMac.

"for the warranty be damned DIY crowd" Does this crowd routinely spend £7000 on a Mac Pro? Would you want to carry out an unsanctioned hardware modification on such an expensive device which voids the support warranty? This is a professional workstation intended for companies and money-no-object prosumers to buy; they aren't going to be overclocking the CPU cores and buying aftermarket GPU parts on ebay...

Once they go for the desktop they have to compete with Intel on performance and fabrication, and they exited that fight (which they waged through IBM and Freescale) when they switched to Intel CPUs. While they have a really big core, it takes a lot more effort to build the same type of chip around 4GHz rather than 1.3-1.4GHz; it's really hard to even compete in the notebook space, where we have Intel chips with a base frequency of 1.7GHz but a turbo of 3.3GHz. They don't really want to invest in fabs to compete with Intel. What Intel has shown as ARM chips get more advanced is that there isn't really any penalty to having x86 everywhere. If Apple went back to ARM they would need to invest in compilers and tech for ARM as well as x86; as it is, they have basically chosen x86 for everything and don't need to support an ecosystem for other platforms with the same kind of resources.

x86 will own the professional desktop market forever due to legacy lock-in. Almost every sizable business has some program as a part of their workflow that can only run on (x86) Windows, for which the source code is unavailable.

ARM already beat out x86 for the low-end, non-technical user. These users have, by and large, already switched to iPads or their Galaxy equivalents.

It's the server market where a lot of competition will take place in the future. Architecture isn't quite as important with an open-source LAMP stack, or with Java (though there will still be some servers that need closed-source x86 binary stuff). But Intel will have to keep on its toes in terms of price/performance/power to stay on top.

This simply isn't true. If the company has lost the source code (that's just bad), then the program's system requirements aren't going to be getting any heavier. There have been demonstrations of x86-to-ARM binary translators with only something like a 40% to 60% performance degradation. ARM chips will get powerful enough to do that kind of binary translation on the handful of legacy programs a company might need to support.

More likely than the company having lost the code is that they are using a commercial package that is no longer maintained/updated by the original vendor. But that said, binary translation would work just fine in that case. In fact, we've even seen work in the past with older Mac software during each of the two major transitions they've done. There were a few older 68k apps that had stopped working on more recent 68k CPUs that started working again on the PowerPC with Apple's emulator.

Apple isn't too far behind Intel in terms of IPC; they are floating in the Westmere range. Impressive for a design that is used in a phone. The catch with the A7 is its low clock speed and lack of Turbo. A good portion of Haswell's performance is the high turbo frequencies that it can hit under a single threaded load.

The real threat to Intel is if Apple decided to move upward into the laptop or even desktop space. They seemingly have the talent to produce a design that matches the best of what Intel can offer. And Apple isn't afraid of a platform transition in the desktop space having done so twice in the past. (68k -> PPC and PPC -> x86)

The problem here is that you can't just go on and try to scale performance by making the processor wider and wider. They're already going into an extremely (silicon) expensive route now by adding the largest possible caches and widest possible non-execution resources they can fit onto an ARM core. There is no room here - there is, but it's not worth it unless they say 'fuck it, we won't care about cost at all anymore'.

The next steps will really have to be architectural changes to the processor, moving it more in the direction of Intel/AMD. Wider (multi-module?) RAM interfaces actually feeding into the processor and GPU, not being divided down like they are right now (high synthetic bandwidth but disappointing practical bandwidth). Smarter branch prediction for better single core IPC. But most of all, they are probably just going to build a whole bunch of extensions over ARMv8 to fix the hoops the compilers need to jump through to get anything done.

Another big stickler would be power consumption. ARM, even the best implementations of ARMv8, is still a good half to whole order of magnitude worse in the performance per watt game, and it gets much worse at active idle. There are no proper hardware 'system agents' in these processors; all the power management is done in software, which is absolutely horrid. I wouldn't be surprised if Apple were the first to foray into some analogy of desktop C-states, maybe even beating ACPI at its own game and making a more modern power management interface. Split power planes can finally be a thing, which allows them to take more I/O onto the main die. PCIe, NVMe and proper DDR4L support?

Another thing they may try, given the budget they have (though this is just wishful thinking and I wouldn't ever dream of this happening): Apple integrating 4G and WiFi? The only reason I'm even saying this is because we are finally in a spot where 3G and 802.11a/b/g are not necessary anymore, so you are not dependent on the big players in the field anymore that give you their patent protection. 4G and 802.11n/ac are fairly easy to get into and make your own implementation of. But yeah, that would be crazy. I'm sure they will squeeze every last cent out of their CPU/GPU team investments before they start thinking about wireless.

I've always thought as soon as Verizon Wireless no longer requires CDMA for voice, Apple dumps Qualcomm, goes in-house with wireless, and integrates the cellular baseband in-house. It always felt like, because of timing and scaling issues, the iPhone had to use n-1 of the best chips Qualcomm had.

I'm no expert, but I think you'd see rapidly diminishing returns by scaling clock speed that high on a mobile architecture, even one as wide as this. There's a general theory (found somewhere on this site) that a particular CPU architecture can only scale within a power envelope of approximately an order of magnitude (1-10W, or 10-100W for example). It all depends how much power would be required to hit the clock speed.
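That envelope argument falls out of the usual dynamic-power relation P ≈ C·V²·f: hitting a higher frequency also requires a higher supply voltage, so power grows faster than linearly in clock. A crude sketch, where the baseline power and the voltage/frequency slope are invented numbers:

```python
def dynamic_power(freq_ghz, base_freq=1.3, base_power_w=1.5, v_slope=0.25):
    """Crude dynamic-power model: P ~ f * V^2, with supply voltage
    assumed to rise linearly with frequency (v_slope is invented)."""
    v_rel = 1.0 + v_slope * (freq_ghz / base_freq - 1.0)
    return base_power_w * (freq_ghz / base_freq) * v_rel ** 2

print(dynamic_power(1.3))  # baseline phone-class core
print(dynamic_power(4.0))  # same core pushed to 4 GHz
```

With these made-up constants, tripling the clock from 1.3GHz to 4GHz costs roughly 7x the power, which is how a ~1.5W phone core turns into a ~10W desktop-class one, i.e. it crosses into the next power envelope.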

It really, really depends on the workloads. The Intel CPUs have a lot of work in the uncore to allow them to scale to such frequencies. The Intel CPUs also have a lot of work in the memory pipeline. As you increase frequency, you become more and more bottlenecked on the average number of loads and stores you can have outstanding. Most of the mobile CPUs haven't really worked the whole memory pipeline issue as much as the desktop and server CPUs.
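The outstanding-loads point is just Little's law: sustained bandwidth = misses in flight x line size / memory latency, so a core that can't keep many misses in flight can't exploit fast DRAM no matter the clock. A sketch with typical-looking but assumed numbers:

```python
def sustained_bw_gbs(outstanding, line_bytes=64, latency_ns=100.0):
    """Little's law: achievable memory bandwidth is capped by the
    number of cache-line misses a core can keep in flight."""
    return outstanding * line_bytes / latency_ns  # bytes/ns == GB/s

print(sustained_bw_gbs(10))   # modest MLP      -> 6.4
print(sustained_bw_gbs(32))   # deeper pipeline -> 20.48
```

At 100 ns latency, ten outstanding 64-byte misses sustain only ~6.4 GB/s, which is why deepening the memory pipeline matters more than raw frequency as clocks rise.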

No, because overclocking alone does very little. You would end up encountering massive bottlenecks across the entire chip that would make it appear as if overclocking is not doing anything at all.

Here's a better way to say it. If you downclock an x86 to Cyclone speeds, how does the performance compare? The experiment can be done (just buy an i3 running at 1.3GHz and prevent it from turboing) and performance is comparable. So yay Apple. BUT

the fact remains that Intel is able to hit 4GHz and Apple is not. There are a few different aspects to this. One is process and (maybe) circuits --- physically running the core at 4GHz. Apple could probably do this if they wanted, though they might burn more power at 4GHz than Intel does. A second is that running your core three times faster means that delays for memory are three times as long. Reducing the cost of these delays is where most of the work goes in a modern CPU. You want a large low latency L3 cache, you want a memory controller that limits delays when you do have to talk to RAM, you want very sophisticated prefetchers. Intel is probably, if not certainly, ahead of Apple in all three areas, but there's no reason to believe that Intel has some sort of magic that isn't available to Apple once they start concentrating on this area.

The larger claims above like "would end up encountering massive bottlenecks across the entire chip" are misleading and unhelpful. Essentially what happens is that when you miss in cache, you fairly rapidly fill up the ROB and then you're blocked, unable to do anything until the memory is serviced. Since Apple and Intel have equally sized ROBs, they will hit this point at pretty much the same time on any cache miss. Where Intel probably has an advantage (for now) is that it
- misses in cache much less often (better prefetchers, larger and faster L3), and
- MAY (not necessarily, but possibly) allow more additional in-flight memory requests to be generated while the ROB is filling up, i.e. can support a higher level of MLP.
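The ROB-fill point can be put in rough numbers: with DRAM latency fixed in nanoseconds, a miss costs latency x frequency cycles, while the ROB (filled at the issue width per cycle, in the best case) covers only a fixed number of them, so the same miss hurts roughly three times more at 4GHz than at 1.3GHz. The ROB size and width below are ballpark figures, not Apple's or Intel's actual numbers:

```python
def stall_cycles(freq_ghz, mem_latency_ns=100.0, rob=192, width=6):
    """Cycles spent fully blocked on one cache miss: once the ROB
    (rob entries, filled at `width` per cycle at best) is full,
    the core just waits for memory to return."""
    miss_cycles = mem_latency_ns * freq_ghz   # same miss, more cycles at high clock
    fill_cycles = rob / width                 # useful work before blocking
    return max(0.0, miss_cycles - fill_cycles)

print(stall_cycles(1.3))  # Cyclone-like clock  -> 98.0
print(stall_cycles(4.0))  # desktop-like clock  -> 368.0
```

Same miss, same ROB, nearly four times the dead cycles at the higher clock; hence the emphasis above on prefetchers, L3 and the memory controller.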

The next great leap forward in CPUs is to support enough out-of-order processing to just keep working while servicing a request to memory. There are a variety of proposals for how to do this with names like kilo-instruction processing and continual flow pipelines. The common idea is to have an auxiliary buffer into which you slide all instructions dependent on the memory miss until the memory returns, at which point you wake up these instructions. This is not at all trivial for a few reasons. One is that it is dependent on extraordinarily good branch prediction, since executing 1000+ instructions based on speculation is not much good if, around instruction 200, you speculated incorrectly. Another is that there are nasty technical issues surrounding how you handle registers and, in particular, register renaming after you wake up the instructions that were sleeping waiting on memory.
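The 'slide dependent instructions into an auxiliary buffer' idea can be caricatured in a few lines: walk the instruction stream, and anything that (transitively) depends on the missing load goes to a slice buffer while independent work keeps executing. This is purely schematic and ignores the hard parts mentioned above (renaming, misspeculation):

```python
# Toy sketch of the slice-buffer idea behind kilo-instruction
# processing / continual flow pipelines. Entirely schematic.

def execute(stream, miss_reg):
    """stream: list of (dst, src) register pairs; any instruction whose
    src transitively depends on miss_reg gets deferred."""
    poisoned = {miss_reg}        # values not yet available
    done, slice_buf = [], []
    for dst, src in stream:
        if src in poisoned:
            poisoned.add(dst)            # its result is also unavailable
            slice_buf.append((dst, src)) # woken when the miss returns
        else:
            done.append((dst, src))      # runs while the miss is in flight
    return done, slice_buf

stream = [(1, 0), (2, 1), (3, 4), (5, 3)]   # r0 is the missing load
done, deferred = execute(stream, miss_reg=0)
print(done, deferred)
```

Here the two instructions independent of r0 complete while the dependent slice waits, which is exactly the behavior a ROB-bound core cannot sustain once its window fills.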

If Apple wants to really show us that they've surpassed Intel, they'll be the first to ship a processor of this class. I could see it happening simply because they have so much less baggage to deal with --- they can focus all their attention on getting the hard part of KIP working, not on low-level crap like how this feature is going to screw up 286 mode or interact weirdly with SMM.

Not at all. Go read what the ROB actually does. In particular, no, the memory controller, RAM throughput etc. will NOT lock up. A large part of the goal of the ROB and all the out-of-order machinery is to maximize MLP --- i.e. to fire up as many memory requests as possible, all to be serviced during these periods while the ROB is full. The caches and memory controller have nothing to do with the ROB and its filling up or otherwise.

Apple likely tunes any prefetchers for burst operations into main memory and relies heavily on the L3 cache. The SoC likes to put the DRAM to sleep whenever possible, so the prefetchers would ideally be doing burst reads most of the time to maximize sleep time. This is optimal not from a performance perspective but from a power one. At 4 GHz, such tuning would be detrimental to overall performance given how many cycles away main memory would be. It would also help explain the similarly sized ROBs.

I'd argue that the next leap forward for Apple's designs would be SMT. By being able to run two threads simultaneously (or just a quick context switch), work may still be performed while one thread stalls on a memory read. Staying dual core in a phone still makes sense from a power perspective if Apple can add SMT with such a wide design. Going wider would still make sense in light of wide SMT (>4 way), though this may not be power optimal.
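A toy issue model shows why SMT pays off exactly when threads stall on memory: with two threads whose stall phases are out of phase, the core issues every cycle instead of half the time. The stall patterns below are contrived for illustration:

```python
def throughput(thread_patterns):
    """Each pattern is a list of booleans per cycle: True = the thread
    has work ready, False = stalled on memory. Each cycle, issue from
    one ready thread (round-robin); count issued instructions."""
    n = len(thread_patterns[0])
    issued, rr = 0, 0
    for cyc in range(n):
        for k in range(len(thread_patterns)):
            t = (rr + k) % len(thread_patterns)
            if thread_patterns[t][cyc]:
                issued += 1
                rr = t + 1
                break
    return issued

busy4_stall4 = ([True] * 4 + [False] * 4) * 4   # thread A: 4 busy, 4 stalled
offset       = ([False] * 4 + [True] * 4) * 4   # thread B: out of phase
print(throughput([busy4_stall4]))          # alone -> 16 of 32 cycles used
print(throughput([busy4_stall4, offset]))  # SMT pair -> all 32 cycles used
```

Real workloads don't stall this conveniently, and sharing the cache between threads eats into the gain, which is why measured SMT benefits are far below the 2x this idealized pairing suggests.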

I differ in that I see the next big thing from an architectural standpoint being out-of-order instruction retirement. Currently OoO execution still keeps the illusion of a linear serial stream of instructions. Breaking that has some interesting ramifications for performance. The only chip to do so was Sun's (erm, Oracle's) Rock chip, which taped out but never shipped.

with a pipe that wide, does this mean they'll try to get some SMT in there? they've stuck with dual-core, but if they want to be 'desktop' class, they're going to need more threads. A 6-wide pipe is a lot of silicon to just leave idle for most of the time.

Yeah.. SMT and put such chips into servers. Power efficiency seems to be brilliant. I wonder what packing density they could achieve if they left out all that mobile SoC stuff, which a server won't need.

SMT strikes me as an unlikely path. These are not throughput cores, and the additional complexity of SMT is a distraction at this stage. 2x SMT gets you about a 25% improvement, and that's not going to change (because with 2x threads each thread gets half the cache, so memory traffic goes up).

What would not surprise me if it comes in the future, or is already in place, is automatic throttling of the pipe to reduce power. The idea would be that either driven by SW and/or HW the CPU could switch to a lower power mode which is, say, 3-wide, and an even lower 1-wide mode. In these lower modes a bunch of other features would also shrink to reduce power (eg the register file shuts down some of its ports). One could imagine, for example, that the HW detects on average the throughput it's seeing and, if that shrinks below a certain level and the SW has told it it's OK, the assumption is we're spending all our time waiting on memory, so let's switch to 3-wide mode and stay there for the next million instructions.

There have been features like this in other CPUs. For example, in power-saving mode some CPUs have throttled I-fetch, which then throttles the rest of the pipeline. Even doing that (which is really easy) in a way that's informed by how many instructions are being retired per cycle would be useful in saving power.
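The throttling heuristic described above is simple enough to sketch: sample the retired IPC over a window, and if it falls below a threshold (suggesting the core is mostly waiting on memory), drop to a narrower, lower-power mode for the next interval. The widths and threshold here are invented:

```python
def choose_width(retired_ipc_window, wide=6, narrow=3, threshold=2.0):
    """Pick the pipeline width for the next interval from the average
    IPC actually retired in the last one. All numbers are invented."""
    avg = sum(retired_ipc_window) / len(retired_ipc_window)
    return narrow if avg < threshold else wide

print(choose_width([4.1, 3.8, 4.5]))   # compute-heavy phase -> 6
print(choose_width([0.8, 1.1, 0.6]))   # memory-bound phase  -> 3
```

A real implementation would add hysteresis so the core doesn't oscillate between modes on noisy workloads, and an OS override as the comment suggests.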

My theory is Apple purposely put 1 GB of RAM in iPads and iPhones to make the device get outdated faster. They knew they built a good CPU, maybe too good, so they put in 1 GB of RAM, which really hurts the iPad Air's and iPad mini Retina's UI performance. The UI constantly dips below 30 FPS, and typing on my Air is very frustrating; sometimes there is a 2 second lag when hitting a letter on the keyboard.

Love the investigative tech journalism you and your sources are doing here. Keep up the good work; you are now achieving the level of the old Byte Magazine under Tom Halfhill (with much less cooperation, I might add, from the companies and products being covered). Kudos to you.

The rumors point to the A8 seeing a move to quad core for Apple; I wonder if those are true. I think they could still improve single threaded performance, but the 2x gains will be harder and harder to come by, so maybe they will finally go the "throw more cores in" route.

Of course Qualcomm got it wrong, four narrower cores vs. two wider cores, when most applications aren't even using 2 threads on mobile... But this is all irrelevant until I can get an Apple Ax chip in an Android/Linux device. Right now, S800 is more than adequate for me...Reply

Blame NVIDIA and Samsung, first to market with quad-core phones. Qualcomm was basically forced to move to quad core to compete on marketing. That's why the 8960 Pro exists: it's basically a Snapdragon 600 minus two cores, plus an on-die modem.

Snapdragon 800, as a result, was designed as quad core from the get-go. But then the market moved to octa core. Thankfully, Qualcomm appears to be sticking with four cores for their flagship SoCs.Reply

BS. Qualcomm stuck with Krait without increasing IPC at all from 2010 through 2015. Their "rushed quad core" was the S4 Pro, which was fundamentally broken. The S600 was basically a fixed version, and the S800 is just more of what they have done for the past four years (changing process node and raising clocks). This is just the CPU; they do some really great stuff with their modems and GPUs and other areas. Reply

Yes, but doubling core counts doesn't double performance, and having twice as many cores that don't do much also has a high cost in terms of silicon.
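The diminishing return from doubling cores can be made concrete with Amdahl's law: the serial fraction of a workload caps the speedup no matter how many cores you add. The 70%-parallel figure below is purely illustrative, not a measurement of any real mobile workload.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup when the parallel fraction runs on n cores."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# A workload that is 70% parallelizable:
s2 = amdahl_speedup(0.7, 2)   # ~1.54x on two cores
s4 = amdahl_speedup(0.7, 4)   # ~2.11x on four cores -- nowhere near 4x
```

Going from two cores to four buys only another ~37% here, while roughly doubling the silicon spent on cores, which is exactly the trade-off being weighed above.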

Apple's business model and the margins that go with it mean that a higher silicon cost is often acceptable. My guess is that their bigger concern is available fab capacity, at any price. Putting it another way, if a bigger chip means doubling their silicon cost, that might be acceptable. On the other hand, if it means they can only get half as many chips, and sell half as many iPhones, that's completely unacceptable.

Higher power consumption is also a complicated issue. Higher performance per watt is quite desirable, but its positive or negative impact is offset by the other participants in the device power budget. On tablets, that's generally the display, by a long shot. On a phone, the display is less significant.

It seems, though, that Apple thought a big, complex CPU offered sufficient advantages in performance relative to its power consumption impact.Reply

That's an amazing chip...so much better than I was expecting. I do wonder what it is Apple's planning on doing with them...

Also, I wish every iOS device since the third or fourth generation had at least 2x the RAM. It's surprising it works as well as it does, but it's still horribly RAM-starved. These current devices should have at least 2 GB, and 4 GB would be more reasonable than 1 IMO.Reply

More interesting is that there remains an awful lot that we don't know about the micro-architecture, and which may have room for improvement. This includes, for example:
- quality of the branch prediction (including indirect branches)
- quality of the prefetchers
- quality of the load/store pipeline, including things like how many stores can be queued up pending completion, how aggressively loads are moved past stores, and how fast misprediction is recovered
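To make "quality of the branch prediction" concrete, here is the textbook two-bit saturating-counter predictor. Apple's actual scheme is undisclosed and certainly far more sophisticated; this is only the classic baseline against which fancier predictors are measured.

```python
class TwoBitPredictor:
    """Classic 2-bit saturating counter: states 0-1 predict not-taken,
    states 2-3 predict taken. It takes two mispredictions in a row to
    flip the prediction, so a single anomalous branch outcome is absorbed."""

    def __init__(self):
        self.state = 2  # start weakly taken

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Saturate at the ends of the 0..3 range.
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

# A loop branch taken 9 times then falling through once: the counter
# saturates at "strongly taken" and mispredicts only the final exit.
p = TwoBitPredictor()
outcomes = [True] * 9 + [False]
mispredicts = sum(1 for actual in outcomes
                  if (p.predict() != actual) or p.update(actual))
```

(The `or p.update(actual)` idiom works because `update` returns `None`, which is falsy; it just sequences the state update after each prediction.)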

Personally I think big changes in any of these areas are for the A9. For the A8, what needs to be improved is the uncore: Apple-controlled GPU much more tightly integrated with the CPUs, low latency (and larger) L3 shared by CPUs and GPU, ring gluing these all together. Even better (but maybe too aggressive for this iteration) would be a much smarter memory controller using some of the latest academic ideas like virtual write queue.

If you want aggressive scaling, at some point we'll probably get a third core added. MAYBE even with the A8, IF Apple go to 20nm. It seems to me that more than 3 cores is unnecessary right now (heck, there are few circumstances where even three would be used, unless they've made some great strides forward in parallelizing web browsing and PDF rendering), and that hyper threading is a distraction compared to other things they can do to improve the core that generate more bang for the buck.

General frequency scaling is, I think, less essential and less desirable than better turbo support. Don't design the CPU to run for a long time at 2.5GHz (for either power or overheating reasons) but allow it to hit those frequencies in brief bursts of a second or less to provide for higher snappiness.

Also, of course, much less sexy but coming will be the M8, Apple's own-designed ultra-low-power sensor controller core (as opposed to the M7, which is a rebranded third-party core). It would be interesting to know something about that, but it'll be a long time, if ever, before we learn anything beyond a few crumbs. (Apple will probably boast at the iPhone 6 release about how little power it uses, and in just how many circumstances it can do the job without waking the main core.)Reply

Why? Given the parsimonious way that Apple parcels out memory, and that it doesn't allow you to keep more than one app fully open all the time -- somewhat relaxed now for background processes, audio streams and the like, plus a stateful little stub to start an app up quickly, but not really "multitasking" like leaving some code to compile while you watch a movie and talk to your girlfriend -- what would the extra RAM have done for a) speed and fluidity (very little) and b) battery life? RAM drains the battery quickly. Reply

This chip seems to have quite a bit in common with P.A. Semi's POWER-based PA6T-1682M from a few years ago. The question that nobody seems to ask is how Apple beat ARM to the punch by almost 18 months in the high-performance arena. Nobody's that much better than ARM at ARM designs.

Two answers come to mind. The first is that Apple wrote the ARM64 spec and told ARM to adopt it, or else Apple would switch to MIPS or POWER. The second is that Apple had an existing design that could be repurposed, which also happens to be somewhat true (the PA6T-1682M was designed in '05). Perhaps both are true.Reply

I love when things start to come together:
What was Bob Mansfield doing in semiconductors' "ambitious plans for the future"?
Why is the A7 intensely overpowered for a phone?
Why did Apple make even its pro machine the size of a table-top jug?
What reason will Intel have to exist in the consumer space in 5 years?Reply

" Looking at Cyclone makes one thing very clear: the rest of the players in the ultra mobile CPU space didn't aim high enough. "

Really? There is certainly a place for large, wide, out-of-order architectures, but ultra mobile does not seem the most natural fit. I mean, there is a reason Intel cannot compete with MediaTek in mobile, and it is not lack of performance, or lack of 64-bit addressing.

If it was really true that the A7 was a desktop-class chip, it would be pretty ridiculous to put it inside of a smartphone.Reply

I find arguments akin to »this is too powerful for a phone« a bit misguided: if programmers have that much horsepower at their disposal, they'll use it. And if it is only used to reduce input lag, that's a win for the user. Reply

According to battery life tests here at AnandTech and elsewhere, the A7 did not fare worse than the A6, so I don't see an argument for worse battery life. Sure, if you run the A7 at full tilt, perhaps it will shorten the battery life, but such a workload isn't what you'd usually find on a smartphone or a tablet. Your argument regarding die area doesn't make sense to me: the A7 is smaller than the A6X by about 20 % and only about 6 % larger than the A6. Yes, the A7 is manufactured at a smaller process node, but in terms of area it's comparable or smaller. Moreover, Apple doesn't need to make a profit on its SoC. Also, die area has nothing to do with the 64-bitness of the CPU (the Cortex A53 is a comparatively small ARMv8 CPU core).

I don't see any indication that Apple has had to pay a penalty for making the A7 the way it is. I have the impression you're trying hard to make the argument that the A7 is not a good design (»large cost increase«, »increased power consumption«) and argue that it is not »desktop class«.

I'm glad Apple has stirred up the competition, I'm curious to see how the Denver-based Tegra K1 performs compared to the A7 and A8. Reply

There's an alternative direction Apple may be going: What's the hardware in those massive data centers it's building? No one really knows - only Apple's own software runs in there. They're certainly not buying off-the-shelf servers to populate their data centers - none of the big guys are any more.

So imagine a clean-sheet design for an Apple data center server. I see at least three technologies Apple is already shipping that I'd look to apply: an A7 variant or successor for the CPUs (lower power, less cooling); fast on-board SSD as used in all recent laptops; maybe Thunderbolt for connection to disks.

Apple has a long history of using technologies in unexpected places. Laptop technologies were the basis of the Mac Mini, then generations of iMac desktops; some are figuring in the most recent Mac Pro. The rest of the industry initially missed this possibility - they viewed desktops and laptops as entirely separate product segments, each with their own standards (standard big box - mainly full of air for years now - for desktops; 3.5" disks in desktops, 2.5" in laptops; internal expansion of memory and multiple disks for desktops, limited expansion for laptops; different lines of CPU chips). By now everyone realizes that many of those distinctions stopped making sense a long time ago. (That's not to say there are *no* differences - you can afford to use much more power in a desktop than a laptop, so you can put in a more power-hungry CPU. What's new is that you differentiate in your designs when it actually matters, not to match some arbitrary 25-year-old standard.)

Maybe the next wall to be breached is that separating "servers" from other machines.

That's a very good point: Apple has been preaching owning as much of the stack as possible, or at the very least the crucial parts. Since cloud services are becoming increasingly important, Apple should go the way of Facebook and Google and make its own servers. There are some efforts in this direction (Calxeda, HP's Project Moonshot), so maybe this is what Mansfield is up to these days. Certainly, Apple's focus on desktop-class performance would let this CPU line become central to Apple's server architecture.

One of the big problems I have is persistent, distributed storage. All of the solutions seem rather pedestrian and only partial solutions to a bigger problem. Maybe this will be one area of innovation soon. Reply

I think you are putting the cart before the horse. You use components designed for one segment in another because they offer better economics, or capabilities.

So, Apple used mobile parts in desktops because they were more compact and power efficient.

Similarly, Google leveraged high-volume commodity parts to fill their datacenters because they were significantly cheaper, and also had a better balance between I/O, memory bandwidth, and CPU.

People are now looking to mobile market components for the datacenter both because of the economies of scale in mobile, and for power consumption.

Apple may well end up using Cyclone in the datacenter, but if they do, it won't be because they designed it with a focus on the datacenter, it will be because decisions that make sense for mobile also make sense for the datacenter. Performance per watt is an important consideration for devices designed to fit in a pocket, or to fill a datacenter. In Apple's case, success at the former is what pays for the latter.Reply

@JerryL Web/cloud is one of Apple's weaknesses. Back in the .Mac/MobileMe days Apple used Sun servers/Solaris and Oracle stuff. The first huge data center for iCloud used HP and NetApp gear + Windows Azure (= why it will never work).

It's unlikely that Apple would suddenly start building its own ARM servers. I really wish Apple would start using its own servers/software for key components like iCloud/Siri/iTunes, but Apple doesn't see that as its key business.

We will see a move to the ARM architecture in the future. Per MHz, the A7 is faster than Intel's i7 ULV processors. Many IT people forget that just 15 years ago RISC had 90% of the server market and a huge part of the workstation market. We had PPC, Alpha, IBM POWER, x86, PA-RISC and a bunch more CPUs competing. That competition drove "Moore's law", with performance doubling every 18 months. x86 became fast enough, was cheap, and had Windows. RISC was always faster, but Sun and the other vendors lived in a world where they wanted 4,000 dollars for a CPU. With AMD's x86-64, even the server market started to move to x86.

In 2006 Intel released the Core architecture, which killed competition between Intel and AMD. Since the RISC vendors were already out of business, Intel had no competition, and Moore's law died. In 2006 the fastest desktop x86 was a quad-core 2.66 GHz Core CPU. Today the fastest desktop CPU is a 3.5 GHz quad-core Intel: less than a 100% increase in 8 years, and even counting six-core desktop Xeons it's about a 150% increase. ARM, which has competition even between different ARM vendors, went from 416 MHz in 2007 to quad-core/octa-core 2.5 GHz in 2014. Apple alone has had a 4000% increase in performance in the chips it uses. Intel: 150%. ARM/Apple: 4000%.

Intel believes it doesn't have to release faster stuff since there is no other x86/pure Windows CPU. Instead of making faster stuff they have optimized profits. Now Intel does the same thing the RISC vendors did: charge 4,400 dollars for a 20-core Xeon.

Look at graphics chips, where AMD and Nvidia compete: they still follow "Moore's law". People complain that graphics chips cost 700 dollars. That's cheap! Intel's 20-core Xeon is less than 500 mm². AMD's and Nvidia's high-end GPUs are around 500 mm². Since a 28nm wafer costs 7,500 dollars, it costs AMD/Nvidia about 400 dollars to produce a 500 mm² GPU. The point: if Intel had competition, they would be forced to sell 20-core Xeons at 500 dollars. Then Moore's law would be alive.
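The per-die arithmetic above can be sketched. Note the naive dies-per-wafer count ignores edge loss, and the yield figure below is an assumption picked to land near the comment's ~$400 number; defect yield on a 500 mm² die is genuinely poor, which is what inflates the cost well above wafer-cost divided by gross dies.

```python
import math

def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=300):
    """Naive upper bound: usable wafer area / die area.
    Ignores edge loss and scribe lines, so real counts are lower."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    return int(wafer_area // die_area_mm2)

WAFER_COST = 7500.0               # dollars per 28nm wafer (comment's figure)
ASSUMED_YIELD = 0.15              # assumed fraction of good 500 mm2 dies

dies = gross_dies_per_wafer(500)                      # 141 gross dies
cost_per_good_die = WAFER_COST / (dies * ASSUMED_YIELD)  # roughly $350
```

So the ~$400 figure is plausible only once yield is factored in; the raw wafer cost per gross die is closer to $50.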

This arrogance from Intel will kill x86, and not one day too early. Every IT professional who has used real 64-bit chips with a real 64-bit OS and no BIOS knows what I'm talking about. The sad thing is that I'm talking about stuff real computers have had for almost 25 years: 64 bit since 1990; OpenBoot/EFI is at least 20 years old; pure 64-bit OSes are at least 15 years old. x86's 64 bit is a set of extensions, which is why 64-bit code on Windows is SLOWER than 32-bit code, by about 3%. No real 64-bit chip has this phenomenon, not even Apple's A7: benchmarks show that code gains 20-30% performance when compiled for 64 bit instead of 32 bit. I really hate the Windows myth that 64 bit is just about more memory. ARM, for example, has had 38-bit addressing for years. Memory is not the issue; it's performance.

I think it's funny that it was John Sculley's Apple that was one of the founding members of what later became ARM. Apple needed chips for the Newton. 15-20 years later these ARM chips generate the most money for Apple and save the IT industry.

BTW, optimized will always be better. Apple makes its own chips since it can optimize them for its software. Look at the A5: Apple used 30% of the die area for DSPs and stuff they wanted, including the noise cancellation for Siri. The A6 had the "visual processor". The A7 has the M7. Imagine a desktop CPU where Apple can put in anything they want.

Look at the Mac Pro. Since Apple controls the hardware and uses a real OS, they can dedicate a graphics card to compute. A FirePro has 35 times the compute power of the Xeon in the Mac Pro. This is what's going to happen with Apple's desktop ARMs. Apple has 15 years of experience using graphics for acceleration and SIMD/AltiVec, stuff that Windows still can't do since MSFT doesn't control the hardware.Reply

What is it that can be done on a "real OS" regarding compute that I cannot do on a "non real" OS such as Windows or Linux?

I have CUDA, I have OpenCL and on both OS X and Linux/Windows I have to design and optimize my algorithms for GPU compute, otherwise they would suck.

There is no magical pixie dust in OS X that would make the GPU and development environment more "compute friendly" than it is on other platforms. If somebody claims otherwise, that is just a load of marketing nonsense. If your algorithm does not scale or sucks at memory accesses on the GPU, it is going to suck on both OS X and Windows/Linux. If not, it is going to go well on both.

And, guess what, if I want I can get an expandable system where I can stack 4 or more GPUs, whereas on the Mac Pro I am stuck with 2 of them tops. Not to mention the CPU side, where there are readily available Ivy Bridge EP 4S systems that fit in a workstation form factor, while the Mac Pro cannot do more than a single Xeon 2697 v2.

As for NVIDIA/AMD competing vs. Intel's Xeons: sure, but look at NVIDIA's and AMD's "enterprise" GPUs. All of a sudden we are talking about mid-to-high $$$ figures for a single card, just like Intel's high-end Xeons.

This is because the enterprise market is willing to pay top dollar for a solution, not because there is no competition. Server hardware for big data / medical / research has always cost thousands of dollars or more per piece; the only difference is that every year you get more for your $3/$5/$10K+.Reply

First, it seems to me that Apple is at least one year ahead of Qualcomm, and Qualcomm a year or two ahead of Intel, in mobile computing. It all seems to be about the level of integration achieved. Intel is having to use TSMC to deliver SoFIA; there will be a delay before Intel can achieve this level of integration in its own wafer fabs.

Secondly, it is not all about the CPU. ARM has stated that 2+4 big.LITTLE cores make sense in mobile, because most of the processing power is in the GPU. According to NVIDIA, this general idea scales all the way up to HPC; the GPUs provide the real processing power. Apple's A7 chip may well have 4 A53 cores and use big.LITTLE; who, other than Apple, would know?

Thirdly, there is no best CPU design. ARM emphasise diversity; they are not trying to do Apple's job for them. Every different CPU design is a different point in the design space.

Fourthly, the levels of integration now possible on SoCs, especially at the server level, are so great, and potentially so diverse, that there is no best SoC; they are just different points in the design space.

Fifthly, the desktop market is contracting; this even applies to Apple's Macs. No company can base its future on a contracting market.Reply

Anand, nice work; ex-CPU/IO system guy here, so it's nice to see this sort of digging. Quick question: with 6 micro-op issues (or is that decode?), what about resource interlocks and register collisions? How smart does the compiler have to be?Reply
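For what it's worth, wide out-of-order cores like Cyclone resolve register collisions in hardware via register renaming rather than relying on the compiler: each write to an architectural register gets a fresh physical register, so only true read-after-write dependences remain. A minimal sketch (the instruction format here is invented for illustration):

```python
def rename(instructions):
    """Rename architectural destination registers to fresh physical
    registers, so write-after-write and write-after-read hazards vanish.

    Each instruction is (dest_reg, [source_regs]); the result uses
    physical names p0, p1, ... for every destination.
    """
    mapping = {}      # architectural reg -> current physical reg
    renamed = []
    for i, (dst, srcs) in enumerate(instructions):
        # Sources read the most recent physical name (or the original
        # architectural name if it hasn't been written yet).
        phys_srcs = [mapping.get(s, s) for s in srcs]
        phys_dst = f"p{i}"
        mapping[dst] = phys_dst
        renamed.append((phys_dst, phys_srcs))
    return renamed

# r1 is written twice (a WAW hazard). After renaming, the two writes go
# to different physical registers, so the third instruction no longer
# has to wait for the first two and can issue in parallel with them.
code = [("r1", ["r2", "r3"]),   # r1 = r2 + r3
        ("r4", ["r1"]),         # r4 = f(r1)   -- true dependence, preserved
        ("r1", ["r5", "r6"])]   # r1 = r5 + r6 -- independent after renaming
out = rename(code)
```

Only the `r4 = f(r1)` dependence survives renaming (it reads `p0`); the second write to `r1` lands in its own physical register.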

1- Is it possible that the processing is required by Touch ID?
2- Would on-board speech recognition require/utilize this processing power?
3- How high could the processor speed go if heat were not an issue? (i.e. a different form factor)

I always hear that it's hard to max out the A7... not true. I'm using the iPad Air in my music production. You can connect different apps/synths together and play them at the same time, and here it's very easy to reach the limit of the A7's CPU power if you want. However, in most cases the RAM is the real bottleneck, and I have a lot more low-RAM issues/crashes on my iPad Air than on my iPhone 5 doing similar things. Since 64-bit apps use more RAM, not upgrading to at least 2 GB feels like a step back. Even Safari can't handle much without a crash. It would be time for an iPad Pro with more RAM and advanced multitasking. I know I'm working in a niche on iOS devices, but sadly Apple doesn't seem to be looking out for "pro" users with its iOS devices and OS. I really hope this will change with the next generation of iPads. Maybe it's my fault for trying "real" desktop workflows on my iPad Air... the software is there, but the hardware and OS can't handle it. It's not that the A7 devices aren't forward-thinking, but a lot of the third-party developers are far more advanced here. Reply