The Future of Mobile CPUs

In the first part of our series, we explored the major trends that will influence the mobile system-on-a-chip (SoC) market over the next five to ten years. This sets the backdrop for looking at the architecture of future SoCs and the specific players within this market, both the critical IP suppliers and the actual SoC vendors. For the most part, this article focuses on mid-range to high-end devices, rather than the lowest-end smartphones and tablets. This means that some SoC vendors have been omitted, for the sake of clarity and brevity.

SoCs today

The vast majority of smartphones today use single- and dual-core SoCs. At the very high end, there is a smattering of quad-cores. The same is mostly true of tablets, although the larger power budget means that the processors tend to skew towards higher core counts. The CPU cores are clocked at around 1GHz, and the more advanced ones feature out-of-order execution and modest superscalar issue, typically two to three RISC instructions per cycle at peak. Simpler cores for more power-constrained systems tend to be in-order and issue one to two instructions per cycle. This level of complexity is generally on par with the CPU cores found in the early to mid 1990s.

Realistically, it is hard to see any benefits from quad-cores in mobile devices. The majority of PCs today sell with dual-core CPUs, and that is a reflection of the state of software; multithreading is hard and most applications are single threaded. Software for mobile devices is even more primitive and less amenable to threading. Comparing a quad-core to a dual-core at the same power, the dual-core should be able to reach about 25 percent higher frequencies (power scales roughly with frequency cubed). For the vast majority of workloads, a faster dual-core CPU will have better performance. Despite this fact, there appears to be some marketing value for quad-core SoCs, even if the delivered value is minimal.
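To see where that 25 percent figure comes from, here is the arithmetic as a minimal sketch; the cube rule is a rough approximation that ignores leakage and voltage floors:

```python
# If per-core power scales roughly with frequency cubed, then at a fixed
# SoC power budget, halving the core count lets each core clock higher:
#   P_total = cores * f^3  =>  f = (P_total / cores)^(1/3)
speedup = (4 / 2) ** (1 / 3)
print(f"dual-core frequency advantage: {speedup:.2f}x")  # ~1.26x, i.e. ~25% higher
```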

One reflection of the divergence of smartphones and tablets is the graphics for these devices. Tablets have higher-performance graphics to drive the larger and higher-resolution displays and to make use of the greater power envelope. The actual GPU cores are usually the same, but with more cores and higher frequency for tablets. Looking at the iPhone 5 and iPad 4, the latter GPU has about 3X the shader throughput measured in FLOP/s (~100 vs. ~30 GFLOP/s). In terms of performance, the iPhone 5 is roughly the equivalent of a very low-end discrete DX10 GPU from 2007, while the iPad 4 resembles a mid-range model.
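For readers curious where such FLOP/s figures come from, the standard back-of-the-envelope estimate is ALU count times two (for a multiply-add) times clock speed. The ALU counts and clocks below are illustrative assumptions chosen to land near the quoted figures, not confirmed specifications for either device:

```python
# Peak shader throughput ~ ALUs * 2 FLOPs (multiply-add) * clock.
# The ALU counts and clocks below are illustrative placeholders, not
# confirmed specifications for the iPhone 5 or iPad 4.
def peak_gflops(alus: int, clock_mhz: float) -> float:
    return alus * 2 * clock_mhz * 1e6 / 1e9

print(peak_gflops(48, 300))    # 28.8  -> phone-class, ~30 GFLOP/s
print(peak_gflops(128, 400))   # 102.4 -> tablet-class, ~100 GFLOP/s
```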

The other significant blocks in a mobile SoC are the wireless modem, which is often discrete for high-end phones and tablets (i.e., LTE devices), along with dedicated hardware for video encode/decode and image processing for the camera.

Power management ties together all these blocks and is particularly vital, since performance is limited by both the battery life and skin temperature (i.e., how hot the case gets). Simply put, there isn’t enough power or cooling for every block to be in a high-performance mode simultaneously. For example, when running a strenuous game, the display and GPU will draw much of the power; the CPU will actually have to reduce frequency and voltage to deliver the best overall performance. This becomes even more complex if there is significant wireless traffic as well.
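To make the arbitration concrete, here is a toy sketch of the kind of decision a power-management unit makes; the block names, priorities, and budget are invented for illustration:

```python
# Toy power-budget arbiter: blocks request power; when the platform budget
# (limited by battery and skin temperature) is exceeded, lower-priority
# blocks are throttled first. All numbers are invented for illustration.
BUDGET_MW = 3000
requests = [  # (block, requested mW, priority: 0 is granted first)
    ("display", 1200, 0), ("gpu", 1500, 1), ("cpu", 900, 2), ("modem", 400, 3),
]

def arbitrate(reqs, budget):
    grants, remaining = {}, budget
    for name, want, _prio in sorted(reqs, key=lambda r: r[2]):
        grants[name] = min(want, remaining)
        remaining -= grants[name]
    return grants

print(arbitrate(requests, BUDGET_MW))
# -> display and GPU get full power; the CPU is cut to 300 mW and must
#    drop frequency/voltage, exactly the gaming scenario described above.
```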

SoCs of the future

Looking out 5-10 years, Moore’s Law means that transistors will be even cheaper. However, battery technology improves slowly and the maximum skin temperature is constant. Consequently, power will be even more of a limiting factor in the future than it is today. So techniques that spend transistors (or area) to reduce power will be increasingly attractive.

While change is slow, eventually mobile developers will be able to take advantage of multiple cores. At this point, quad-cores can be more efficient by reducing frequency and voltage, as the PC industry has shown. Most workloads will still be single threaded and need high frequencies, so the SoC must be able to efficiently deliver both aggregate throughput and single-core performance. Eventually, almost all mobile SoCs will move to quad-core to handle the few cases of properly parallelized code.
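The same cube-rule arithmetic shows why a quad-core wins once software can use it; again a rough model that assumes perfectly parallel work:

```python
# At equal power under P ~ f^3, a quad-core clocks each core at ~79% of a
# dual-core's frequency but has twice as many cores, so on perfectly
# parallel work it delivers ~1.59x the aggregate throughput.
rel_freq = (2 / 4) ** (1 / 3)        # quad-core clock vs. dual-core clock
rel_throughput = 4 * rel_freq / 2    # cores * clock, relative to dual-core
print(f"clock: {rel_freq:.2f}x, throughput: {rel_throughput:.2f}x")
```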

The CPU cores will also become more sophisticated, improving single-core performance through both frequency and instructions per cycle (IPC). However, this evolution will be slow and steady because CPU performance is non-linearly expensive (in terms of both area and power) beyond a certain point. Many workloads simply cannot reach high IPC because of the nature of the code. One way that the industry has looked to get around this issue is with heterogeneous cores, which ARM bills as "big.LITTLE." This method pairs a small and efficient core with a larger and more complex core and switches between the two. The challenge again is power; these big cores can only be active one to five percent of the time, which limits the potential performance gains, and the switching penalty is an issue. Initially, there seems to be some interest in this solution, but it is unclear whether this will be a long-term solution for most vendors.
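A minimal sketch of the switching idea, assuming a generic load-based governor rather than ARM's actual big.LITTLE software:

```python
# Generic load-based core switcher with hysteresis: stay on the efficient
# little core until demand is sustained, then migrate to the big core.
# This is an illustration, not ARM's actual big.LITTLE switching software,
# and it ignores the real migration penalty (state transfer, cold caches).
UP, DOWN = 0.85, 0.30  # utilization thresholds

def next_core(core: str, load: float) -> str:
    if core == "little" and load > UP:
        return "big"       # pay the switch penalty once, for sustained work
    if core == "big" and load < DOWN:
        return "little"    # fall back to the low-power core
    return core            # hysteresis prevents thrashing between cores

core = "little"
for load in (0.2, 0.9, 0.95, 0.6, 0.1):
    core = next_core(core, load)
    print(f"load={load:.2f} -> {core} core")
```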

Graphics are an entirely different story because the workload is inherently data-parallel. While there are limits, desktop GPUs have shown that performance scales nicely up to at least 1-4 TFLOP/s if memory bandwidth increases commensurately (to roughly 200-250 GB/s). To a large extent, this performance will be used to deliver higher-quality graphics for 3D applications or better energy efficiency. Display resolutions may increase, but at a relatively slow pace considering today’s high-density displays and the slow rate of change for TVs and other external displays. Given the benefits of Moore’s Law, this means that GPUs will consume more and more die area, while keeping frequency and voltage relatively low to improve performance and energy efficiency. This is also one of the greatest motivators for any form of memory integration (whether in-package, 2.5D, or 3D), as there is simply no other way to provide enough bandwidth given the power constraints.
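A quick calculation using the figures above shows why bandwidth must scale with compute: otherwise each byte fetched from memory has to be reused for ever more arithmetic, which real graphics workloads cannot always do.

```python
# Arithmetic intensity implied by the figures above: FLOP/s divided by
# bytes/s of memory bandwidth. GPUs are designed around a roughly fixed
# ratio; if bandwidth cannot scale, the required reuse per byte balloons.
for flops, bw in [(1e12, 200e9), (4e12, 250e9)]:
    print(f"{flops/1e12:.0f} TFLOP/s @ {bw/1e9:.0f} GB/s "
          f"-> {flops/bw:.0f} FLOPs per byte")
```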

Image signal processors (ISPs) are also exquisitely parallel, just like GPUs, but the main driver is enhancing still and video images. Current cameras are strongly limited by the low quality and compact physical dimensions of optical lenses in mobile devices rather than the sensor resolution. In such a scenario, ISP performance will grow slowly, motivated by more sophisticated filters rather than higher resolution. However, an array camera could improve lens quality and motivate much more robust ISPs in the future.

The video-encoding and -decoding blocks are typically fixed function and will be upgraded to take advantage of the emerging High Efficiency Video Coding (HEVC) standard.

Over this time frame, the wireless landscape will be relatively stable. The industry is currently undergoing the transition to LTE, although the various 2G and 3G protocols will be crucial for backwards compatibility in areas with spotty coverage. LTE will certainly progress to higher speeds, but there is no replacement on the horizon for the next ten years or so. Some high-end phones, and most tablets, may continue to use discrete LTE modems for performance and flexibility, especially for vendors without internal wireless expertise. However, most smartphones will integrate the various modems into the SoC, reducing cost and power.

Of course, these guidelines are not absolute, and SoCs will vary to cover the full range of the market. Devices like the Kindle e-reader hardly need a lot of graphics performance, and budget devices may continue to use single or dual cores for many years.

Licensed CPUs

The most pervasive mobile IP company is unquestionably ARM. ARM is particularly well-known for licensing the eponymous instruction set (e.g., ARMv7 and v8), the Cortex cores (e.g., A7) that implement it, and other SoC components such as the AMBA interconnect. Nearly every company in the mobile ecosystem is an ARM customer in one fashion or another.

One big trend we mentioned earlier that impacts ARM is the shift toward vertically integrating IP. Today, ARM has a large number of customers that license the Cortex A-series for mobile devices, including Broadcom, Mediatek, Nvidia, Texas Instruments, and Samsung. In contrast, the larger SoC vendors such as Apple and Qualcomm prefer to license the instruction set and design their own CPU cores. The latter approach requires more engineering talent, but ultimately costs less in terms of royalties; essentially it is a trade-off between fixed and variable costs.

Long-term, companies with sufficient volume will shift from licensing CPU cores to licensing the ISA and designing the cores. ARM’s cores are by necessity somewhat generic, since they must be attractive to all customers and compatible at all the major foundries (TSMC, GlobalFoundries, UMC, and Samsung). In addition to cost advantages, a custom core can be carefully optimized for the target applications and the underlying manufacturing.

Another issue is the divergence between tablets and smartphones. It is very hard to design an optimal CPU core for radically different power limits, and at some point the tablet market may grow to be large enough to merit a more carefully optimized design. The sweet spot for tablet SoCs is around 2-6W, versus 0.5-1.5W for a smartphone. It may prove more efficient to have two different cores spanning the full range from 0.5-6W rather than using a single design.

Reader comments

So in the future everyone will have a custom ARM core, except Intel, but the graphics will remain outsourced for most of them. Probably in the coming months Samsung will announce a new custom core for the SGSIV, since the A15 is too power-hungry for a phone; or they will ask Qualcomm, since they have already used Qualcomm chipsets in the past.

This article, like the first one, sounds like it was written by a consumer rather than by someone who understands semiconductors. More nonsense about "vertical integration" and another nonsense statement of "Long-term, companies with sufficient volume will shift from licensing CPU cores to licensing the ISA and designing the cores." This will not happen, and standard ARM designs will prevail.

Then to top it off with trying to claim that the big daddy, Qualcomm, is the more vulnerable is laughable. Hint: neither Apple nor Samsung make true mobile SoCs, merely application engines. Both are completely reliant on horizontal integration for actual telecom capability. When it comes down to it, cost will decide everything, and that means greater integration of everything (i.e., including telecom capabilities in a single piece of silicon). Graphics IP is much simpler to buy in and integrate than telecom.

If I were to look at the future, I would see Mediatek and other upcoming Asian companies like Spreadtrum as having a bigger influence. Intel, while certainly becoming a big player, was nowhere in the mobile SoC business before its Infineon acquisition, which is odd not to mention in their history. Broadcom and ST-Ericsson should also both have had a look-in.

I think it's important to note that Samsung is much more open to third-party IP, to the point that they use NVIDIA, TI, and Qualcomm SoCs besides their own. If Intel were to release a blockbuster mobile product, I would expect Samsung to be one of the first to use it.

Quote:

Apple would need to develop or acquire a multi-mode modem (i.e., capable of 2G/3G/4G) and integrate it into the iPhone SoC.

Qualcomm is expected to ship a multi-mode modem this year; while it won't be integrated into the iPhone SoC, it will be available for them to use.

The old Wintel - Intel and MS WOS - is dying. Intel must learn how to make cheap SoCs, and MS WOS perhaps should change its kernel to a *nix one to achieve better performance on cheap computers, as Apple did some time ago.

ARM64 and Ubuntu are not even considered in this great article.

*nix kernels have proved far better than MS WOS ones, as every techie knows, and of course far better on these not-so-powerful SoCs.

Chrome OS is shining, as are preinstalled Ubuntu machines in Asia.

The Ubuntu POCKET COMPUTER concept is even better than the original iPhone, and Google and Apple will make their own approaches; we will probably see pocket-computer OSs from both brands.

ARM currently is no match for x86 computers. People may be buying them in droves, but seriously - try running a modern game, or even just a more complex productivity program, on ARM. They will have their place, but I have no expectation that every PC in the near future will be an SoC. I own an Android tablet myself, but I'm not throwing away my PC any time soon.

As for Microsoft using a *nix kernel - I want whatever you've been smoking, man! I'm not advocating any OS, but that statement just blows my mind. Microsoft has kept its position in the business world thanks to its pretty well-regarded NT kernel, which AFAIK is now the same for the server, business, and private-use Windows versions. On top of that, they already have Windows RT running on ARM. Why would they suddenly drop all this and turn to *nix? What are the advantages? You speak of performance, but is that seriously even an issue any more? I haven't heard many people complain about Windows 7 hogging their systems... let alone Windows 8, but that's pretty fresh so I'm not even considering it fully.


For large machines, x86 is still the king. I have a 128GB iPad 4 and an OQO-02 for my mobile computing and an Ivy Bridge/2 GTX690s for home use. No matter how you cut it, the current crop of SoCs cannot come close to any high-end desktop. (Not even the best laptop on the market will touch that desktop.)

Of course Moore's Law will win in the end. I have personal experience there. My first "machine" was a Minivac 6010 in the mid-60s; I built an 8080 S-100 box (with an ASR-33 Teletype) in the 70s, an 8086 in the early 80s, etc.

I expect Intel to gain some traction as tablets start to transition from "big smartphones" to something more akin to laptops to justify their price premium. Intel will get their x86 power envelopes down while maintaining superior performance to ARM. I recall reading that their JIT recompilers work extremely well, so handling ARM-centric code does not seem to be a problem on x86, in addition to the uncountable mountain of x86 code in the world.

Nice article, but I wondered a bit at, "Realistically, it is hard to see any benefits from quad-cores in mobile devices." I know it might seem like this at first glance given the typical one-app-at-a-time usage pattern, but... Are pricier quad-cores really getting design wins in tablets and even smartphones due to marketing reasons? Is that why Samsung is bringing out its 2 x 4 core design? I strongly suspect that even now these architectures are providing benefits for common usage such as web rendering. I realize this was a survey article, but it would have been nice to see a bit of substance behind the statement that quad-core is just marketing.

Quote:

Then to top it off with trying to claim that the big daddy, Qualcomm, is the more vulnerable is laughable. Hint: neither Apple nor Samsung make true mobile SoCs,

Then who makes the Exynos SoC?

ARM does. Samsung is little more than a fab (read: silicon fabrication) for ARM's parts. They do a little side customization to the logic to facilitate implementation into their devices. This is why Samsung is used in the industry as the benchmark for the current ARM generation of products for performance (compute, power, etc.). Qualcomm and Apple are the only two that I know of that implement the ARM ISA independently. Benchmarks will show that either of these two companies' cores outperform the Exynos on a per-clock and (usually) per-watt basis.

I will technically agree with paul5ra here, though. Qualcomm provides a top-to-bottom in-house SoC design: CPU, GPU, modem, peripherals, audio, etc. Whereas Apple designs their own custom ARM core, probably licenses peripheral technologies, and obviously doesn't produce their own modem or GPU. And again, Samsung is more or less ARM's fab.

Quote:

Realistically, it is hard to see any benefits from quad-cores in mobile devices. The majority of PCs today sell with dual-core CPUs, and that is a reflection of the state of software; multithreading is hard and most applications are single threaded. Software for mobile devices is even more primitive and less amenable to threading. Comparing a quad-core to a dual-core at the same power, the dual-core should be able to reach about 25 percent higher frequencies (power scales roughly with frequency cubed). For the vast majority of workloads, a faster dual-core CPU will have better performance. Despite this fact, there appears to be some marketing value for quad-core SoCs, even if the delivered value is minimal.

I fully and completely disagree with this section. While no one application may make efficient use of multiple cores, having additional cores is beneficial for system responsiveness in so many ways. If updates are being installed, that's a full core gone. If you've got a spreadsheet open that's crunching numbers, that's another core gone. Now your background apps have nothing to use for their tasks. (Nothing, of course, here referring to very constrained resources. The process scheduler will allow them to run at some point.)

I also believe that our modern programming languages are becoming more adept at multiprocessing, finally, and so we're seeing better use of multiple cores than we have in the past. Especially if I'm running Ubuntu Phone in a dock with the full desktop open, I'd rather have 4 fast cores than 2 slightly faster cores.

I find myself wondering what impact the DX9/10 issue has on Windows Phone performance, and thus uptake in the market. The fact that Microsoft insists on using an API that is not supported in anything like its modern form by any mobile GPU manufacturer seems like a real liability for the platform. And until there is a lot of uptake, likely no one will support it. The chicken/egg conundrum here seems like a bit of an anchor around the neck of Microsoft as they try to swim.

Quote:

So in the future everyone will have a custom ARM core, except Intel, but the graphics will remain outsourced for most of them.

Not everyone - there's still AMD ;-) GlobalFoundries has been mentioned, but not AMD. They are the No. 1/2 vendor in graphics. With x86 tablets, they will certainly have a decent share of design wins for their "APUs".

I think the programming side of phones needs to get under control before we throw more power at it. Avoiding the "eventually it's good enough" argument, we currently still have rogue apps that get on phones and crank the CPUs up to full, draining batteries, heating phones, etc., without the user knowing or being able to do anything except pull the battery (or use some other reboot method). The hardware in phones is already impressive... I'd say overly impressive. It's the software side that seems to be lacking currently... or rather the "wild west," so to speak. Not sure I'd want a quad-core phone if it just ends up with an app on it that somehow cranks up all the cores for no reason. It'd make an amazing warmer during winter while I'm riding my motorcycle... omg! That's the new killer feature! Make an app that purposefully cranks up your phone to critical mass so you can use it as a hand-warmer. (Let me guess... there's already an app for that.) I just get annoyed at how companies think throwing more hardware at the situation will miraculously solve some problem when the software seems to be the problem.

I'm curious, where is AMD's presence in this? Are they that bad off in the mobile space as to be written off? They seem to have a unique position where they have both a CPU and a GPU business that is relatively robust (though obviously hurting).

Having that combined expertise transition to a focus on mobile to compete with Haswell and beyond in the x86 space seems like a good way to rack up some wins for those that want good mobile GPU performance.

Quote:

I just get annoyed at how companies think throwing more hardware at the situation will miraculously solve some problem when the software seems to be the problem.

Because that's the way things have worked in the desktop world for many, many years now. Optimization is seen as a bit of a waste of time when even the kludgiest code can run smoothly given a couple of years of hardware improvements.

Unfortunately, for mobile use, hardware isn't everything. You have to be power-conscious, too, since battery life and thermal dissipation limits aren't evolving nearly as quickly as the number of transistors and cores, so optimized, efficient programming has to come back into style. That adoption hasn't been universal just yet.

I thoroughly enjoyed this, and the trends are interesting. For interest's sake, here is an alternate future: suppose Qualcomm makes a breakthrough that reduces wireless latency to the point where cloud computing becomes big for mobile phones; then the market will pan out very differently. For anyone in a developed country, the purchase of such a handset will definitely look attractive (the data center must be near the cellphone, at least on the same continent, for this to be feasible; thus developing countries that cannot build data centers for millions of people will initially be excluded), so this will mostly create a new profit stream from developed countries.

On one hand, Qualcomm's position will definitely be secured. Intel will benefit from its good data-center-quality CPUs, and they will also benefit from their transition to chips for third-world countries as discussed in the article. I suppose that Nvidia will win from the GPUs in the data centers as well. Samsung will still benefit from its growth in the developing countries but will benefit little from such a breakthrough. Apple, well, their main income source of high-quality phones will disappear, but I suspect that they will survive such a transition.

Quote:

I'm curious, where is AMD's presence in this? Are they that bad off in the mobile space as to be written off? They seem to have a unique position where they have both a CPU and a GPU business that is relatively robust (though obviously hurting).

Having that combined expertise transition to a focus on mobile to compete with Haswell and beyond in the x86 space seems like a good way to rack up some wins for those that want good mobile GPU performance.

But then, I'm ignorant of what AMD is doing in this space...

The short answer is that they probably have no immediate plans for entering this market.

The long answer is that before AMD acquired ATI, as the article mentioned, ATI had a mobile Radeon core that was prominently found in the original Motorola Razr. Although ATI had no real grasp of the future of this market (and hence no real SoC development plans), they still had a foot in the proverbial mobile door. Then AMD acquired them. And AMD bled money from the acquisition for quite a while; Dirk Meyer was forced to contend with a failing organization. So he sold off the mobile GPU business to Qualcomm. This was the decision that got him, err... replaced as CEO of AMD later. Arguably, though, he saved the company from immediate death and briefly brought it back into the black, but yeah, talk about a "damned if you do, damned if you don't" situation.

So basically AMD at best has maybe some alpha designs or prototypes utilizing their own CPU/GPU technology, but it's definitely not the focus of their development efforts right now, given that they have recently captured the entire gaming-console market and have efforts in the server industry (SeaMicro). So if they have anything, it's nowhere near primetime and they're not working on it heavily. And as such... they probably have no immediate plans for entering the mobile market.

They don't really have one. The best low-power chips AMD has produced have struggled to reach the tablet power window, and they've got nothing at all in the phone space (<2W). Their newest chips are expected to be better fits for the low-power market (<6W), but I'm not holding my breath. Combine these with the fact that AMD has no capabilities in the wireless space, and they're simply not that relevant.

I think that if AMD delivers with Jaguar (their new low-power core, expected Q2 '13), they will be worth looking at for tablets, but that's about the best that can be said.

Quote:

I fully and completely disagree with this section. While no one application may make efficient use of multiple cores, having additional cores is beneficial for system responsiveness in so many ways. If updates are being installed, that's a full core gone. If you've got a spreadsheet open that's crunching numbers, that's another core gone. Now your background apps have nothing to use for their tasks. (Nothing, of course, here referring to very constrained resources. The process scheduler will allow them to run at some point.)

I also believe that our modern programming languages are becoming more adept at multiprocessing, finally, and so we're seeing better use of multiple cores than we have in the past. Especially if I'm running Ubuntu Phone in a dock with the full desktop open, I'd rather have 4 fast cores than 2 slightly faster cores.

Your post kind of makes the author's point for him. Sure, there are situations as you describe where many cores can be put to use, but those situations are contrived and rare. How often do you really have a spreadsheet using 100% of a core while an update uses 100% and you are doing something else? My Mac has 4 cores, 8 threads, and there are very few times when they are all saturated for any length of time. Even when running other apps in the background, usually they finish what they were doing long before I switch back to them and just sit there not doing anything. I'd venture to say that having one core that is twice as fast would produce a more responsive system than one with four half-speed cores 99% of the time or more.

For discrete GPUs with unlimited power (as in plugged into the power grid), the drivers are indeed the hardest part to make. Low-end GPUs lack memory bandwidth (and require black magic to compress the data that needs to be sent), while high-end GPUs have plenty of bandwidth but also plenty of capable cores that need to be kept working all the time (thus requiring lots of black magic to order the execution queue and data transfers so that all cores stay busy).

Add to that multiple platforms (Windows, OS X, Linux - all are profitable enough to support).

Add to that multiple APIs (OpenGL 2.1/3.x/4.x, DX9/10/11, OpenGL ES 2.x/3.x), which cannot be ignored or omitted.

What you get is big and complex software that needs to be written. For a case study, look at Intel's GPU drivers on Windows. They are DX11-capable but support only OpenGL 4.0 (and that after a year of work by a big team), while there is no technical reason why Intel's GPUs couldn't support OpenGL 4.2/4.3. Intel has a big enough team; what it lacks is the time required to create such complex software.

On the other hand, mobile GPU drivers are much simpler:

Bandwidth is the constraint, along with power consumption (hence you optimize in almost one direction only).

Only Android compatibility matters (Apple sticks to a single supplier!).

The API can be limited to OpenGL ES 2.0 (with 3.0 soon to be released, but it is backward compatible, so you can just present OpenGL ES 3.0 to any app needing 2.0), which is MUCH cleaner and simpler than any "desktop" API.

The only thing that makes things ugly is the future. GPUs will increase in complexity over time, and the requirements for mobile GPU drivers will increase with them, yet none of the mobile GPU vendors want to release documentation for these GPUs, nor do they want to develop open source drivers.

With those, the cost and complexity of developing mobile GPU drivers would be diffused.

It is a good thing that Intel will develop open source drivers for their mobile GPUs (which in that case are the same as for "desktop" Linux).

I think Apple will definitely be integrating wireless modems into their SoCs. Maybe not this year, but it's coming. They'll probably license a design from Qualcomm first, but I wouldn't be surprised to see a home-grown solution.

Same thing with Intel. They'll integrate a wireless modem into their smartphone/tablet SoCs eventually. They are moving at such a snail's pace though.

Lastly, out of Qualcomm, Nvidia, and Samsung, only two will stay in the ARM SoC business. TI has left the field already. Another will follow. It's not going to be Qualcomm, so it's either going to be Nvidia or Samsung. Other minor players can join the fray (Broadcom, Marvell, et al.) but they will be in niches.

There's a big question mark in the room regarding x86 vs. ARM. If Intel gets off its butt and ships a competitive SoC on their most advanced node, it'll get interesting. I understand their reluctance, as ASPs on their x86 chips are an order of magnitude larger than ARM chips', but either they cannibalize themselves or someone else will.

"Realistically, it is hard to see any benefits from quad-cores in mobile devices. The majority of PCs today sell with dual-core CPUs, and that is a reflection of the state of software; multithreading is hard and most applications are single threaded."

It's important to note that this is not just David's opinion. Work done in 2011 looked at the state of threading across a wide variety of desktop apps and showed that, while they were aggressively threaded, very few threads actually ran at the same time. Basically, the state of the art in 2011 on the desktop was that there was minor benefit to having 3 CPUs available, and substantial benefit to 2 CPUs. The only exception was what you would expect: video encoding. In particular, both image processing and games still make little use of aggressive SIMULTANEOUS threading. (That is, they have plenty of threads, but most of those threads are sitting around waiting, handling things like async IO.)

For example, two of the obvious sloths on current devices (mobile and, to a lesser extent desktop) are PDF viewing and the browser, and while ideas about how to parallelize these have been floating around for years, they're yet to become real. In particular the most obvious way to parallelize web sites, running different tabs/windows in different processes, is of even less interest and value on mobile, with its small screen, than on the desktop.

What does this suggest? I suspect we may see a bifurcation between vendors driven by specs (Octacore...) and vendors driven by actual use cases (obviously Apple, maybe Qualcomm). Detailed examination of real mobile apps, for example, shows that they are currently most throttled by small I-caches, small I-TLBs, and small branch-prediction storage. It would make far more sense (even though it may not be as sexy) to devote a whole lot of extra space to those rather than to an extra core. Second-level versions of all of these caches are an obvious idea, but it may even be worth increasing cycle time if that's what's necessary to allow for larger such structures. (Again, lower MHz is not as sexy, but it is faster on real apps.) (It's interesting to note that right now D-cache size and D-TLB misses are much less of an issue.)

A second way to use silicon would be to beef up NEON by making the registers and units wider. Obviously a lot of effort is going into auto-vectorization on the Intel side, and the first fruits of this are starting to appear in LLVM. Most of the realistic auto-parallelization possibilities (e.g., polyhedral optimization) are as applicable to vectors as to multiple CPUs, with less overhead and less power usage. In particular, if NEON+ or whatever they call it could pick up scatter-gather capabilities, you'd have something really awesome there.

All of which suggests to me that for Apple the near future will not be quad-core. It may include a third low-power companion core (the value of such cores is that you can leave the phone "alive" for much longer, even though it's not doing very much; think of, e.g., those apps that monitor your sleep patterns so they wake you up when you're already in light sleep: they work well but drain battery, even though they require minimal real CPU). It might also (perhaps it already does? How much do we know about the interior guts of Swift?) look rather more like a server CPU: large I-cache, large I-TLB, lots of branch prediction, and maybe (because it's easy and cheap in power and logic) SMT. If I were an Apple (or a Qualcomm, or an ARM) designer, I'd be pushing, in order of priority: (a) the large I-side support structures I mentioned; (b) better NEON (scatter-gather as highest priority, wider as next priority); (c) SMT, with more (non-companion) cores as a far-future possibility.

Quote:

Your post kind of makes the author's point for him. Sure, there are situations as you describe where many cores can be put to use, but those situations are contrived and rare. How often do you really have a spreadsheet using 100% of a core while an update uses 100% and you are doing something else? My Mac has 4 cores, 8 threads, and there are very few times when they are all saturated for any length of time. Even when running other apps in the background, usually they finish what they were doing long before I switch back to them and just sit there not doing anything. I'd venture to say that having one core that is twice as fast would produce a more responsive system than one with four half-speed cores 99% of the time or more.

Agree with you. If it weren't for Intel's "turbo" modes in their multi-core chips, we'd all be feeling a bit differently about the performance of our computers today compared to 5 years ago.

Quote:

...rant omitted... Broadcom and ST-Ericsson should also both have had a look-in.

The tone is rude, but his point re ST-Ericsson is correct. ST-Ericsson has some REALLY nice fab technology coming up, and it will be most interesting to see how they package it and the extent to which they either license it or sell SoCs.