ARM attacks Atom with 2GHz A9; can servers be far behind?

ARM took a major step this week in bringing the netbook fight back to Intel by …

Intel hasn't been shy about its plans to challenge ARM in the low-power embedded space, and the world's largest chipmaker is gearing up for the debut of the 32nm process that will enable it to reach new levels of x86 power efficiency. But ARM isn't sitting still, and the British IP company took a major step last week in bringing the fight back to Intel by boosting its Cortex A9 processor up into Atom territory. One of the engineers behind Amazon Web Services is even eyeing the part as potential datacenter web server material.

On Wednesday, ARM announced the availability of IP for a 2GHz dual-core A9 processor on TSMC's 40nm process, which the company claims will offer massively more performance than Atom within a smaller power envelope. You'll recall that Cortex A9 is an out-of-order processor, so, unlike the in-order Atom, it should have much better performance per clock on standard integer code. So even if some of ARM's claims about the performance delta between the 2GHz A9 and Atom are overblown, the 40nm part should still be more than competitive with a 32nm Atom in performance per watt.

Unfortunately for ARM's netbook ambitions, Linux is the only netbook OS that matters that runs on ARM, and the jury's still out on whether it can really take on Windows 7. As for Windows on ARM, it just ain't gonna happen, ever. (Same for Mac OS X on ARM... and please read that previous link before writing in to inform me that the ARM-based iPhone runs Mac OS X, or that WinMo runs on ARM.) Given that even the most insanely power-efficient, Atom-smashing 2GHz ARM netbook product is going to be relegated to whatever netbook niche Linux can carve out for it, it's worth asking what sort of future there is for such a high-powered ARM part.

One idea would be servers, believe it or not.

An ARM-based web server

The idea of building cheap but capable web servers from ARM parts has been enthusiastically floated by James Hamilton, a Vice President and Distinguished Engineer on the Amazon Web Services team. In a post earlier this month, Hamilton enthused about the idea of using a multicore, cache-coherent ARM SoC to do low-cost, power-efficient web hosting.

"The ARM is a clear win on work done per dollar and work done per joule for some workloads," Hamilton concludes. "If a 4-core, cache coherent version was available with a reasonable memory controller, we would have a very nice server processor with record breaking power consumption numbers."

Clearly, Wednesday's 2GHz A9 announcement was right up his alley, so he followed up with a post on the part suggesting that it may be what he's looking for. The post features a nice benchmark graph that he got from ARM, showing the 2GHz A9 doubling the performance of a 1.6GHz Atom N270 at EEMBC Coremark.
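As a quick sanity check, those two data points imply a per-clock advantage of roughly 1.6x for the A9. This is strictly back-of-the-envelope math on the numbers in ARM's graph, not an independent measurement:

```latex
% Per-clock advantage implied by ARM's CoreMark graph:
% 2x the score at 1.25x the clock (2.0 GHz vs. 1.6 GHz).
\frac{\mathrm{perf}_{A9}/f_{A9}}{\mathrm{perf}_{Atom}/f_{Atom}}
  = \frac{2.0 / 2.0\,\mathrm{GHz}}{1.0 / 1.6\,\mathrm{GHz}}
  = 1.6
```

A 1.6x per-clock edge is at least plausible for an out-of-order core against an in-order one, which is why the doubling claim isn't crazy on its face.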

Those numbers are impressive, but before we talk performance, let's talk price.

Physicalization and Intel's margins

What Hamilton is essentially endorsing is "physicalization," an approach to server design that packs multiple, cheap, low-power systems into a single rack space. The name is a play on "virtualization," because instead of having one large, expensive system running multiple virtual machines, you use a fistful of small, cheap physical machines. The end effect of both is multiple OS instances packed into one rack space.

If you're thinking that physicalization is an odd use of Moore's Law, you're right. The only thing that makes the technique feasible from a performance per dollar perspective is the fact that Intel charges a fat premium for its higher-end server chips. Avoiding that premium is the sole reason that anyone would even consider using board-level integration (i.e., multiple chips and physicalization) instead of die-level integration (i.e., one Xeon and multiple VMs).

But given Intel's markup, and given a robust ARM ecosystem that keeps ARM prices relatively low, physicalization with something like a 2GHz A9 could well deliver more Linux OS instances per dollar than a regular Xeon-based server.
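To put some shape on that claim, here's a minimal sketch of the instances-per-dollar arithmetic. Every price and count below is a hypothetical placeholder, not a quote from any vendor; only the structure of the comparison is the point:

```python
# Hypothetical instances-per-dollar comparison: one big virtualized Xeon
# box vs. a tray of small ARM boards. All numbers are illustrative
# placeholders, not vendor pricing.

def instances_per_dollar(os_instances: int, total_cost: float) -> float:
    """OS instances delivered per hardware dollar."""
    return os_instances / total_cost

# Die-level integration: one Xeon server running VMs.
xeon_cost = 4000.0   # hypothetical 2-socket server price
xeon_vms = 16        # hypothetical VM count at acceptable performance

# Board-level integration: one cheap dual-core A9 board per OS instance.
arm_board_cost = 150.0   # hypothetical per-board price
arm_boards = 16

print(f"Xeon + VMs: {instances_per_dollar(xeon_vms, xeon_cost):.4f} instances/$")
print(f"ARM boards: {instances_per_dollar(arm_boards, arm_boards * arm_board_cost):.4f} instances/$")
# With these placeholders the ARM tray wins (0.0067 vs. 0.0040), but note
# that the gap is exactly as wide as Intel's server-chip premium makes it.
```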

For performance and performance/watt, there's more than just the core

If we grant Hamilton that ARM may turn out to be a cheaper way to pack OS instances of acceptable performance into a datacenter (at least until Intel lowers prices in response), it still doesn't necessarily follow (contra comments I see at Hamilton's blog and elsewhere) that ARM could just pack four or eight A9 cores onto a die, crank up the clockspeed, and slay Intel's Nehalem or AMD's Shanghai in performance/watt at a given absolute performance level. This is because performance/watt numbers are much higher for low-performance processors than they are for high-performance processors (GPUs excepted).

David Kanter at RealWorldTech has recently posted a great article comparing a number of CPUs and GPUs in performance/watt and performance/mm2 (die area). Atom is literally off the charts in performance/watt, besting Nehalem by some 3X. This isn't because Intel sprinkles magical performance/watt pixie dust on Atom—it's because high performance/watt ratios for individual chips are much easier to achieve at Atom-scale than at Xeon-scale, owing to the much larger amount of system-related complexity and overhead that goes with the Xeon's much higher level of integration and performance. As is the case with everything from dinosaurs to automobiles, it just costs more to be bigger and badder, and one of those costs is net energy efficiency.
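A hypothetical worked example makes the point concrete. Suppose, purely for illustration, that a small chip has 3x the big chip's performance per watt (Kanter's Atom-vs-Nehalem figure) but only a sixth of its absolute performance:

```latex
% Hypothetical small chip: 1/6 the performance at 3x the perf/W.
P_s = \tfrac{1}{6}\,P_b, \qquad W_s = \tfrac{1}{18}\,W_b
% Matching the big chip's throughput takes six small chips, each
% dragging its own node overhead (board, DRAM, NIC, PSU losses):
W_{\mathrm{cluster}} = 6\,(W_s + W_{\mathrm{node}})
                     = \tfrac{1}{3}\,W_b + 6\,W_{\mathrm{node}}
```

The chip-level 3x advantage survives at the system level only while the per-node overhead term stays small, and that term is exactly what grows as you scale the little chips up into real systems.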

It's also the case that for raw performance, interconnects and system architecture issues matter a great deal, and they matter more the more cores and other types of resources (like high-bandwidth I/O interfaces) you try to cram onto one die. The minute you put four or eight of any kind of core onto a single die and try to wire it all together with the best cache hierarchy and the optimal mix of I/O and memory bandwidth, then all of a sudden you're trying to solve a much harder problem than you are with a simple dual- or quad-core embedded chip. You're also playing a high-stakes game where one or two design mistakes could blow the whole configuration, and you're playing it on Intel's and AMD's home turf.

In the end, the era of cache-coherent multicore is fundamentally different than the single-core era that preceded it, because in that earlier, simpler era core-specific factors like microarchitecture and clockspeed were all that mattered. But nowadays, system design and microarchitecture relate to midrange and high-end multicore processor performance somewhat like oxygen and fuel relate to a flame's heat output—you need both of these elements tuned to give the desired result.

My ultimate point is that any four-core ARM desktop or server processor that shoots at a similar absolute performance target as a four-core Nehalem processor will either look pretty much like a four-core Nehalem, or it won't hit the target. It will also have relatively similar performance/watt characteristics, and will end up competing with Intel and AMD on fab muscle.

40 Reader Comments

Still haven't considered the possibility that Apple acquired P.A. Semi and might be planning on using that team to develop some sort of hybrid Mac OS/iPhone OS for its own take on a netbook, the tablet. I'm going to go out on a limb here and say that iPhone OS is a scaled-down version of Mac OS X, which is why iPhones can run Unix code just fine. So scaling up to transition to ARM is probably somewhat of a challenge, but no different than transitioning from PowerPC to Intel... and they pulled that off just fine. Why doesn't Windows have to do the same trick? Microsoft doesn't need to; it's already selling tons of licenses either way. ARM is only for those companies who really care about squeezing every watt from the CPU... and Microsoft isn't really in a position to benefit financially from that, IMHO, besides in Zunes and... Windows Mobile, which is losing partners left, right, and center. I'd say that Microsoft is probably just as dependent on Intel as Intel is dependent on Windows.

And when that "upcoming netbook OS" ever ships (and then if it ever leaves beta), we can talk about it. As of right now, it's nothing more than vaporware. And given google's tendency to get bored with things...I wouldn't hold my breath, if I were you.

And even if it does come out sometime soon, expect an android style experience, i.e. no real native code, drastically underpowered hardware, and google saying "it'll get better soon!"

The real reason why the server has a high chance of success is Linux. Linux has a VERY high web server market share, and it works on ARM. And it is open source; unlike Windows, where you can't do much on the software stack, the Linux ecosystem puts the software in your own hands. That is good for companies like Amazon and Google.

While I am not an expert, I believe that ARM is actually well suited for the simple web server market, where you need high speed and a HUGE number of threads. It definitely won't work for database servers, but for storage, and as a front end for serving webpages, it might very well do a good enough job, while being cheaper and using less power.

I think where you and Hamilton diverge is the level of scale you are considering. Your critique is largely at the level of a single box when you note that a multi-socket, multi-core system brings with it a lot of overhead that reduces performance per-joule.

Hamilton is worried about "Internet scale" infrastructures. Since there is no way that any of the workloads he worries about will be running on a single box, or even a single rack of systems, any time soon, he isn't so much concerned with the micro-optimizations required to cram four 8-core hyperthreaded processors into a single box.

Instead he recognizes that even a subsystem is likely to be running over tens or hundreds of boxes, and that the entire application may be running over hundreds or thousands of boxes in multiple data-centers. His challenge is how to do that as inexpensively as possible.

This drives the high-level requirements:

Huge numbers of machine instances must be easily managed by a small operations team, so the number of independent threads a single box executes at once is not of primary importance.

Applications must be able to run effectively over an infrastructure of thousands of machine instances spread over multiple racks and multiple data centers. A unified address space for a few dozen simultaneously executing threads is not a big win, and long synchronization latencies are to be expected. Partition-tolerance is a must, and so cache coherency across dozens of simultaneously executing threads is optional.

The price/performance that matters is measured at the datacenter level. Since power-consumption drives two of the largest costs of data-center construction and operation (power & cooling), performance-per-joule (or watt) is a preeminent metric.
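That last requirement is easy to express directly. Here's a minimal sketch of the fleet-level metric being described; every input is a hypothetical placeholder:

```python
# Datacenter-level work-done-per-joule. All inputs are hypothetical
# placeholders, not measurements.

def work_per_joule(requests_per_sec: float, box_watts: float, pue: float) -> float:
    """Requests served per joule, charging each box for cooling and
    distribution overhead via PUE (total facility watts / IT watts)."""
    return requests_per_sec / (box_watts * pue)

xeon = work_per_joule(requests_per_sec=2000.0, box_watts=250.0, pue=1.5)
arm = work_per_joule(requests_per_sec=400.0, box_watts=30.0, pue=1.5)

print(f"Hypothetical Xeon box: {xeon:.2f} requests/joule")  # ~5.33
print(f"Hypothetical ARM box:  {arm:.2f} requests/joule")   # ~8.89
# Per-box throughput never appears in the ranking; only fleet-wide
# requests/joule (and the ops cost of managing more boxes) matters.
```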

In the end, the heroic lengths that AMD and Intel have gone to to improve single-system performance don't matter as much to people like Hamilton. That heroic engineering has costs that are amortized over a much smaller volume than components destined for consumer systems and devices, and, as you note, it results in systems with much less favorable performance/joule ratios to boot.

What is the upside of systems with both less favorable price/performance and less favorable performance/watt ratios than systems built from cheaper, less exotic parts, if you have an application architecture and a system-management infrastructure that work whether there are 10,000 systems or 80,000?

To this point, while ARM may present attractive performance/joule ratios, the overall price/performance hasn't been as attractive: while the chips are cheap, the overall system cost wasn't that much better because of the cost of the other components. Now that quad-core ARM parts running at high clock rates are within sight, the overall price/performance for a system built around them is looking more appealing. They don't have to shoot for Nehalem levels of single-system performance; indeed, doing so would trash their obvious competitive advantages in the emerging market for components for building Internet-scale infrastructures.

My ultimate point is that any four-core ARM desktop or server processor that shoots at a similar absolute performance target as a four-core Nehalem processor will either look pretty much like a four-core Nehalem, or it won't hit the target. It will also have relatively similar performance/watt characteristics, and will end up competing with Intel and AMD on fab muscle.

For a company like Amazon or Google or whatever, I'm not sure they're after more OS instances, at least not for everything.

For large parts of their infrastructure they probably just want however many zillions of searches per second or recommendation engine requests per second or whatever. For that stuff they don't care about OS instances, they just want the cluster in aggregate to meet performance requirements at low cost, and the machines in the cluster will be running near identical OS instances.

You wouldn't have an ARM competitive with Nehalem without having an ARM that looked like Nehalem, but for sites like Amazon that can't run on a single server, the appeal is probably getting the throughput they want from a cheaper cluster.

For anyone not dealing with a site under that kind of load that requires that kind of scalability, the appeal of ARM servers in the datacenter would be reduced. For Amazon, they're running one application on many servers, so they stand to benefit. For, say, me, I'm running many applications on few servers, so I'm going to get a few x86 servers and virtualize.

Edit:

eas, excellent response and you beat me to it as well

I can imagine ARM server chips being useful for other stuff where you're not necessarily in a cluster, for example an appliance-style server that comes loaded with the vendor's software; they'd be able to cut costs. But this is clearly too dependent on the tradeoffs of any given situation to be any sort of general challenge to x86 on the server.

Apart from that... string acceleration instructions? Crypto could help too.

quote:

Originally posted by jzkgeq: What about the new ULV chips that Intel is apparently on the verge of releasing? (http://arstechnica.com/gadgets/news/2009/09/unlaunched-atom-smasher-shows-up-in-samsung-x-series-laptops.ars)

I imagine this will plug the hole that ARM is trying to drive through...

Atom could probably plug it on netbooks if it weren't so limited and its chipsets weren't so poor.

On high-density servers, I seriously doubt Intel's willing to compete on price with ARM.

Umm... I wouldn't think that a server-centric version of an ARM chip would become that similar to Nehalem. Creative's Zii chip is a dual-core ARM-based solution with 24 additional processing elements and 4 interconnects that can be packed tightly. Notice the cooling system, or lack thereof, in the video.

Now the key point about Zii's interconnects: are they cache coherent? That I'm not sure of. I'm also not sure about the interconnect bandwidth. Those are very important questions but I do see promise in this direction.

My ultimate point is that any four-core ARM desktop or server processor that shoots at a similar absolute performance target as a four-core Nehalem processor will either look pretty much like a four-core Nehalem, or it won't hit the target. It will also have relatively similar performance/watt characteristics, and will end up competing with Intel and AMD on fab muscle.

Dear Jon, you know better than anyone else on this board that this theory is not necessarily true. IA-32 architectures have too many deficiencies to fight against any other ISA crafted to go against them.

As an example, Intel touts its micro-ops fusion as one of the best revolutions, but it also reminds us that IA-32 architectures require some extra pipeline stages just to decode, rearrange, and dispatch their instructions, while a much simpler ISA like ARM v7 does not suffer from this penalty (which also killed the PowerPC 970, as you remember). Add to this the fact that IA-32 has evolved to 64-bit (which is slower than 32-bit, whatever Intel marketing says), has asymmetric register allocations, instruction prefixes to analyze, registers to save for a context switch... well, you get the point.

The truth, from my point of view, is that ARM will not compete with IA-32 in the near future. It will not compete because ARM Holding is not focused on the same market as Intel or AMD and will not begin a war it cannot win without a software ecosystem as mature as its hardware counterpart.

I'd keep in mind that while IA-32 is mainly a triple {Intel, AMD, VIA}, ARM can count on 200 implementors. Intel is undoubtedly an 800-pound gorilla, but even so, I would not wake the sleeping giant. No one is going to take the rival's market by storm, but ARM seems more likely to succeed.

Sad that it's not supported by Windows or Mac. I hate to say it, but this CPU won't go anywhere. Even if it can save some power, it won't be able to stay at a competitive price, and it will be hard to fight at the server level.

I could see these ARM units doing some things very efficiently in large farms, like handling a few users per core, with smallish memory requirements, even running on flash local storage for speed and low power, and calling out to heavy hitters for big jobs like large DB queries. Multiply by many cores per chip and many chips per board, backed up with built in network switch and storage interface, and you could have a real killer performance box for running a bajillion simple threads with good enough performance to keep thousands of users happily fed web pages and such.

Also note that they can run JIT-produced bytecode natively at blazing speeds, so server-side Java and other JIT-compiled languages might gain a great leg up in OPS/W for this kind of per-user-centric workload.

I also see a huge advantage for many-small-cores physicalization over virtualization for some workloads. By eliminating the overhead of many whole-OS context switches, there HAS to be some big power savings. If each chip can run its own compact OS+apps payload, with critical code and current data cached and ready, and a reasonably small, simple set of jobs to do, the only balancing act becomes making sure each core is optimally loaded or asleep. You get to skip many layers of big-system complexity, as long as all the jobs fit well on small systems. Also remember: at 0.25W per core, you could easily be talking about HUNDREDS of cores per rack unit. That's a box that could get a lot of small, simple, similar jobs done at once.
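Taking the commenter's 0.25W/core figure at face value, the rack-unit arithmetic is simple; the 400-core count below is an illustrative assumption, not a product spec:

```latex
% Cores-only power budget at 0.25 W/core (core count assumed):
400\ \text{cores} \times 0.25\ \mathrm{W/core} = 100\ \mathrm{W}
```

That 100W covers only the cores themselves; DRAM, storage, and networking would come on top of it.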

Originally posted by anthonyr: 64-bit would probably help. Even for tiny little ARM servers, 4 GB might be a little light.

The 2GHz A9 is provided as hard IP, so extending the instruction set and architecture wouldn't really be possible.

For the applications that Amazon would look at, though, large and unified memory really wouldn't be necessary. The idea behind this is essentially distributed-computing-in-a-rack, where each dual- or quad-core ARM chip is paired with its own pocket of memory, running an independent instance that serves requests.

The bottleneck would be whatever central database it accesses/writes once the transaction completes.
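A toy sketch of that shared-nothing topology, with hypothetical class names and counts chosen purely for illustration:

```python
# Toy model of distributed-computing-in-a-rack: many small nodes with
# private memory, all funneling into one central database.
import itertools

class CentralDatabase:
    """The shared resource, and therefore the likely bottleneck."""
    def query(self, request: str) -> str:
        return f"result-for-{request}"

class FrontEndNode:
    """One small ARM board with its own pocket of memory."""
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.local_cache = {}  # private memory: no coherency across nodes

    def handle(self, request: str, db: CentralDatabase) -> str:
        if request in self.local_cache:
            return self.local_cache[request]
        result = db.query(request)   # every miss hits the shared DB
        self.local_cache[request] = result
        return result

db = CentralDatabase()
nodes = itertools.cycle(FrontEndNode(i) for i in range(16))  # round-robin
for req in ["a", "b", "a", "c"]:
    print(next(nodes).handle(req, db))
# Note the second "a" misses: it lands on a different node, whose private
# cache is cold. That's the cost of giving up a shared address space.
```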

quote:

Originally posted by jzkgeq: What about the new ULV chips that Intel is apparently on the verge of releasing? (http://arstechnica.com/gadgets/news/2009/09/unlaunched-atom-smasher-shows-up-in-samsung-x-series-laptops.ars)

I imagine this will plug the hole that ARM is trying to drive through...

The thing about ARM is that it's open. One person mentioned crypto acceleration, which would fit perfectly within an ARM SoC. There are many chipmakers with ARM licenses (or that can buy one), and they each have their specialties. Some are packet-processing specialists, some are crypto specialists, some are graphics specialists, etc.

Each one can buy an ARM license and stick a few of these in a chip with a fast crypto/graphics/network subsystem along with everything from a PCI-E interface to a 10GBE interface.

So all of a sudden, you don't just have Atom vs A9, you have Atom + chipset + ethernet adapter + accelerator chips vs a single chip.
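A rough sketch of that bill-of-materials comparison; every wattage below is an illustrative guess, not a datasheet figure:

```python
# Hypothetical board-level power budgets: Atom plus support chips vs. an
# integrated ARM SoC. All wattages are illustrative guesses.

atom_board = {
    "Atom CPU": 2.5,
    "chipset": 4.0,            # support chips can out-draw the CPU itself
    "ethernet controller": 1.0,
    "crypto accelerator": 1.5,
}

arm_board = {
    "ARM SoC (cores + crypto + MAC on one die)": 2.0,
}

print(f"Atom board: {sum(atom_board.values()):.1f} W")  # 9.0 W
print(f"ARM board:  {sum(arm_board.values()):.1f} W")   # 2.0 W
# The fair comparison is bill-of-materials vs. bill-of-materials,
# which is where single-chip integration pays off.
```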

Originally posted by imgod2u: For the applications that Amazon would look at, though, large and unified memory really wouldn't be necessary.

That's true; I'm just thinking that it may not be optimal to be limited to 4 GB on a quad-core server, even if they're ARM cores. That's a pretty big limitation to get stuck with if there's no migration path in the works.

Yeah, some of Texas Instruments' OMAP series include a DSP/video processor that allows for 720p output. An example of this hardware (which has been out for a while now) is the Beagleboard. Hopefully a new version of the OMAP and Beagleboard comes out with this newer CPU, the same or better video output, A SO-DIMM SLOT, USB, and a gigabit ethernet adaptor.

I would love to see a literal cube that has all the Beagleboard has to offer as a SoC, plus several GiB of RAM. I can imagine some sort of power/ethernet/USB/etc. plane the "cubes" could plug into, with those planes being stackable or on drawers, with a nice soft LED showing whether the cube is working, and the ability to simply remove it from the plane if it's not. Hell, each cube could probably have its own backup battery.

Sure, ARM doesn't have support for 64-bit as of yet, but I would expect them to eventually come out with it. However, I would expect it to support 64-bit only, to save on die space and/or power, the very things that make ARM so famous.

Also, two things I would love to see paired up with an ARM SoC are AES and zlib coprocessors.

I wouldn't throw out the idea of Linux/ARM netbooks just yet. The iPhone went from zero users and zero developers to top smartphone in how short a time? People really are becoming more and more internet-based, and the internet really is OS-neutral. As for servers, yes, I can see a huge market for ARM-based servers. My office has a few old server boxes running old P3s. We have looked at using Xen and putting them all on one or two boxes, but the effect of a failure makes us nervous. Small, cheap, cool ARM boxes would work just fine for most of those boxes.

It is super efficient, and runs 24 hours a day with Apache2, MySQL, etc. Mine is set up as a home-network Samba hub for data storage, a media server, and a webserver.

This generation of ARM is just the beginning. Running Debian Linux on it is great. So the future: why not racks full of ARM chips saving millions in electricity, helping eliminate global warming.

Sanjaya Yogi

As Jon correctly notes, virtualization is not a performance measure; it's a way to dodge the Intel Tax or, more relevantly, the x86 tax (given that AMD isn't able to compete at the price/performance/watt levels of the A9 either).

As Seymour Cray was commonly heard to say, "You can't fake hardware that isn't there," and this is entirely what virtualization is: faking hardware that plain isn't there. You can't tell me that stacking executive schedulers on top of each other is efficient design. Wasting CPU cycles just because we can fit a ton of RAM isn't solving a problem, it's becoming one: needing more and more cores, and so paying more and more x86 tax.

Like many other things before it (RDRAM springs to mind), virtualization is becoming part of the problem it was trying to solve.

I don't quite understand why you think the iPhone/ARM port doesn't count as OS X running on an ARM processor.

It isn't OS X Server - but that isn't what you'd need on a netbook.

It's enough of OS X to support the major requirements of a mobile phone/gaming device. Apple is certainly capable of growing that to support the major features of a netbook/tablet device.

The major issue would be with app developers having to target their applications to the platform. But they would have to do that anyway if the "touch" interface were to play any kind of role. It wouldn't be MS Office or Photoshop in your hand, but does that make it less of a computer, or not count as running OS X?

I would agree with your assertion if what you meant was that I wouldn't be able to buy an Apple netbook, install my existing copy of MS Office on it, and have it work.

There is also considerable discussion of the Cortex A9, etc., elsewhere. Some important points wrt the considerations here:

* Jon and this article are reacting to an announcement that TSMC will be vending the A9 on a 40nm bulk-silicon process. Folks, this is the "day late" LOW-performance process for the A9. Back in February the Common Platform Alliance demonstrated working silicon for a quad A9 fabbed in 32nm SOI. Anything that you would remotely consider using to compete in server markets will come from CP vendors (including Global Foundries, which just bought Chartered, the CP vendor at the time of that demo)

* With all due respect I think that Jon is really missing the boat on this one:

quote:

This isn't because Intel sprinkles magical performance/watt pixie dust on Atom—it's because high performance/watt ratios for individual chips are much easier to achieve at Atom-scale than at Xeon-scale, owing to the much larger amount of system-related complexity and overhead that goes with the Xeon's much higher level of integration and performance. As is the case with everything from dinosaurs to automobiles, it just costs more to be bigger and badder, and one of those costs is net energy efficiency.

First of all ... whatever 'pixie dust' Intel sprinkles on Atom ... appears to have a negative impact. Secondly, comparing "Atom scale" with "Xeon scale" just isn't talking about the underlying issues which matter, and of course it's a comparison across wildly different CPU microarches (Core vs Atom). The energy cost differences between Atom and "Xeon" (Core i7 now) are significantly affected by the energy cost per core ... duh! There should be no surprise at all that the Core microarch pays a big energy price to get the high single-thread IPC performance these chips can deliver.

The obvious and inescapable comparison is Cortex A9 vs Atom ... with the metrics being performance/$ (which tends to track performance/mm2 ... at least when an equal process is assumed) and performance/W.

All the indications are right now that A9 slays Atom on these metrics, and that's why Google and a lot of other folks are interested.

One can ask why the A9 slays Atom ... and that has several answers (IMO possibly including that Atom is not really a very good design) ... but among them is simply that when you are building a very small core, intending to go widely multicore ... the x86 decode costs (in transistors and watts) start to matter a lot.

Presuming that the ARM/Common Platform demo last February wasn't a fake, the ball is in Intel's court to build and vend a core more competitive for wide-MP (many cores) than Atom, as far as server kit goes.

My ultimate point is that any four-core ARM desktop or server processor that shoots at a similar absolute performance target as a four-core Nehalem processor will either look pretty much like a four-core Nehalem, or it won't hit the target. It will also have relatively similar performance/watt characteristics, and will end up competing with Intel and AMD on fab muscle.

Dear Jon, you know better than anyone else on this board that this theory is not necessarily true. IA-32 architectures have too many deficiencies to fight against any other ISA crafted to go against them.

I once thought this way too... about twenty years ago. The problem with being an engineer is you think better engineering is the key reason behind every product's success or failure. The reality is that even though RISC is fundamentally better, it isn't better by anywhere near enough to overcome intrinsic, deep-seated economic and business factors that vastly favor x86. That is just the hardware challenge. On the software side the network effect is even more formidable. That is a very sour lesson for any idealist technophile to swallow, but it is a necessary one to view the world in a realistic fashion.

For starters, the playing field isn't anywhere close to level. Between Intel and AMD, x86 sucks in nine out of every ten dollars spent on processors worldwide. In the field of general-purpose computing (i.e. non-embedded), x86 processors are manufactured at unit scales of hundreds of millions, while RISCs like Power and SPARC are manufactured at the scale of hundreds of thousands. Here in the real world, x86 gets access to the most advanced process technologies ahead of RISC, and processor development teams are far bigger and better resourced than any RISC design team, including IBM's.

IMO Jon is entirely correct in his commentary above. In any given week, Intel likely SPENDS more on each of a handful of x86 development projects than ARM brings in as TOTAL REVENUE from licensing and royalties. The cleverness *and* brute-force custom IC engineering that buys far outweighs x86's inherent disadvantages.

Over the last decade and a half, ARM has pulled ahead of i960, 68k, MIPS, and Hitachi, and has achieved a comfortable and relatively lucrative presence in the embedded control market. It has done this largely by staying away from Intel as much as possible, in both its business model and the product segments it addresses. History is full of processor companies that charged directly at Intel and failed (basically all but one, and the financial clock is quickly running out on that one). It would be a shame for ARM to make the same mistake.

Dear Jon, you know better than anyone else on this board that this theory is not necessarily true. IA-32 architectures have too many deficiencies to fight against any other ISA crafted to go against them.

I once thought this way too... about twenty years ago. The problem with being an engineer is you think better engineering is the key reason behind every product's success or failure. The reality is that even though RISC is fundamentally better it isn't better by anywhere near enough to overcome intrinsic deep-seated economic and business factors that vastly favor x86. That is just the hardware challenge. On the software side the network effect is even more formidable.

Normally true, except in the smartphone market, where there is no Intel network effect to tap into. As soon as Intel can hit the power envelope of a smartphone, then we can talk, but we still have several years for ARM to continue to become entrenched in this space. The risk is that almost everyone who plays in this space (Nokia, Microsoft, Apple) already has a cross-platform x86 strategy they can tap the instant Intel is ready to play.

quote:

Here in the real world x86 gets access to the most advanced process technologies ahead of RISC and processor development teams are far bigger and better resourced than any RISC design team including IBM's.

AMD's spinoff of their foundries might be a blessing for ARM customers, especially if one of them has the resources to architect a high-performance, low-power ARM chip.

quote:

Over the last decade and a half, ARM has pulled ahead of i960, 68k, MIPS, and Hitachi, and has achieved a comfortable and relatively lucrative presence in the embedded control market. It has done this largely by staying away from Intel as much as possible, in both its business model and the product segments it addresses. History is full of processor companies that charged directly at Intel and failed (basically all but one, and the financial clock is quickly running out on that one). It would be a shame for ARM to make the same mistake.

I think ARM needs to investigate higher-performance parts, if only because the iPhone and its ilk will require them in six years. Either ARM will be there waiting, or Intel will. ARM just can't afford to wait.

I'm more interested in how this compares to Pineview than to current offerings from Intel. It's all good that the new ARM processor can beat the Atom, but the Atom is on the market, it has been for a while now, and it hasn't gone through many changes. Pineview is set for release at CES and is to be available "shortly" after that, so we're looking at around 4 months. Will there even be an A9 netbook on the market in that time frame?

Originally posted by robrob: Will there even be an A9 netbook on the market in that time frame?

If by "netbook" you mean "runs Windows" I can guarantee you the answer is NO.

The word "smartbooks" has been coined for non-Windows small lappies and a bunch of vendors appear to be lined up to start selling them, I won't speak to release dates.

Also, FWIW, Apple is rumored to have a "Tablet" in the works which will ship RSN, and this tablet is rumored to be powered by the ARM SoC the P.A. Semi team is known to be developing.

ARM and Intel are both spinning furiously ... FUD and spin and rumor are flying everywhere. This itself can usually be trusted as a sign that something is going to happen in the market ... the worst thing IMO that could happen is that the A9 arrives in products late and slow, and Pineview ends up being underwhelming ... at which point, a pox on all their bodies!

Only an incredible optimist would argue that the early "smartbooks" will be polished products, free of significant early "teething" problems. Software will also be a big issue to achieve significant consumer uptake.

The key question IMO is how deep are the pockets of those committed to making a run at Intel in this evolving market; how stable is the alliance working on this?

I think that all comparisons to AIM vs x86 are effectively irrelevant, however. That was an attempt to take on x86 in all of its core markets, with a processor lineage which really had no other market.

And then there's the fact that IBM no longer has a PC division, and that Steve Jobs is not central to this alliance ... those facts are worth at least 20 battalions.

As I said last week, the way to beat Intel is not to try to build some kind of many-core ARM Nehalem, but to build small server chips:

quote:

A 32-bit ARM can drive one DIMM. You could fit 32 or more Cortex-A9s on a chip, but the result would be completely unbalanced with only 256 MB RAM per core. Perhaps the opposite design point is better: only two cores, 4-8 MB of cache, and a single-channel memory controller. The resulting chip is very cheap with high yield because it is mostly cache. The power is low enough that you could put it on a DIMM like an AMB. You could call it BlueGene/ARM.
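The memory arithmetic behind that sketch works out if you assume one 8GB DIMM (my assumption; the quoted post doesn't specify a DIMM size):

```latex
% One 8 GB DIMM (assumed) shared by the two proposed core counts:
\frac{8\ \mathrm{GB}}{32\ \text{cores}} = 256\ \mathrm{MB/core}
\qquad \text{vs.} \qquad
\frac{8\ \mathrm{GB}}{2\ \text{cores}} = 4\ \mathrm{GB/core}
```

Note that the two-core design point lands right at the 4GB a 32-bit core can address, which is what makes it the balanced choice.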

quote:

Originally posted by microbrew: Since it's IP, what could a computer architect combine with it to make it even more desirable? Special purpose encryption? DSPs for multimedia?

Originally posted by Nbenatar: No idea the feasibility, but could several ARM processors be coupled with a Cell processor?

2 GHz ARM vs. 4 GHz PPE? I wouldn't bet on ARM in that case.

The idea of a PPE driving ARMs makes no sense, I think, to anybody ... but the idea of an A9 driving declocked, low-power SPU(s) may make quite a bit of sense in some applications, and I have heard rumors that people are considering it.

The logic for SPUs is licensable/synthesizable. Toshiba built/builds a "cutdown" Cell product called SpursEngine, which is four SPUs only ... unfortunately, Toshiba builds it on a 40nm bulk-silicon process (and the product may be poorly synthesized ... probably poor macro conversions), and it is a turkey compared to IBM's SOI-process SPUs on Hz/W.

NEON is not really that great a vector extension, and ARM's floating-point throughput is also nothing all that great. If you wanted a pretty serious vector/numeric engine associated with an ARM system ... dropping in an SPU (or 2, or 3...) would be a nice way to get it, and presuming the parts are being built on a "Common Platform" (SOI) process, it should be really easy to get a lower-power, lower-Hz optimized SPU (and the other stuff you need with it comes with it). SPUs don't need to be clocked way up.

It takes very few "hooks" into a CPU to allow it to master SPUs -- one of the great beauties (to IBM) of going to a vector coprocessor with a separate instruction flow is that it means that the vector engine doesn't "crud up" the main processor's decode, instruction dispatch/retirement etc. It keeps the scalar processor design "cleaner" progress on one or the other doesn't require rebuilding the other.

There are a lot of people who would like to have two (or more) A9 cores (at, say, 2GHz+) in lieu of the PPE, because the A9 does a lot better on branchy code. If Sony/IBM were designing the PS3 today ... one can even imagine that this might be the design choice.

On the other hand, it's not at all clear what IBM is doing on its many fronts, or what next-generation Cell may see (other than that "Quasar" will be Power7 cores with SPUs!!).

It's not clear that IBM will provide a next-generation "Cell" (PPE + SPUs) as a standard blade/compute product, although several IBM roadmaps show a PPE + 16 SPU design at 32nm.

WRT Cell ... this old article at EE Times Asia claims that Cell will get eDRAM into the roadmap (not delivered) in 2008:

Named eDRAM—for embedded DRAM—the technology will be a key feature of IBM's Cell processor roadmap starting sometime in 2008. IBM's Cell chip, which it co-produces with Sony Electronics and Toshiba, is the core CPU in all three of the best-selling video game platforms: Sony PlayStation, Microsoft Xbox, and Nintendo Wii.

But the simple fact that the Wii doesn't use Cell (instead it uses another IBM PPC processor, and does have eDRAM on board) makes this citation look dubious as any sort of proof ... at best.

IBM has just released a 476FP CPU, which is a new embedded PPC core which is synthesizable IP ... and there is some speculation that next-gen BlueGene may be 476FP based, and that BlueGene may get low-power optimized SPU(s). All of this is pretty airy speculation so far as I know.

The 476FP is probably the closest PPC chip available now in terms of comparing to the A9; I've no idea how that comparison would come off.

Originally posted by robrob: Will there even be an A9 netbook on the market in that time frame?

If by "netbook" you mean "runs Windows" I can guarantee you the answer is NO.

I wouldn't be so sure. The NT architecture is very portable, it was designed that way, and MS could make it work on ISA x at very short notice. Or they could do another Windows CE...

To be fair, WinCE was meant for extremely limited hardware and wasn't even vaguely related to any other MS product. But it does run on ARM and does give MS the experience on the platform.

If ARM starts to look promising in the general-purpose computing arena, bet your bottom dollar that MS will be all over it. Back when RISC was looking less like a fad and more like a revolution, NT was running on PowerPC and Alpha.

Unlike the Apple PPC->Intel transition, however, ARM doesn't have the CPU horsepower to run Intel binaries (from any real app) at any real speed. So even if Windows were available on ARM, none of the applications would be. Which means, effectively, you may as well stick with WinCE and the software library already running on WinCE.

I don't see why you say: "Same for Mac OS X on ARM... and please read that previous link before writing in to inform me that the ARM-based iPhone runs Mac OS X"

I read the previous link. I don't think that Apple would move their main Mac hardware to ARM (even though they *have* successfully made *two* processor transitions -- and NeXTSTEP ran on a few more different HW platforms), but why wouldn't they make a netbook with this ARM chip? They aren't going to go for some bargain-basement device, but they might do something that isn't a full laptop. If a netbook is for people who only care about a web browser and email, then why not go with the iPhone OS on an ARM chip? If it gives Apple the horsepower they want and longer battery life, then they'd hook that netbook-ish device up to their App Store and have a fine ecosystem.

You may well be right, but the link you provided did not convince me of the point you are so certain of.