Posted by timothy on Saturday June 16, 2012 @11:25AM
from the 16-pages-seriously dept.

An anonymous reader writes "Phoronix constructed a low-cost, low-power 12-core ARM cluster running Ubuntu 12.04 LTS and made out of six PandaBoard ES OMAP4460 dual-core ARMv7 Cortex A9 chips. Their results show the ARM hardware is able to outperform Intel Atom and AMD Fusion processors in performance-per-Watt, except it sharply loses out to the latest-generation Intel Ivy Bridge processors." This cluster offers a commendable re-use of kitchenware. Also, this is a good opportunity to recommend your favorite de-bursting tools for articles spread over too many pages.

What would calculating the theoretical peak tell them about the (real) sustained performance?

Partitioning the problem into chunks that can be distributed to the nodes in the cluster adds overhead. Assembling the finished results does the same. It is hard to predict what this overhead will be, as it depends on the interconnect. In this case they used 100Mb/s Ethernet, but there was contention from running NFS over the same network. Building it and measuring it is the only way to find out what kind of performance you really get.
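A rough way to see why the interconnect matters is a toy model: wall time is the compute time split across the nodes, plus the time to push data over the link. Everything below is an assumption for illustration (the function names, the 60 s job, the 125 MB of traffic, the contention factor); only the 100Mb/s link speed comes from the setup described above. It is a sketch in Python, not a substitute for measuring.

```python
def cluster_time(work_s, nodes, bytes_moved, link_mbit_s=100.0, contention=1.0):
    """Estimated wall time in seconds for a job split evenly across `nodes`.

    work_s      -- single-node compute time in seconds (hypothetical input)
    bytes_moved -- total data shipped over the network (hypothetical input)
    contention  -- >1 models a shared link, e.g. NFS traffic on the same wire
    """
    compute = work_s / nodes
    # Convert megabits/s to bytes/s, then derate for contention.
    link_bytes_s = link_mbit_s * 1e6 / 8 / contention
    # A single node does not need to ship anything over the network.
    transfer = bytes_moved / link_bytes_s if nodes > 1 else 0.0
    return compute + transfer


def speedup(work_s, nodes, bytes_moved, **kw):
    """Speedup of the cluster over one node under the same toy model."""
    return cluster_time(work_s, 1, 0, **kw) / cluster_time(work_s, nodes, bytes_moved, **kw)


if __name__ == "__main__":
    # Made-up job: 60 s of compute, 125 MB of traffic, six nodes.
    print(speedup(60, 6, 125_000_000))                  # clean link
    print(speedup(60, 6, 125_000_000, contention=2.0))  # link shared with NFS
```

With these made-up numbers the six nodes give a 3x speedup on a clean link and 2x when contention halves the usable bandwidth, which is the commenter's point: the network, not the core count, sets the ceiling.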

Well, it's hard to get first post and simultaneously develop a complete explanation of the concept, but...

They have provided yet another valuable datapoint in the theoretical peak vs actual sustained performance testing set, but, again, this is widely studied, characterized fairly well and predictable with a bit of research and thought experiment.

Reading the article (also impossible to do in a first post time constraint), reveals that they had a particular idea about using a wooden dish strainer to rack the

Half the fun is building it.
Good excuse to build a 12-core mini-cluster. I think this is nothing more than some nerd showing off his latest toy, which is not a bad thing. This 12-core cluster might be useful, at the very least at the proof-of-concept stage. I could imagine uses for a highly parallel mini-supercomputer on an affordable budget.

I think the confusion is that people think Atom is analogous to ARM. People keep missing the fact that ARM is a processor core and Atom is an SoC solution. It makes no sense comparing apples to oranges. An appropriate comparison would be an SoC from TI, Qualcomm or Samsung.

I'm not against this type of benchmarking, I actually enjoy reading people writing them up. Now, on the other hand, I don't think it's fair to compare a cluster of laptops vs. a cluster of desktops. It's fun... but without the proper metrics, it's useless.

How many cores can you fit in a cubic meter, for example? What's the performance per watt per cubic meter? What's the performance of a solution like the Tegra? How do you measure the difference between added hardware, like radios or GPUs, etc.?

But then how could they generate media hype by announcing they are outperforming intel?

Why can't you compare apples to oranges? They are both fruits, are picked from trees, are spherical, and have mass and volume, colour and taste. Of course you can compare them! The result of such a comparison will most likely be that they are not very similar, but the same conclusion will probably be reached by comparing a car to a navel.

What I don't understand is why the summary is focused on ARM beating Atom when the overall winner - in performance, in performance per watt, and in cost - was the Intel Ivy Bridge... by a huge margin.

Because this is slashdot and the AMD/ARM vs Intel bias is almost as strong as Linux vs Windows? Their best selling point is the APUs but in reality Intel is the one favored most by the move to decent integrated graphics, people still buy Intel but now instead of an AMD/nVidia entry level card many just stick with the integrated one, making GPU market share become more like CPU market share. And Intel is the one with a half-decent ARM competitor (Intel Medfield), AMD isn't ready to play in that arena at all.

Agreed... and frankly... I thought the comparison was utter crap... Really... a first generation Atom against a modern ARM? First generation Atom was utter crap and solved no other issue than providing a cheap Atom-based platform to play with. What about the N2600 or even better... the Medfield (had to google for ages for that name haha)? Atom 330 was just not worth it.

Not to mention, the point is... what exactly? The whole selling point of ARM is how long it will run on a battery. Plugged in, the difference between, say, 7W and 12W really isn't enough to get your panties in a twist over, and while ARM may get lower power usage while doing work, there is no denying that Intel and AMD have the IPS crown by a pretty wide margin, even more so if your code is able to use OpenCL so you can use both halves of a Fusion APU to do useful work.

Well the nice thing I've found, which is why I ignore most of the benches, is that X86 has gone so far beyond good enough and into insanely overpowered that even a low end system simply never gets stressed, the users simply can't come up with enough work for the chip to do.

Last year I built my dad a Phenom I quad because I found a kit cheap. Now, most here would consider that a pretty weak chip; we're talking a 2.1GHz first gen Phenom. Now guess what I found? That after 3 months he had simply never gone abo

Applications are SLOWLY making better use of multiple core machines, and that means that as time goes forward, more cores makes for a better experience. The problem you are seeing, that many people are not stressing the system is caused by applications not making good use of system resources. In most cases, even multi-threaded apps are using what, two or three threads when they should be using six or more for what is being done.

Basically, we are seeing most developers failing to re-write applications

A lot of apps simply can't be threaded that well. Even games, with all their graphical and sound goodness, can't use multiple cores that effectively. You will have one heavy thread which is doing all the graphics; you can throw AI onto one or two threads, put sound on another and UI on another, plus networking and other IO could be on additional threads, but the graphics thread will be the really heavy one, and the rest will be very lightweight in comparison. You can't break the graphics thread out to multiple

That is why I never understood AMD having identical cores as it seems to be a waste with the exception of a few apps like video processing. That is why I snatched a Thuban when they were cheap as at least turbocore will ramp up when you are only using one to three cores heavily but a better design would probably be an uber-powerful Core 0, followed by a decently powerful Core 1 & 2, with the Cores after that being slightly more powerful than Bobcats.

If you REALLY care about bang for the buck in the low power space your best bet would probably be an AMD E350 kit [newegg.com] which is just $120 USD with a nice little case. I've built a few HTPCs and office boxes out of these and they are not bad little units. If you are really concerned about price you can use Open ELEC [openelec.tv], which makes a Fusion version of their distro with the XBMC UI for a nice dirt cheap HTPC, and for offices there are several distros that support Fusion OOTB, although my customers prefer Win 7 which ru

Except they didn't need all 6. 2 Pandaboards = 1 Atom 330 nettop. (A shade less in one benchmark, a bit more in the other.)

And I'm not sure where you pulled that $60 figure from, but I haven't seen any 330 nettops that cheap. Is this that thing where you count the whole system for one side, and just the CPU for the other side?

"Besides winning on performance and efficiency, the Core i7 3770K system would cost less than the cost of a six PandaBoard ES cluster setup." So a single Ivy Bridge system, which takes up much less rack space and needs no cluster network ports, outperforms and costs less than the ARM cluster. Is that the definition of a no-brainer?

That cluster would probably be more valuable if you melted it down to sell the precious metals inside it.
I can't believe they bothered, I can't believe someone wrote an article about it.. somehow I can believe it would get posted to slashdot, though.

I must have been under a rock for the past few years, but are Ivy Bridge processors really more power-efficient than Atoms, Fusions and even ARMs? I thought they were designed more for speed than efficiency, while the others were made for low consumption. Was I wrong? On the internet?

They're more power efficient if you're looking for high performance at reasonable power levels. The ARMs might be much better for tasks that don't need much computation but if you end up needing to combine a bunch of ARM boards into a cluster to get the performance you need then there's a lot of overhead that adds to the power consumption without giving you much.

With the EP.C workload on all twelve ARM cores, the average power consumption was 30.4 Watts for all six PandaBoards, which is in line with each PandaBoard burning through 5~6 Watts under load. When it comes to the performance-per-Watt, the EP.C test was yielding an average of 1.78 Mop/s per Watt, which was an increase over the single PandaBoard ES at 1.60 Mop/s per Watt.

Page 8 of TFA (yes, my quote was the entire text on that page) claims otherwise, that efficiency of the cluster is even better than that of a single board. I really have no idea how they managed that.

I also noticed that the combined wattage requirement was less than that of a single system multiplied by the number of units. I'm guessing that their simple meter is not accounting for all the load, since there are transformers in the AC power supplies.
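For what it's worth, the numbers in the quote can be worked backwards. The sketch below uses only the three figures quoted from page 8 (30.4 W, 1.78 Mop/s per Watt, 1.60 Mop/s per Watt) plus one big assumption that the article does not state: that EP.C scales perfectly linearly, so each board in the cluster delivers the same throughput as a lone board. Under that assumption, the single-board efficiency figure implies a higher per-board wattage than the cluster measurement does, which fits the observation above. The variable names are mine.

```python
# Figures quoted from page 8 of the article.
CLUSTER_WATTS = 30.4          # average draw of all six boards under EP.C
CLUSTER_MOPS_PER_WATT = 1.78  # cluster efficiency
SINGLE_MOPS_PER_WATT = 1.60   # single-board efficiency
BOARDS = 6

# Total throughput follows directly from watts * (Mop/s per watt).
cluster_mops = CLUSTER_WATTS * CLUSTER_MOPS_PER_WATT

# Average wall power attributed to each board inside the cluster.
watts_per_board = CLUSTER_WATTS / BOARDS

# ASSUMPTION (not from the article): perfect linear scaling, so one board
# alone would deliver one sixth of the cluster throughput.
mops_per_board = cluster_mops / BOARDS

# Wall power a lone board would need to land at 1.60 Mop/s per watt.
implied_single_watts = mops_per_board / SINGLE_MOPS_PER_WATT

if __name__ == "__main__":
    print(f"cluster throughput:    {cluster_mops:.2f} Mop/s")        # 54.11
    print(f"per board in cluster:  {watts_per_board:.2f} W")         # 5.07
    print(f"implied lone board:    {implied_single_watts:.2f} W")    # 5.64
```

So if the scaling assumption holds, a board measured alone would be drawing roughly half a watt more than its share of the cluster total. That points at the power measurement or the power supply rather than at the CPUs, though without the single-board wattage from the article this stays a guess.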

My guess is they may be using a different power supply. The PandaBoard takes 5V @ 4 amp - hardly anything, really. A single quality 90% efficiency desktop PSU with six 5V rails will supply that much power and, even if not operating at peak efficiency (low-amp high-efficiency PSUs are hard to find), it may have beaten out the common wall warts used for the devices.
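The power supply theory is easy to sanity-check with arithmetic: power at the wall is the DC load divided by the supply's efficiency. In the sketch below, the 90% figure comes from the comment above; the 4.5 W DC load per board and the 75% wall wart efficiency are purely hypothetical values picked to show the size of the effect, and `wall_watts` is a helper name of my own.

```python
def wall_watts(dc_load_w, efficiency):
    """AC power drawn at the wall for a given DC load and supply efficiency (0..1]."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return dc_load_w / efficiency


BOARDS = 6
DC_LOAD_PER_BOARD_W = 4.5     # HYPOTHETICAL: not a measured PandaBoard figure
WALL_WART_EFFICIENCY = 0.75   # HYPOTHETICAL: stand-in for a cheap adapter
SHARED_PSU_EFFICIENCY = 0.90  # the "quality 90% efficiency" PSU from the comment

# One board on its own adapter versus six boards on a shared supply.
single_on_wall_wart = wall_watts(DC_LOAD_PER_BOARD_W, WALL_WART_EFFICIENCY)
cluster_on_shared_psu = wall_watts(BOARDS * DC_LOAD_PER_BOARD_W, SHARED_PSU_EFFICIENCY)

if __name__ == "__main__":
    print(f"one board, wall wart:   {single_on_wall_wart:.1f} W")
    print(f"six boards, shared PSU: {cluster_on_shared_psu:.1f} W")
    print(f"six wall warts instead: {BOARDS * single_on_wall_wart:.1f} W")
```

With those made-up inputs, six separate adapters would pull about 36 W at the wall against about 30 W through the shared supply, so a supply difference alone could plausibly account for a cluster total that is lower than six times the single-board reading.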

You're confusing efficiency with total power consumption. A desktop Ivy Bridge certainly pulls more watts than the E-350 or Atom boards, but the amount of work that Ivy can do for each of those watts is higher, which gives Ivy the efficiency lead but not a total power-consumption lead.

Ivy Bridge is more efficient in work done per watt, yes, but ARM still wins for low power devices like phones because it draws so much less power. The fact that it does less with that power is moot, because it does enough and lets your battery last much longer.

In addition to the much-increased overhead of the cluster (all the mainboards, memory, storage, etc), the Ivy Bridge chip is on the brand-new 22nm process size while the Atom and ARM chips they tested are stuck on the old 45nm. They could have at least gone with 32nm Atoms and ARMs.

Because they are looking at performance per watt, not "power usage during normal use." Most people think of "power usage during normal use" when they are talking on the internet, because they're thinking of power usage in their phone.

Most people don't have clusters, but in that case you are interested in the power usage of your cluster while the thing is running at full speed. It's not something you're going to put in your phone, but Intel manages some efficient processors.

Sandy Bridge, and now Ivy Bridge, are drastic hand-over-fist improvements over their previous architectural designs - particularly in terms of power use. An i5 at idle, for instance, is more power efficient than the first generation Atoms as well as the first-generation AMD Bobcat boards (eg. Hudson), but can do a whole lot more while not idled and still maintains a relatively low power usage.

I suspect that the reason we never saw the Atom SoC (Atom 2) was because the power savings engineering went into Sandy

I'm getting Dramamine for everyone on Slashdot to counteract the ARM FUD.

1. Look at both the AMD and Intel boards for the low-end processors... notice anything? They have all of these... features like PCIe, real memory interfaces, SATA controllers, etc. etc. All of these features consume power. Huge amounts? Not really, but compared to both the E-350 and the Atom CPUs, the amount of power being measured for each board is including a very large amount of power that has zero to do with the CPU. Guess what would happen if I took an E-350 or Atom and put it in an equivalent to the Panda board?

2. Apparently ARM's marketing department ran out of money to pay the poster to describe the Ivy Bridge system used in this test. Here's the short results:
a. In the parallel benchmarks used in this test that are a (probably unrealistically) best-case scenario for the ARM cluster, a single Ivy Bridge CPU was 5 times faster.
b. Oh but ARM says: So what if Ivy is faster! It's a power hog... look it used over 100 WATTS OMG!!!! Well guess what? On a performance-per-watt scale, the Ivy Bridge system is THREE TIMES BETTER THAN ARM.
c. Oh but the ARM fanboys will say that Intel cheated by using a better lithographic process!! Well guess what: ARM loudly brags that it is better because it is an IP only company, so you have to take the good with the bad.

4. Oh one more thing... the Ivy Bridge system had REAL PERIPHERALS like real memory, real PCIe, a real SSD, etc. etc. that by themselves probably used more power than at least one of the ARM boards, probably 2 of them. Oh and by the way.. the power used for the network fabric needed to network those ARM boards... *NOT* included in the power consumption figures so ARM had that as an extra advantage! So in many ways the Ivy Bridge system was intentionally disadvantaged.. and was still THREE TIMES MORE EFFICIENT ON A PER-WATT BASIS THAN ARM IN A SERIES OF BENCHMARKS THAT ARE BEST-CASE-POSSIBLE SCENARIOS FOR ARM.

5. For all of those ARM fanbois who are about to say that PCIe, real RAM interfaces, real SATA support, etc. etc. are inelegant artifacts of the stupid x86 instruction set well.. bite me. The last 5 years of ARM trolls who have literally gone down the feature list of every feature that x86 has that ARM doesn't and found a way to call the features that ARM lacks stupid and moronic (until ARM implements them years later and then claims to have come up with them first) is pissing me off.

Oh one more thing: The Ivy Bridge system is also cheaper not only in up-front price but also in long-term power costs, and you don't have to worry about maintaining 6 sets of hardware and updating software on 6 different nodes in a cluster.

Do you mean that OMAP doesn't have PCIe, real memory interfaces (what do you mean by "real memory interface"? Is there something like a "fake" memory interface?), SATA controllers, etc. etc. etc.? Sorry, but they DO HAVE THEM. Plus, the OMAP 4 series has a GPU, video encoder/decoder, its own 2D accelerator and whatever interface it requires to create a smartphone. Guess what would happen if the OMAP lacked all that stuff?

I don't suppose it does much good mentioning at this point that the Pandaboard has what is at this point a fairly dated CPU with a fairly low clock. When it came out, it was decent, but at this point it's almost 2 years old. The Tegra 3, for instance, puts it to shame in pretty much every regard.

I was impressed that it gets the first 11 pages, and then it includes a 'Next Page' link to in-line the remaining pages. The problem is it didn't get the performance images, which are in separate iframes.

So Intel finally beats ARM in performance/watt, but a 2 board cluster beats Intel's lowest power offering. So, basically Intel has finally eroded the advantage ARM has in servers, but ARM still maintains an edge in small, low power devices. I love that ARM has been so competitive in certain areas. It's good to see something other than x86 everywhere. Imagine if there was no iPhone. Imagine if there was no competition and ARM was still just a slow, but modern and power efficient core? ARM has come a long, long

So I'm asking myself how 12 ARMs equal the power consumption of one Atom. So I had to sit through all the page loads. The "Atom" is a complete off-the-shelf "Net Top" box designed to maximize performance (spinning hard drive and high-end graphics card) with the sole constraint of being noiseless -- i.e. the Atom was chosen by the Net Top manufacturer for low heat, not for low energy consumption.

OK, then for the comparison with Ivy Bridge, I wasn't surprised. I've been salivating about the low-power versions of Ivy Bridge for several months now. But this comparison wasn't even against that. They used the highest clock speed, highest power 3770K variant, which is rated at 77W [wikipedia.org]. There is a 45W version with a bit lower clock speed. (BTW, Intel "produces" low-power variants the same way they "produce" high-clock variants -- they test the chips after manufacturing to see which ones draw less power.)

So, basically, the comparison is completely pointless and a waste of time.

To a first approximation, heat = energy consumption. You have to dissipate all that energy you use as heat, after all, and that's why lower wattage parts always run cooler, all other things being equal.

I'll repeat my point one more time. In a Net Top environment, one has access to AC electricity, but one also desires quiet operation so as to not interfere with the home theatre operation. Quiet means no fan means low heat. Low energy is not relevant to the needs of Net Top. Yes, I understand that low energy = low heat from an engineering standpoint, but try to understand the design requirements. And this difference in design requirements is completely behind why the reviewer ended up with an apples to

Real servers need ECC RAM. I'd be reluctant to even run a home file server without it, if that server contains critical data.

Does ARM support ECC? If not, then it can be ruled out on that basis alone. Atom and Bobcat can also be ruled out at this time since neither supports ECC RAM.

A while back Intel announced a 2-core, 1.2 GHz Sandy Bridge "Pentium 350" that has a max TDP of 15W and has the standard server chip package, including ECC support. This would be nice for small, low-power servers. But for some rea