Calxeda’s first ARM server is a serious threat to x86 server domination

This site may earn affiliate commissions from the links on this page. Terms of use.

Fifteen months ago, a CPU developer named Calxeda made waves when it announced a joint effort with HP to develop dense ARM servers that would challenge x86 supremacy in the server market. The company promised that it could leverage the low power consumption of ARM products to build clusters of Cortex-A9 SoCs inside a rack-mounted chassis.

There have always been questions about whether Calxeda’s approach would actually scale to real-world server workloads. Calxeda’s system design stacks EnergyCards in rows atop a large motherboard. Each EnergyCard contains four SoCs, four DIMMs, and 16 SATA ports. The SoCs are all quad-core Cortex-A9s with a larger-than-average L2 cache (4MB rather than 1MB). That works out to 16 Cortex-A9 cores per EnergyCard. Maximum memory per SoC is 4GB due to the Cortex-A9’s 32-bit limitations.
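The per-card figures above can be rolled up into chassis-level totals with some quick arithmetic. The sketch below is just a back-of-the-envelope tally using the numbers quoted in this article (nothing here comes from Calxeda documentation):

```python
# Back-of-the-envelope totals for Calxeda's EnergyCard layout,
# using only the figures quoted in the article above.

SOCS_PER_CARD = 4
CORES_PER_SOC = 4          # quad-core Cortex-A9
DIMMS_PER_CARD = 4
SATA_PORTS_PER_CARD = 16
MAX_RAM_PER_SOC_GB = 4     # 32-bit Cortex-A9 addressing ceiling

def chassis_totals(cards: int) -> dict:
    """Aggregate SoC, core, RAM, and SATA counts for `cards` EnergyCards."""
    socs = cards * SOCS_PER_CARD
    return {
        "socs": socs,
        "cores": socs * CORES_PER_SOC,
        "max_ram_gb": socs * MAX_RAM_PER_SOC_GB,
        "sata_ports": cards * SATA_PORTS_PER_CARD,
    }

# A six-card chassis works out to 24 SoCs, 96 cores,
# 96GB of maximum RAM, and 96 SATA ports.
print(chassis_totals(6))
```

Those totals line up with the six-card Boston Viridis configuration discussed in this article.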

Anandtech’s Johan De Gelas (a name old-timers will recognize from Ace’s Hardware) has benchmarked and written the first review of a Calxeda-based system, the Boston Viridis. This system contains six EnergyCards, totaling 24 CPUs (96 Cortex-A9 cores) clocked at 1.4GHz. Anandtech ran the system through a range of synthetic and real-world application tests and compared its single- and quad-threaded performance to both Atom and Xeon-based solutions.

The results are sure to make Intel sit up and take notice. The ECX-1000 processor at the heart of the Viridis lags even Atom in some metrics, like bandwidth utilization (Atom is ridiculously slow compared to Xeon processors, just to put that in perspective). Its per-thread performance in integer workloads, however, is quite competitive with Intel’s in-order architecture. While it never matches the Xeon-based products in terms of single-threaded performance per clock, the synthetic tests show the ECX-1000 is an excellent product.

The real-world tests are stunning. Not only does Calxeda’s array of “wimpy” cores outperform Xeon processors in web server tests, it beats them in both raw performance and performance-per-watt. De Gelas writes that “the Calxeda’s ECX-1000 server node is revolutionary technology.” After seeing the performance figures, I agree. There’s a place for ARM products in the datacenter. This also makes AMD’s long-term bet on an ARM server solution look like a good idea.

The current caveats

There are still a number of real-world limitations on Calxeda’s ARM products. They’re limited by maximum RAM (4GB), the Cortex-A9’s bandwidth and architectural limitations, and the fact that software support is still in very early stages. If you wanted to buy the most flexible solution available today, you’d buy a Xeon or an Opteron, hands down. The Boston Viridis server Anandtech reviewed runs about $20,000 while the x86 hardware is less than half that price. Power consumption matters — but $12,000 per box pays for an awful lot of wattage.

Then there are the external factors. Calxeda’s roadmap shows Cortex-A15 and future 64-bit Cortex-A57 CPUs in the pipeline, but Intel has its own 22nm Atom refresh coming later this year. Atom is badly in need of a new architecture; the 22nm design could flip the performance advantage back into Intel’s camp. Software and OS compatibility also favor x86, and by a wide margin. It’s also true that the upcoming ARM processors will inevitably draw more power than the Cortex-A9 — whether you use ARM or x86, there’s no getting around the fact that higher single-thread performance costs more energy, as does adding RAM.

ARM server shipments will be fractional for the next few years, but this is the biggest potential challenge to x86’s server monopoly in well over a decade. Success is scarcely assured, but the technology has promise.


Correct me if I am wrong, but the standard Cortex-A9/A15 architecture is 1MB of L2 cache per core in a cluster.

chojin999

Absolutely fake benchmarks. No ARM Cortex architecture is even 1/10th as fast as a Xeon E5.

disqus_wgQijEvN2i

The secret sauce in this market is the fabric technology. This is why Calxeda wins in this test.

Bosko

Exactly. They have a huge internal network with up to 1.2Tbit of (theoretical) throughput. This benchmark shows that they are fast at serving a lot of small static files, which maxes out their interconnect. I would like to see a benchmark serving a typical WordPress site, some kind of popular LAMP-based solution, or a Django (Python-based) site…

Johan De Gelas has been writing some of the best server and workstation articles for something like 13-15 years. The test methodology Anand used is published and verifiable.

If you’d bothered to read the article, you’d note that there are many benchmarks — including those that favor strong single-thread performance — where the Xeon chips make utter hash of the A9. If you were building a render farm, for example, you’d never, ever consider a system like this.

Different workloads present different use-case scenarios. At low concurrency rates, the Xeon is *indeed* faster than the ARM 9 chips. At high concurrency, the ARM chips are faster — because, in certain workloads, having a lot of small cores with small pools of RAM acting independently is better than a monolithic architecture.
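The concurrency argument above can be illustrated with a deliberately simple toy model: throughput is bounded by how many cores can be kept busy times each core's service rate. The rates below are invented for illustration only, not benchmark numbers from the review:

```python
# Toy model of "few fast cores vs. many slow cores" under load.
# Throughput = min(concurrent requests, cores) x per-core rate.
# All rates here are made-up illustrative values.

def throughput(concurrency: int, cores: int, reqs_per_core: float) -> float:
    """Requests/sec when `concurrency` requests compete for `cores` cores."""
    return min(concurrency, cores) * reqs_per_core

FAST_FEW = dict(cores=8, reqs_per_core=10.0)   # Xeon-style: few fast cores
SLOW_MANY = dict(cores=96, reqs_per_core=2.0)  # Calxeda-style: many slow cores

for c in (4, 16, 128):
    print(c, throughput(c, **FAST_FEW), throughput(c, **SLOW_MANY))
# At concurrency 4, the fast cores win (40 vs 8 req/s); once the
# fast cores saturate at 80 req/s, the wide array pulls ahead.
```

In this model the crossover happens once concurrency exceeds the point where the small machine's cores are saturated, which is exactly the high-concurrency behavior described above.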

JDRahman

Pay no attention to him. Either he’s a troll or has his wires crossed.

I always read his comments – entertaining.

some_guy_said

I didn’t see any MIPS or FLOPS measurements. Who said anything about raw performance?

What I did see was that the ARM setup was just as good as Xeons at handling typical internet requests.

The fact of the matter is, you can only throw so many threads at a Xeon (or any chip), and if you can’t fully utilize the chip with those threads, then it’s just not going to show the same performance advantages.

So they are testing 96 A9 cores against 8 Xeon cores? So somewhere around 10 A9 cores gives the same performance as a Xeon core, as they hold a small lead through most of the benchmarks. The best I can find is a quad-socket board for the Xeons, which gives 32 total physical cores and 600 watts’ worth of TDP. I read somewhere that a standard A9 core’s TDP is 1.9W; 320 cores (or 80 quad-core chips) to roughly match the performance of the Xeons = 608W. That doesn’t account for the extra cache and may be way off anyway, but it doesn’t seem like they have really gained all that much here, other than being able to somewhat compete with the blue juggernaut.

Also, 80 ARM chips would net a maximum addressable amount of RAM of 320GB. Four Xeon chips have a max capacity of just shy of 3TB of RAM.

Either way, each will have its strengths and shortcomings depending on the application. Right now (and I realize this is way into apples-to-oranges territory) I don’t see the wow factor yet.
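The commenter's arithmetic is easy to check. The sketch below reproduces it using only the figures from the comment itself; note that the 1.9W-per-A9-core number and the 10:1 core-equivalence ratio are the commenter's estimates, not datasheet values:

```python
# Reproducing the rough power/RAM comparison from the comment above.
# All inputs are the commenter's own estimates, not official figures.

A9_CORES_PER_XEON_CORE = 10   # rough throughput equivalence claimed above
WATTS_PER_A9_CORE = 1.9       # commenter's estimate, not a datasheet TDP
XEON_CORES = 32               # quad-socket board, 8 cores per socket
MAX_RAM_PER_A9_SOC_GB = 4     # 32-bit addressing limit per quad-core SoC

a9_cores_needed = XEON_CORES * A9_CORES_PER_XEON_CORE  # 320 cores
a9_chips = a9_cores_needed // 4                        # 80 quad-core SoCs
a9_tdp_w = a9_cores_needed * WATTS_PER_A9_CORE         # 608W

max_ram_gb = a9_chips * MAX_RAM_PER_A9_SOC_GB          # 320GB total
print(a9_cores_needed, a9_chips, a9_tdp_w, max_ram_gb)
```

Under those assumptions the 608W figure and the 320GB RAM ceiling both check out, which is the basis for the comparison against the ~600W, ~3TB quad-socket Xeon configuration.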

Is there a possibility of having a PAE kernel on an ARM board? This could get past the 4GB memory limit. Yes, I know the amount of memory available to a process is still 4GB, but a higher maximum memory is still a good thing.

Sean O’Connor

It is an interesting question why Intel cannot get past the x86 architecture. We live in a finite world; there are not an infinite number of engineers working for Intel. They may be locked into x86 by their tool-chains. Certainly their senior management is not interested in providing low-cost, efficient designs to anyone. They want to sell $300 chips to wealthy Western customers (with inevitably very high power consumption).

sleeplessinva

Because sometimes you never know when you want to play some classic DOS games on your Xeon processors and you don’t have any virtualization software around.

