A full-size server rack can fit 76 Baserock nodes, for a total of 2,432 cores.

Codethink, a Linux consulting company headquartered in the UK, has launched a new ARM-based server product called the Baserock Slab. Each individual node includes eight quad-core ARM CPUs. Two nodes can fit in a single 1U slot, for a total of 64 cores per rack unit.

The company says that it can fit up to 76 nodes in a full-size rack, for a total of 2,432 cores, without requiring specialized power or cooling infrastructure. The eight System on Module (SoM) boards in each Slab node use Marvell Armada XP ARMv7 chips clocked at 1.33GHz and have 2GB of RAM. The servers are compatible with Codethink’s own custom Linux environment and Debian. The company’s website says that support for additional distributions, including Ubuntu, will arrive soon.
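The density figures quoted above can be checked with simple arithmetic; this is a back-of-the-envelope sketch using only numbers from the article:

```python
# Density math for the Baserock Slab, from the figures in the article.
CORES_PER_CPU = 4      # quad-core Armada XP
CPUS_PER_NODE = 8      # eight SoM boards per Slab node
NODES_PER_RACK = 76    # claimed full-rack capacity

cores_per_node = CORES_PER_CPU * CPUS_PER_NODE       # 32 cores per node
cores_per_1u = cores_per_node * 2                    # two nodes per 1U slot
cores_per_rack = cores_per_node * NODES_PER_RACK     # full-rack total

print(cores_per_node, cores_per_1u, cores_per_rack)  # 32 64 2432
```

The numbers line up with the article's claims: 64 cores per rack unit and 2,432 cores per rack.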

ARM servers like the Baserock Slab offer high density computing with relatively low power consumption compared to conventional x86 servers. Codethink markets the Baserock Slab as an option for energy-efficient cloud computing, ARM build infrastructure, or server appliances. There are a number of products from other vendors that offer even higher density, such as HP’s Redstone server platform, which packs 288 quad-core Calxeda SoCs in a single 4U chassis.

Such configurations are potentially useful for handling certain kinds of heavily distributed workloads, but it’s still not clear whether they are truly cost-effective in common cloud usage scenarios.

35 Reader Comments

How would this compare to the custom Intel Atom servers we started seeing a couple years ago? Those really packed on the cores as well (something like 2 or 4 dual-core chips per card with RAM, many cards per slot, etc)?

I don't think x86 has much to worry about until 64-bit ARM chips start rolling out. The 32-bit memory limit makes this a deal-breaker for a lot of use cases.

It will be interesting to see what Intel tries in order to preempt the coming 64-bit ARMs. Those could pose a real threat; this won't.

I'd actually go the other way. It will be interesting to see what ARM can do to preempt the new Intel chips since Intel is at least two years ahead as far as fab goes.

Medfield came in late but showed comparable performance and power consumption to the ARM chips of last year. Obviously Krait would now be higher performance than Medfield, but Medfield was based on the 2008 Atom processor, and is at 32nm as well.

Clover Trail will be out soon, which should help performance but not battery life.

Silvermont is the big deal though I think. New Atom architecture (finally) AND 22nm Tri-Gate.

From the referenced Wikipedia article, the quad-core Intel Core i7 2600K gets about 128K MIPS (I've got one in my desktop), and it's not nearly as powerful as the ten-core Xeon E7 @ 2.8GHz in Turbo mode.

Since the table doesn't reference the type of benchmark used, its numbers are pretty suspect and not applicable to other benchmarks (there are lots of ways to measure MIPS... the Intel numbers come from SiSoft Sandra, whereas the ARMv7 numbers are AnandTech's estimates and aren't based on actual benchmarks... and even if they were, if it's not the same code, the scale could differ among platform benchmarks).

> Such configurations are potentially useful for handling certain kinds of heavily distributed workloads, but it’s still not clear whether they are truly cost-effective in common cloud usage scenarios.

Where do you get heavily distributed workloads? Parallel and concurrent algorithms are much harder to implement well. ARM does have a low-power advantage, but in performance/power it does not shine.

This system can be used either as a server or in an HPC application. If each server request stays in its own thread, you are going to have a hard time filling all those cores while having a longer response time. With HPC, GPGPU is all the rage now...

Really not a problem with free software. Arm is a very mature architecture in the linux world.

Absolutely. Some Linux distros are optimized for ARM CPUs, and loads of software is supported. For example, on my Raspberry Pi I have an OpenJDK JVM running, which is by definition compatible with every other Java platform (IBM even managed to run WebSphere on this platform).

Also keep in mind that ARMv7 is just the ISA specification, it's a bit like saying "x86_64 achieves %d MIPS".. which specific implementation of the ISA makes all the difference.

Within ARMv7 there is a difference in performance between the Cortex-A8 and Cortex-A9, for example. I don't know much about Marvell's implementation - but for all I know, as an architecture licensee (I think?) they could have added all sorts of special sauce whilst maintaining ARMv7 compatibility: dual-issue, etc. - so comparing "mips" for some generic v7 is a bit of a fluffy comparison.

(Not that I expect this to change the fact that an i7 will beat it clock-for-clock, mind you, but this may help the MIPS/Watt comparison)

> Where do you get heavily distributed workloads? Parallel and concurrent algorithms are much harder to implement well. ARM does have a low-power advantage, but in performance/power it does not shine.
>
> This system can be used either as a server or in an HPC application. If each server request stays in its own thread, you are going to have a hard time filling all those cores while having a longer response time. With HPC, GPGPU is all the rage now...

You've not heard of the web, then? A million users all asking for data from a website is a million-task parallel problem. OK, they don't all hit it at the same time, but you get the idea. Web serving is a perfect task for highly parallel servers, especially as it doesn't need massive amounts of CPU performance, just enough to serve each request.
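The "million-task parallel problem" point can be sketched with a toy thread pool: each request is independent, so work spreads across many small cores with no shared state to coordinate. Here `handle_request` is a hypothetical stand-in for real per-request work:

```python
# Sketch: web serving is trivially parallel because each request is
# independent. handle_request is a hypothetical placeholder for real work.
from concurrent.futures import ThreadPoolExecutor

def handle_request(request_id):
    # Each request touches only its own data; no locks or coordination needed.
    return f"response for request {request_id}"

# 32 workers loosely mirrors the 32 cores of a single Slab node.
with ThreadPoolExecutor(max_workers=32) as pool:
    responses = list(pool.map(handle_request, range(1000)))

print(len(responses))  # 1000 -- one task per request, all served in parallel
```

Many slow cores handle this pattern as well as a few fast ones, which is the argument for high-core-count ARM boxes in web serving.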

> From the referenced Wikipedia article, the quad-core Intel Core i7 2600K gets about 128K MIPS (I've got one in my desktop), and it's not nearly as powerful as the ten-core Xeon E7 @ 2.8GHz in Turbo mode.
>
> Since the table doesn't reference the type of benchmark used, its numbers are pretty suspect and not applicable to other benchmarks (there are lots of ways to measure MIPS... the Intel numbers come from SiSoft Sandra, whereas the ARMv7 numbers are AnandTech's estimates and aren't based on actual benchmarks... and even if they were, if it's not the same code, the scale could differ among platform benchmarks).

I used what I could find - they may not be 100% accurate, but should be representative. It's tough when no one seems to want to actually benchmark these things (especially ARM). But in my opinion, we need to know what the performance gap is between x86 and ARM before anyone can get excited about a rack full of ARM chips in a server.

All of these claims about being more power-efficient are fine for a smartphone, but does that efficiency scale with clock speed, workload, and the volume of chips required to perform the same as an i7 or Xeon E7?

> But in my opinion, we need to know what the performance gap is between x86 and ARM before anyone can get excited about a rack full of ARM chips in a server.

In my opinion if you aren't solely concerned with "raw power" but are more interested in getting as much parallel processing into as small a footprint as possible, with as little power consumption as possible then there is every reason to be extremely excited.

Can you include some performance numbers? I mean, 2432 cores sounds amazing, but if 4 Xeon E7 CPUs outperform it then I'm not sure it is that big of a deal.

OK, I found some numbers on Wikipedia and a Cisco whitepaper:

ARMv7: 2,850 MIPS at 1.5 GHz
Xeon E7: 96,900 MIPS at 2.4 GHz

So the ARM at 1.33 GHz would be about 2532 MIPS.

So it would take about 64 Xeons to equal the 2432 cores. This can be done in a Blade system. Hard to compare the actual power requirements though because Codethink doesn't provide any real data.
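The estimate above can be reproduced with the commenter's own figures; this is a rough scaling sketch, not a benchmark:

```python
# Rough MIPS math behind the "about 64 Xeons" estimate above.
arm_mips_at_1500mhz = 2850                              # ARMv7 figure cited from Wikipedia
arm_mips_per_core = arm_mips_at_1500mhz * 1333 / 1500   # linear scaling to 1.33 GHz
xeon_e7_mips = 96900                                    # ten-core Xeon E7 at 2.4 GHz

rack_mips = 2432 * arm_mips_per_core                    # all cores in a full 76-node rack
xeons_needed = rack_mips / xeon_e7_mips

print(round(arm_mips_per_core), round(xeons_needed))    # 2533 64
```

Linear clock scaling is an approximation, but it matches the comment's numbers: roughly 2,530 MIPS per ARM core and about 64 Xeon E7s to equal a full rack.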

I've long argued that the processing density of ARM does not remotely justify usage in server clusters or anything rack-based, with the narrow exception of serving as an alternative to lower-end ASICs for security and similar tasks where you want the processing outside the general-purpose CPU.

The focus for ARM usage should remain the embedded market, along with glue logic. It's a great architecture for places where computing power is secondary, such as low-end routers, storage, printers, and similar devices. Mobile usage beyond tablets is marginal, given that even i5 ULV processors deliver an order of magnitude more processing power at only about 25-50% more watts per MFLOP.

Once you are at the size of a rack, computing density with CPUs and GPUs should take precedence over core count or power usage, because real estate becomes a bigger concern. A system can need 75% less power to run a rack, but if you need 3x the number of racks, the saved power will be lost to real-estate costs and the increased cooling needs due purely to tonnage rather than waste heat. There are instances where this would not hold true, such as building a server farm in an underground salt cave, but in the majority of situations, space matters.

Intel is hopefully developing toward more ultra-low-wattage Atom processing power, as well as lower-power i-series CPUs, or simply a convergence of the architectures. It might also help if they did more toward complete SoCs with integrated baseline RAM while maintaining expandability, unlike current ARM or x86 architectures, which offer one or the other.

Cogent Computer Systems, Inc. (my company) designed the SOMs and the baseboard hardware. The design was primarily meant to show the viability of an ARM server using our off-the-shelf SOMs.

Some comments to the various questions that have come up:

The Armada XP uses the Marvell Sheeva PJ4 core, rated at 2.41 DMIPS/MHz. So at 1.33GHz you get ~3,200 theoretical DMIPS.

A 4GB version of the SOM will be available by year's end.

Power consumption is estimated at 150W per server, but needs more complete testing with various loads and use cases to be accurate.
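The ~3,200 DMIPS figure quoted above follows directly from the per-MHz rating; a quick check, assuming 1.33GHz means 1333 MHz:

```python
# Theoretical DMIPS for the Sheeva PJ4 core, from the figures quoted above.
dmips_per_mhz = 2.41   # Marvell Sheeva PJ4 rating
clock_mhz = 1333       # 1.33 GHz clock

theoretical_dmips = dmips_per_mhz * clock_mhz
print(round(theoretical_dmips))  # 3213 -- the "~3,200" quoted above
```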

What about external factors? Like, how much heat do these buggers produce? If they could cut a server room's cooling costs in half while still maintaining performance, that might be a big, indirect cost impact.

If you're wanting to use a system for computationally expensive workloads, you're looking in the wrong place. If you're looking for something that can DMA data from one place to another and requires little CPU oversight (web, DNS, mail server) or you're doing stuff natively on ARM, this is a good choice.

I see this more as a proof-of-concept scenario. There are more powerful ARM processors out there, in both clock speed and core count. Also, as mentioned earlier in this thread, ARM is rapidly developing both its next generation of smaller process nodes and its next generation of 64-bit ARM processors.

With this as a POC, let's see where it leads with some next-gen hardware. IBM and Intel have nothing to worry about yet.