New “atomic” server: 512 Atom CPUs take on Xeons

A startup called SeaMicro announced a server that squeezes 512 x86 CPUs into just 10U of rack space while drawing only 2kW.

When Intel first unveiled "Silverthorne," the first Atom processor, in 2008, it was clear that the low-power, in-order design would sit at the very bottom of the performance heap. It's fair to say that none of us in the tech press expected to see a server based on the design, much less one aimed directly at Intel's high-end Xeon line. But we weren't the only ones watching the Silverthorne announcement.

SeaMicro is a Silicon Valley hardware startup that began work in July 2007 on a server that could gang together cheap, low-end processors like Atom in ways that would make sense for Web-centric server workloads. The result of that effort was unveiled today in the form of the SeaMicro SM10000, a datacenter server that squeezes 512 Atom processors into 10U of space and draws 2kW of power.

SeaMicro claims that the server provides the same SPECint performance as a Dell Xeon server in one-fourth the space and at one-fourth the power. Its biggest advantage is that it runs existing x86 software, from the disk images on each node up to load-balancing and automation packages.
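To put those claims in concrete terms, the implied comparison baseline follows directly from the figures above (the per-node wattage is my own back-of-the-envelope division, not a SeaMicro number):

\[
10\,\mathrm{U} \times 4 = 40\,\mathrm{U}, \qquad
2\,\mathrm{kW} \times 4 = 8\,\mathrm{kW}, \qquad
\frac{2000\ \mathrm{W}}{512\ \mathrm{nodes}} \approx 3.9\ \mathrm{W/node}
\]

In other words, SeaMicro is measuring itself against roughly a rack's worth of conventional Xeon servers.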

How it works

We've been over the economic case for the kind of server architecture that the SM10000 represents more than once, so there's no need to recap it here (check "Further Reading" below for more on "physicalization"). Instead, let's take a practical look at how the SM10K works.

Each server node consists of an Atom Z530 CPU and its companion I/O hub chipset, a small pool of DRAM, and a custom ASIC that attaches to the I/O hub's PCIe bus. This ASIC is what makes the SeaMicro idea work, and it does so by pretending to be a standard complement of PCIe-based storage and networking controllers.

So while the front side of the ASIC attaches to the Atom chipset via PCIe, its back side attaches to a high-bandwidth, proprietary bus that connects the individual server nodes to each other and to a shared pool of storage and network interfaces. The ASIC performs a bit of sleight-of-hand for the Atom node it's attached to, making it appear as if the OS instance running on that node is talking directly to its own storage and networking controllers; in reality, those controllers are shared by all of the Atom nodes in the system.

In other words, the ASIC implements a kind of I/O virtualization in hardware.
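To make the idea concrete, here's a minimal software sketch of what that sleight-of-hand amounts to. Everything here is illustrative guesswork on my part (the register layout, message format, and 64-byte descriptor size are all invented); SeaMicro hasn't published its actual interface:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical register file the ASIC exposes to a node's OS, mimicking
 * an ordinary PCIe NIC. The layout is invented for illustration. */
struct fake_nic_regs {
    uint64_t tx_ring_base;  /* the node's driver programs a DMA ring as usual */
    uint32_t tx_tail;       /* doorbell: the driver bumps this to transmit */
    uint32_t node_id;       /* identifies which of the 512 nodes this ASIC serves */
};

/* On its back side, the ASIC tags each request with the node ID so the
 * shared, centrally located controllers can demultiplex traffic. */
struct fabric_msg {
    uint32_t node_id;
    uint64_t dma_addr;
    uint32_t len;
};

/* Conceptually, a doorbell write lands in the ASIC rather than a real NIC,
 * and gets forwarded over the proprietary fabric. */
void doorbell_write(struct fake_nic_regs *regs, uint32_t new_tail)
{
    struct fabric_msg msg = {
        .node_id  = regs->node_id,
        .dma_addr = regs->tx_ring_base + (uint64_t)regs->tx_tail * 64,
        .len      = 64,  /* assume fixed 64-byte descriptors for the sketch */
    };
    regs->tx_tail = new_tail;
    /* In hardware this would be a transaction on the shared bus; here we
     * just print the forwarding step. */
    printf("node %u -> shared NIC: dma=0x%llx len=%u\n",
           (unsigned)msg.node_id, (unsigned long long)msg.dma_addr,
           (unsigned)msg.len);
}

int main(void)
{
    struct fake_nic_regs regs = { .tx_ring_base = 0x100000, .node_id = 42 };
    doorbell_write(&regs, 1);  /* the driver believes it owns local hardware */
    return 0;
}

The point of the sketch: the OS-visible side looks like any other PCIe device, which is why unmodified x86 software runs on each node.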

By stripping the storage and I/O controller hardware out of each server node and centralizing it in one place in the system, leaving only the ASIC behind as a kind of ghost or avatar for the (now centrally located) controller hardware, the SM10K dramatically shrinks each individual server node. The picture below shows a single board that packs eight server nodes into a 5" x 11" space.

SeaMicro's approach makes for an incredibly low cost per node, both in terms of bill of materials and power consumption. On some kinds of workloads, like a Web application that features transient bursts of lightweight threads, the Atom-based server should indeed deliver better performance per watt and per dollar than a Xeon server running a bunch of virtual machines.

Right now, SeaMicro only offers a product based on Atom, which the company claims is head-and-shoulders above the dual-core ARM Cortex-A9 parts it tested in performance per watt on server workloads. But there's nothing stopping the company from offering ARM-based servers, since the basic design should theoretically work with any chipset or SoC that talks to peripherals over PCIe.

What will really rock this design is Intel's upcoming Tunnel Creek, an Atom-based SoC that puts the PCIe controller on the main die instead of in a separate I/O hub. A successor to the SM10K based on Tunnel Creek could do away with the I/O hub entirely, cutting the component count at each node down to just an SoC, some DRAM, and the ASIC.

If SeaMicro then moved to two SoCs per node, it could instantly double the number of Atom cores in the design without increasing its size. Absolute power consumption would go up, of course, but so would power efficiency (the lack of an I/O hub chip would mean fewer wasted watts).

It's likely that SeaMicro is only the first of a barrage of similar startups that we'll see in the coming years. Plenty of system architects across Silicon Valley think in exactly these terms, and SeaMicro's exit from stealth mode shows that the approach is finally making the transition from a popular "what if..." notion and a few one-off products to an actual movement.

A very clever and interesting product and implementation, but it seems niche at best - one of those "well for certain very specific kinds of workloads it's very good" things that only 1 out of every 100,000 companies could actually use.

IBM did something similar with their BlueGene supercomputer. It consists of a large number of low-end PowerPC 450 processors, yet it is presently #5 on the Top500 list of supercomputers (it was #1 for several years).

It is clever to use this idea to build small and powerful servers.

Not quite. Each of those PPCs was bolstered by a doubled-up FPU, giving them impressive floating point capabilities, which was precisely the point of BlueGene.

These Atoms, on the other hand, are crap.

A top-end Xeon offers as much as 40 times the integer compute power of the Atom (benchmarks vary, of course, but the processors used here manage about 4,000 Dhrystone MIPS, while high-end Xeons can manage 130,000-160,000). The gulf in performance is enormous.
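Taking those figures at face value, the ratio is easy to check:

\[
\frac{130{,}000\ \mathrm{DMIPS}}{4{,}000\ \mathrm{DMIPS}} \approx 33\times,
\qquad
\frac{160{,}000\ \mathrm{DMIPS}}{4{,}000\ \mathrm{DMIPS}} = 40\times
\]

Of course, that compares a whole multi-core Xeon package against a single Atom; SeaMicro's pitch is about aggregate throughput per watt and per dollar, not single-chip performance.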

It's not clear that this would be fun to manage, either. Managing 512 OS instances that are locked to physical hardware, regardless of how tightly it's all packed, is not my idea of fun and easy. At least with a virtual environment, you have lots of options for how to redistribute and manage it.

It's not just you. The new Xeons are efficient enough, all things considered, and immensely more powerful.

...and they come with ECC support as well. This product completely misses its target, and will be relegated to a niche of a niche. They might have had something if they'd gone with ARM A9s, but the Atom is mediocre by every metric.

The PPC BlueGene processors mentioned above are quite efficient and actually have decent FP performance. Using Atoms here just doesn't make any sense.

I'm not seeing what you're seeing. Why would you need ECC support if you're just running a bunch of scaled out stateless web applications? And do most web applications really need great FP performance? This box seems like a solution that tries to find some balance between price and performance while utilizing much less power. I would guess that since this is their only product, it was the cheapest way to bring their architecture to market. Would it be unreasonable to assume that others will follow?

Unless I'm mistaken... this is the approach Google takes with its web search service. Basic index searches require essentially zero floating-point operations, and Google is said to use a bunch of low-end P4s to power some of its data centers.

It completely makes sense for companies like Google that can afford specialized computing for different needs: purchase the i7 Xeons for YouTube and a bunch of these for Google searches...

Depending on the type of web service, this could be a cheap option... personally, though, in my experience I prefer one big chip running many VM instances... it just seems easier to manage, and you're less likely to have some kind of hardware problem.

crs117> I agree that you are less likely to have a hardware problem, but on the flip side, if you do have a hardware problem, you are less likely to go down completely. In a lot of ways it's MUCH more resilient than a Xeon-powered virtual server setup.

If I recall correctly, the issue was not so much that they could fit so much power into so small a space; it's that there are data centers sitting on so much excess capacity that it only makes sense to use more physical space and go for cheaper machines. Add the whole green/save-money-on-power-and-cooling argument, and these could be an excellent fit for certain operations.

The article also mentions that ARM processors would be a possibility if this type of arrangement proves successful. That could make the power/MIPS ratio even more interesting.

No thanks to the author for not explaining unfamiliar acronyms such as "SoC" (System on a Chip). Readers are not all experts in the field.

There is also no Dell server that takes up 40 rack units, so what is this SeaMicro server being compared to, exactly? This article leaves much unexplained and reads more like an expanded press release; the pricing is not even discussed.

Which would you rather have in an 8kW power envelope: 640 Nehalem cores or 2,048 Atom cores? Because if you fill HP c7000s with the newest hex-core, dual-socket blades, that's what you can get in the same power and floor space envelope. For my environment the c7000 wins hands down; perhaps this would work for hosting providers, but for most enterprises the Nehalem system wins every time.

I was talking about a couple of c7000s almost full of 2x220 blades with the low-power hex-cores; it was the highest MIPS/watt solution I could personally think of. Two full enclosures are just over 8kW, so I scaled it back until I got to what they can fit in a rack. Each blade has 4x hex-cores on two system boards. Depending on how expensive your datacenter floor space is, you can go up to 1,536 cores in a rack, but you'll need a hell of a cooling solution, as normal forced air doesn't scale well to 20kW in a rack. It's kind of amazing to me that you can do 70 drives in 5U and decrease your power density (HP SW 600 modular storage system, 1200W).
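For anyone checking the commenter's math, it reconstructs from the figures given (the 16-bay count per c7000 enclosure is my assumption, though it matches HP's standard half-height configuration):

\[
2\ \text{boards} \times 2\ \text{sockets} \times 6\ \text{cores} = 24\ \text{cores/blade},
\qquad
16 \times 24 = 384\ \text{cores/enclosure},
\qquad
4 \times 384 = 1{,}536\ \text{cores/rack}
\]

Two "almost full" enclosures, trimmed to stay inside the 8kW envelope, land at the quoted ~640 Nehalem cores.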