As expected, Intel® announced the Haswell E5 processor family for servers and workstations at IDF on September 9. Coincidentally, the event was just up the valley from Apple’s event announcing the (ARM-based) iPhone 6, 6 Plus, and Apple Watch. Between the two media-saturation blitzes, one could barely find coverage of misbehaving NFL stars or Russian would-be Czars in the day’s news headlines. While few would connect these two events in any way, to me there is a common thread, best summarized by my interpretation of their messaging: “It’s a floor wax! It’s a dessert topping! It’s everything you ever wanted, and more!” If you’ll allow me, …

Calxeda has announced its second-generation SoC, the ARM® Cortex™-A15 based EnergyCore™ ECX-2000. It is the industry’s first ARM-based SoC enabled for full OpenStack clouds and for Xen and KVM virtualization, and it delivers twice the performance of the first generation of ARM-based server SoCs. Calxeda will demonstrate the new platform running Ceph object storage and OpenStack at this week’s ARM TechCon conference in Santa Clara, October 29-31. Notably, HP has selected the ECX-2000 for an upcoming Moonshot server in early 2014. Calxeda also added a second 64-bit SoC to its roadmap that is pin-compatible with the ECX-2000, accelerating the availability of production 64-bit Calxeda-based systems in 2014 and protecting customers’ investments.

While this is big news, there is a far more important story to be told. The new ECX-2000 is just the next step on the journey to a far more efficient datacenter. This journey will fundamentally reshape the datacenter infrastructure into a fleet of compute, storage, networking, and memory resources; the so-called Software-defined Data Center.

Intel is widely expected to announce a new version of its Atom SoC for microservers next week. Based on the Silvermont microarchitecture, the Avoton SoC is expected to repair Intel’s reputation after the disastrous Centerton, a product largely dismissed as a way-too-little-too-late response to ARM.

While we all eagerly await the final specs, and prices, some speculate that this chip will make it harder for ARM-based server SoCs to get traction. I think the opposite is more likely. If this chip is really good, and priced to sell, it means that Intel itself has capitulated to the market demand for a lower-power chip designed for real workloads instead of benchmarks. And THAT will validate everything the ARMy of SoC guys have been saying: you don’t always NEED a Xeon behemoth, so why pay for it in terms of power, space, and $$$? And of course, Intel wouldn’t do that, at the potential expense of Xeon margins, if it really thought this was less than 10% of the market, and if it didn’t feel threatened by ARM.

So, I’m rooting for Intel for a change. Validate the market, join the party, and may the best RACK win! (A thinly veiled reference to the importance of the fabric, which Calxeda fans may appreciate ;-))

Advances in multi-core computing have allowed far greater compute densities, such that nearly all datacenter racks run out of available power far sooner than physical space. Traditional High Performance Computing (HPC) x86 clusters can consume upwards of 400W per rack unit (U), which means that a typical datacenter rack with a 5 kW – 8 kW circuit can be maxed out using as little as a quarter to half of the available space. Many of today’s forward-thinking IT leaders are asking, “Why can’t I have both extremely dense computing and better power efficiency?”
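The power-versus-space math above is easy to check. Here is a minimal sketch, using the 400W-per-U figure from the text and assuming a standard 42U rack (the 42U height is my assumption, not stated in the original):

```python
# Back-of-the-envelope check of the claim that a 5-8 kW circuit maxes out
# a rack at a quarter to half of its physical space.
# Assumptions: 400 W per 1U server (from the text), 42U rack (assumed).

RACK_UNITS = 42
WATTS_PER_U = 400

def usable_units(circuit_watts):
    """Rack units that can be powered before the circuit is maxed out."""
    return min(RACK_UNITS, circuit_watts // WATTS_PER_U)

for circuit_kw in (5, 8):
    units = usable_units(circuit_kw * 1000)
    print(f"{circuit_kw} kW circuit: {units}U powered "
          f"({units / RACK_UNITS:.0%} of the rack)")
```

A 5 kW circuit powers only 12 of the 42 units (about 29% of the rack), and an 8 kW circuit powers 20 (about 48%) — consistent with the quarter-to-half claim.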

As reported in various outlets yesterday, Intel has released its S1200 line of Atom SoCs targeting the microserver market with the tagline: “Intel Delivers the World’s First 6-Watt Server-Class Processor”. The first notable point here is that they had to use 6 watts, because 5 was already taken. The second notable point is their definition of “Server-Class”. Looking at the list of features on the Atom S1200, there are key “Server-Class” features missing:

Networking: Intel’s SoC requires you to add external hardware for networking.

Storage: Once again, there is no SATA connectivity included on the Intel SoC, so you must add hardware for that as well.

Management: Even microservers need remote manageability features, so once again with Intel you need to tack that onto both the power and price budgets.

Based on what Intel disclosed today, here’s a snapshot of the Calxeda EnergyCore ECX-1000 vs. Intel’s new S1200 chip:

Feature          ECX-1000      Intel S1200
Watts            3.8           6.1
Cores            4             2
Cache (MB)       4 (shared)    2 x 0.5
PCI-E            16 lanes      8 lanes
ECC              Yes           Yes
SATA             Yes           No
Ethernet         Yes           No
Management       Yes           No
OOO execution    Yes           No
Fabric switch    80 Gb         N/A
Fabric ports     5             N/A
Address size     32 bits       64 bits
Memory size      4 GB          8 GB

So, while the Centerton announcement indicates that Intel takes “microservers” seriously after all, it falls short of the ARM competition. It DOES have 64 bits and Intel ISA compatibility, however. Most workloads targeting ARM are interpreted code (PHP, LAMP, Java, etc.), so this is not as big a deal as some would have you believe. Intel did not specify the additional chips required to deliver a real “Server Class” solution like Calxeda’s, but our analysis indicates these could add 10 additional watts, plus their cost. That would make the real comparison between the ECX-1000 and the S1200 roughly 3.8 vs. 16 watts — roughly 3-4 times more power for Intel’s new S1200, again comparing 2 cores to 4. Internal Calxeda benchmarks indicate that Calxeda’s four cores and larger cache deliver 50% more performance than the two hyper-threaded Atom cores. This translates to a Calxeda advantage of 4.5 to 6 times better performance per watt, depending on the nature of the application.
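The performance-per-watt arithmetic above can be sketched directly. The inputs are the figures claimed in the text (Calxeda’s internal estimates), not independent measurements:

```python
# Sketch of the perf/watt comparison, using the figures from the text.
ecx_watts = 3.8                   # ECX-1000 SoC power (claimed)
s1200_system_watts = 6.1 + 10.0   # S1200 SoC plus estimated support chips

power_ratio = s1200_system_watts / ecx_watts   # exact: ~4.2x
perf_ratio = 1.5                               # claimed 50% more performance

# The text rounds the power ratio to "roughly 3-4x", which yields
# the quoted 4.5x-6x perf/watt advantage:
low, high = 3 * perf_ratio, 4 * perf_ratio
print(f"Exact power ratio: {power_ratio:.1f}x")
print(f"Perf/watt advantage: {low:.1f}x to {high:.1f}x")
```

So the quoted 4.5x–6x range is simply the claimed 1.5x performance edge multiplied by the rounded 3x–4x power ratio.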

It’s the middle of June, which means we’re smack in the middle of tradeshow and conference season for the IT industry. We were at Computex in Taipei two weeks ago, and this week we’re participating in International Supercomputing in Hamburg, and GigaOM’s Structure conference in San Francisco. In fact, our CEO, Barry Evans, is on a panel to discuss fabric technologies and their role in the evolution of datacenters. Should be a good one!

The hectic season hasn’t stopped us from moving forward with what everyone is really waiting for: benchmarks! I’m happy to be able to share some preliminary results, covering both performance and power consumption, for those of you looking for more efficient web servers.

Barry Evans, Calxeda CEO and co-founder, has been invited to speak at the upcoming cloud confab, GigaOM Structure, in San Francisco on June 20. If you have never been to Structure, it is the industry’s premier event for advancements in cloud computing hardware and software, with a speaker list drawn from the industry’s movers, shakers, and entrepreneurs. Barry will be on a panel with AMD’s Andrew Feldman (formerly CEO of SeaMicro) and Guido Appenzeller, CEO of Big Switch Networks. Their panel is titled “Inside the data center: it’s all about the fabrics”. This is a hot area: AMD’s acquisition of SeaMicro, and Intel’s acquisition of fabric assets from Cray, have recently piqued interest from around the industry. As we move from the age of the “Clock Wars” to the age of the “Core Wars”, and now to the age of the “Efficiency Wars”, the cluster and datacenter interconnect fabric and software-defined networking are emerging as the source of the next series of breakthroughs.

Calxeda will also have live hardware running in our booth at the show. Come by for a demo of OpenStack and other cloud infrastructure running on Calxeda-based hardware from the industry’s leading system vendors.

If you would like a discounted ticket to attend Structure, Calxeda can help you out. Go to our website, where you will find a link to obtain a discount code for 25% off as a friend of Calxeda!

The acronym “SoC” generally refers to “System on a Chip”. But with SoCs entering the server space, it is also taking on a new meaning: “Server on a Chip”. An SoC is a large scale integration of processor cores, memory controllers, on-chip and off-chip memories, peripheral controllers, accelerators, and custom IP (intellectual property) for specific applications and uses. As Moore’s law continues, chip process geometries shrink, allowing more transistors to reside on the same area of silicon. Traditionally, server processors have used this new real estate to add more cores. But there are better alternatives than just adding more cores for certain applications.

Increasing integration in an SoC brings a number of benefits including:

Higher performance – significantly faster and wider internal busses compared to those found in a multi-chip or multi-board solution.

Lower power – a wider range of power optimization techniques can be employed in SoCs, including power gating, changing bus speeds depending upon utilization, dynamic voltage and frequency scaling of processor cores and peripherals, multiple power domains, and a number of others. Additionally, having peripherals on-chip avoids power-hungry PHYs (analog drivers that need to drive signals between chips and boards).

Higher density – fewer components to buy, consume power, and fail.

Deeper integration of peripheral controllers and fabric interconnect technologies allow a number of advantages that cannot normally be achieved by having to go through standard bridges like PCIe.

Let’s stop and consider the components we will typically find in a standard rack-optimized volume server:

One or two processor chips, often with integrated memory controllers.

One or two chips for processor chipsets providing a range of functions like Southbridge peripherals and PCIe.

A PCIe connected Ethernet NIC, either chip or PCIe board. In today’s volume servers, this is typically one or two 1 Gb Ethernet interfaces.

A PCIe connected SATA controller, either chip or PCIe board.

Controller chip for an SD card and/or USB.

An extra-cost, optional BMC (baseboard management controller) providing out-of-band system management.

So, now with the availability of a purpose-built ARM® server SoC, how does this change? Everything in the laundry list above gets integrated onto a single, low power die. For example, let’s take a look at the Calxeda EnergyCore ECX-1000 series of SoCs. In each chip, we find:

A quad-core Cortex A9 CPU, configured for server workloads.

The largest L2 cache that you’ll find on an ARM server: 4 MB with ECC.

A server class memory subsystem including a wide, high-performance 72-bit DDR3/3L memory controller, also including ECC.

Integrated peripheral controllers that have direct DMA interfaces to the internal SoC busses without the PCIe overhead. Standard server peripheral controllers like multiple-lanes of SATA, multiple Ethernet controllers (both 1 Gb and 10 Gb), even an SD/eMMC controller for local boot or scratchpad storage, are all integrated on-chip.

If your server needs to connect to devices that are not integrated, there are four dual-mode PCIe controllers, supporting both root-complex and target modes, in both x4 and x8 configurations.

Instead of an optional (and expensive) BMC, management is built onto every chip, providing a sophisticated server management system that provides both in-band and out-of-band IPMI/DCMI system management interfaces along with dynamic power and fabric management.

A deeply integrated, power and performance-optimized fabric interconnect, which we’ll talk about in a future blog entry.

And all of this is designed with performance-, power-, and cost-optimized servers in mind, delivering industry-leading performance/Watt and performance/Watt/$.

Calxeda EnergyCore ECX-1000 Block Diagram

With all the typical server components integrated onto a single chip, you can build a server by “just adding power and DRAM”. And even that is made easy for our customers with a card-level reference design of four EnergyCore SoCs, power regulators, DRAM, and fabric interconnect.

For the last several years, SoCs have been used in embedded systems and mobile devices for the same reasons and benefits discussed above. The server industry is now applying those same lessons to its own domain. No matter what the design looks like, a better-integrated and power-optimized Server-on-a-Chip is needed for the scale-out, cluster demands of our Internet generation.