It has been a full seven months since AMD released detailed information about its Opteron A1100 server CPU, and twenty-two months since the initial announcement. Today, at the Hot Chips conference in Cupertino, CA, AMD revealed the final details of its ARM-powered server strategy, headlined by the A1100.

The Case for Low Power Server CPUs

Before we discuss the new Opteron A1100 details, let us review why AMD designed an ARM-powered server CPU in the first place. It all comes down to the devices and services we now take for granted: cell phones, tablets, cloud storage, and cloud services. AMD presented a slide about a year ago that summed it up nicely.

The number of internet users is growing by 8 to 12% every year. Apple, Google, Microsoft, Facebook, you name it: they all invest huge sums of money in server farms to provide the services we have come to rely on. This trend keeps gaining momentum as software companies like Microsoft try to emulate the success of Apple and Google by selling hardware (like Apple) and providing free, ad-supported services (like Google).

Building the infrastructure to support all these devices and users is a massive undertaking. Typically, companies buy traditional high-powered servers (read: Intel Xeon) and partition their computing power among many tasks as needed. However, this isn’t always the best strategy. For IO-bound tasks, the bottleneck is something other than the CPU, so there is little reason to throw expensive, power-hungry CPUs at the problem. For webserver tasks, response time is paramount. However, with the huge number of users connecting, serving web requests has become an ‘embarrassingly parallel’ problem you can address with multi-core CPUs, as long as there is enough muscle behind each core.

The ‘enough muscle’ issue has hindered previous attempts at low-power, high-density webservers. When we tested the Calxeda ARM compute cluster, there were only certain edge cases where it was more efficient than a dual-core Xeon server running virtual machines. Calxeda themselves admitted that their processors, based on ARM Cortex A9s, were in the early-adopter phase of ARM-powered webservers. Calxeda stated it wouldn’t be until ARMv8 (which supports virtualization) and the Cortex A57 that ARM-based servers would ‘cross the chasm’ and enter the mainstream.

With the Opteron A1100, AMD skipped the early adopter phase and chose something with a higher chance of initial success.

Meet the A1100: CPUs and IO

There are three types of ARM licenses: POP, processor, and architecture. POP stands for Processor Optimization Pack; a POP license provides the licensee with everything they need to send a chip to the fab. A processor license provides the details of an ARM core like the Cortex A9 so you can integrate it into your own SoC, but you are not allowed to customize it. Finally, there is the ultimate license: an architecture license. An architecture license provides all the details of the ARM instruction set architecture (ISA) and CPU implementation, so a licensee can build its own custom CPU core using the ARM ISA however it sees fit.

AMD is both a processor and an architecture licensee. If AMD decides it can be competitive by shipping an SoC with an ARM-designed CPU (processor license), it can do so without the effort of designing its own ARM ISA CPU. If AMD wants to differentiate itself with a custom CPU using the ARM ISA, it can use its architecture license to do that, similar to Qualcomm’s Krait CPU cores. AMD has decided to do both. Today we discuss the chip built under its processor license.

AMD’s first SoC containing an ARM CPU is code named Seattle: the Opteron A1100. Seattle features no fewer than eight 64-bit Cortex A57 cores implementing the ARMv8 ISA. Depending on availability, this could be the first Cortex A57 CPU to hit any market, not just the server market. AMD will follow up in 2015 with a lower-power ARM version that is pin compatible with an x86 CPU; both chips are part of Project Skybridge. In 2016 AMD will leverage its architecture license and ship K12, a fully custom CPU design using the ARMv8 ISA.

Each pair of Cortex A57s in the A1100 shares a 1MB L2 cache (4MB of L2 in total), and all four pairs share an 8MB L3 cache. To address the server market, all caches are ECC protected except for the L1 instruction cache, which is parity protected instead. Parity is sufficient there because the instruction cache only ever holds clean copies of code: when an error is detected, the line can simply be invalidated and fetched again from L2 or memory, which costs a pipeline stall rather than corrupting data. AMD uses ARM bus interfaces and debugging support throughout the design. The Cortex A57 also implements cryptography extensions, which ARM claims accelerate workloads such as HTTPS by 3-10x over previous ARM designs.
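To illustrate why parity is enough for the instruction cache, here is a minimal Python sketch. It assumes a simplified model with one even-parity bit per 32-bit word and a hypothetical `refetch_from_l2` callback standing in for the rest of the cache hierarchy; the actual parity granularity and recovery logic of the Cortex A57 are not described here. Parity detects any odd number of flipped bits but cannot say which bit flipped, so the only recovery is to discard the word and fetch a clean copy.

```python
def parity_bit(word):
    """Even-parity bit over a 32-bit word: 1 if it has an odd number of set bits."""
    return bin(word & 0xFFFFFFFF).count("1") & 1


def fetch_instruction(cached_word, stored_parity, refetch_from_l2):
    """Return (instruction_word, refetched) for one cached 32-bit word.

    Parity can detect an odd number of flipped bits but cannot locate them,
    so on a mismatch the cached copy is discarded and a clean copy is fetched
    again. This is only safe because instruction cache lines are never dirty.
    refetch_from_l2 is a hypothetical callback standing in for the next level.
    """
    if parity_bit(cached_word) != stored_parity:
        return refetch_from_l2(), True
    return cached_word, False


# Usage: an arbitrary 32-bit word and the parity bit stored when it was cached.
original = 0x12345678
stored = parity_bit(original)

one_flip = original ^ (1 << 7)
two_flips = original ^ (1 << 7) ^ (1 << 20)

print(fetch_instruction(one_flip, stored, lambda: original)[1])   # True: detected, refetched
print(fetch_instruction(two_flips, stored, lambda: original)[1])  # False: double flip goes unnoticed
```

The double-bit case shows the limit of parity: it slips through undetected. The data caches, which may hold the only up-to-date copy of a line, need ECC so errors can be corrected rather than merely detected.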

The SoC has a dual-channel (2x64-bit) DDR3/DDR4 interface supporting up to 128GB of memory at up to 1866MHz. Just like the caches, the memory path supports ECC of the single-bit error correct / double-bit error detect (SECDED) variety. Registered (RDIMM), unregistered (UDIMM), and small-outline (SODIMM) memory modules are all supported by the A1100 SoC, but actual motherboards will likely support only one type of memory. The same goes for DDR3 vs. DDR4.
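The SECDED idea can be illustrated with the textbook extended Hamming (8,4) code: three Hamming parity bits locate any single flipped bit, and one extra overall parity bit distinguishes a single-bit error from a double-bit one. The sketch below is a minimal Python illustration under that assumption; conventional ECC DIMMs are 72 bits wide, leaving 8 check bits per 64-bit word for this kind of code, and the exact code used by AMD's memory controller is not described here. The function names are hypothetical.

```python
from functools import reduce
from operator import xor

# Positions 1..7 follow the classic Hamming (7,4) layout; position 0 holds the
# overall parity bit that extends it to SECDED.
DATA_POSITIONS = (3, 5, 6, 7)
PARITY_POSITIONS = (1, 2, 4)


def encode(data):
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword."""
    if len(data) != 4 or any(b not in (0, 1) for b in data):
        raise ValueError("expected 4 bits")
    code = [0] * 8
    for pos, bit in zip(DATA_POSITIONS, data):
        code[pos] = bit
    # The parity bit at position p covers the positions (other than p) whose index has bit p set.
    for p in PARITY_POSITIONS:
        code[p] = reduce(xor, (code[i] for i in range(1, 8) if i & p and i != p), 0)
    # Overall parity over positions 1..7 makes the full 8-bit word even parity.
    code[0] = reduce(xor, code[1:], 0)
    return code


def decode(code):
    """Return (data_bits, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    # XOR of the indices of all set bits: 0 for a valid word, else the flipped position.
    syndrome = reduce(xor, (i for i in range(1, 8) if code[i]), 0)
    overall = reduce(xor, code, 0)
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: treat it as a single-bit error at `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even parity but non-zero syndrome: two bits flipped, detect only.
        status = "uncorrectable"
    return [code[i] for i in DATA_POSITIONS], status


word = encode([1, 0, 1, 1])
one_flip = list(word)
one_flip[5] ^= 1
two_flips = list(word)
two_flips[2] ^= 1
two_flips[6] ^= 1

print(decode(word))          # ([1, 0, 1, 1], 'ok')
print(decode(one_flip))      # ([1, 0, 1, 1], 'corrected')
print(decode(two_flips)[1])  # uncorrectable
```

Flipping any one of the eight bits of a codeword yields 'corrected' with the original data, while flipping any two yields 'uncorrectable', which is exactly the correct/detect split described above.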

As the A1100 is an SoC, it integrates IO directly into the single chip instead of relying on an off-chip IO hub. Integrated components include 8 SATA 3 (6Gb/s) ports, two 10 Gbit Ethernet (10GBASE-KR) ports, one 10/100/1000 Ethernet port, 8 lanes of Gen3 PCI-Express (configurable as x8, x4/x4, or x4/x2/x2), I2C, SPI, and UART. The inclusion of this much storage IO (8 SATA3 ports) alongside dual 10 Gbit Ethernet is particularly interesting, as it hints at how AMD will position the Opteron A1100 in the market. More on this later.
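As a quick sanity check on the PCI-Express figures, the Python sketch below confirms that each listed lane split uses exactly the 8 available lanes and estimates the bandwidth of each resulting link. It assumes the standard Gen3 rate of 8 GT/s per lane with 128b/130b encoding and ignores packet and protocol overhead, so real-world throughput will be somewhat lower.

```python
GEN3_GT_PER_S = 8.0          # PCIe Gen3 signaling rate per lane (GT/s)
GEN3_ENCODING = 128 / 130    # Gen3 uses 128b/130b line encoding
TOTAL_LANES = 8

# Lane splits listed for the A1100's Gen3 lanes.
CONFIGS = {
    "x8": (8,),
    "x4/x4": (4, 4),
    "x4/x2/x2": (4, 2, 2),
}


def link_bandwidth_gb_s(lanes):
    """Approximate usable bandwidth of one link, per direction, in GB/s."""
    return lanes * GEN3_GT_PER_S * GEN3_ENCODING / 8  # 8 bits per byte


for name, links in CONFIGS.items():
    # Every configuration must account for all of the SoC's lanes.
    assert sum(links) == TOTAL_LANES, f"{name} does not use all {TOTAL_LANES} lanes"
    print(name, [round(link_bandwidth_gb_s(n), 2) for n in links])
# x8 [7.88]
# x4/x4 [3.94, 3.94]
# x4/x2/x2 [3.94, 1.97, 1.97]
```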

This could be a great thing. As a systems admin, I can see the need for small, low-power servers for local purposes.

They'd be great for branch offices or small offices that have much of their infrastructure in the cloud or in remote datacenters. Imagine spending $1000 on a local domain controller and branch-cache file server, with everything else remote, instead of having to spend the $5000 we do today.

They'd also be great for a local departmental authentication server in a large office. Keep one as a domain controller in the wiring closet with the local switches for each physical section of the office, and it would reduce the load on the main domain controller in the datacenter, reducing equipment requirements, power consumption, and cooling costs. It would also increase redundancy of the AD domain.

There are a lot of possibilities for this thing. Virtualization has already taken care of most of the uses it could have had, but lower power servers do offer an alternative to that, too.

The reference design is clearly targeted at NAS, powerful NAS with up to 8 drives, in the SOHO space. But I see an effective way to use it as a Memcache server (without storage), benefiting from its 128GB memory space with a power-efficient CPU, probably two of them in a 1U box.

There are also other possibilities, such as an SSD + 8-core ARM CPU, two of them in a 1U box, for MongoDB or CouchDB.

Beyond this reference design, there's a world of possibilities, considering this SoC and its incredible potential (as long as the tasks aren't CPU-hungry!)

I'm guessing a big blocker to Windows Server on ARM would be getting people to convert server workloads to run on ARM that currently run on x64. But I could actually see one potential usage for Windows Server ARM: storage spaces. If Windows Server 2012 R2 (or vNext) gets developed for ARM, we could see ARM-based Windows Server storage devices serving up SAN-competitive iSCSI and SMB storage, at a greatly reduced cost.

Maybe; it all depends on what the market wants. If there is enough demand for servers on ARM, there will eventually be a Windows Server for ARM, and it may even be based on RT. If you remember Windows for IA-64, that will tell you how willing MS is to port its server OS off x86 code. If not, Linux will take the whole market. A server OS is a different beast: most servers don't need to run the usual x86 Win32/Win64 applications.