HackMag

Why do we need ARM on servers?

The small, optimized instruction set of ARM chips is a perfect fit for mobile devices. Thanks to their low power consumption, these chips are now the standard choice for smartphones and tablets. Recently, however, there has been a lot of talk about ARM chips moving into an area almost entirely occupied by Intel: servers.

Current situation in the server CPU market

Today, most data processing takes place on the server side. Client devices (tablets, PCs, laptops, smartphones), whose number is growing exponentially, merely consume the results. For this reason, the server market keeps growing, and the servers themselves keep getting faster. This creates several problems at once. On the one hand, free rack space is shrinking, while power draw and the amount of heat to be removed are growing. The problem is partly solved by packing servers more densely into the rack and by managing power consumption more effectively (as a rule, that of the CPU). However, any modern server also contains many hard drives, network adapters and other interfaces, and at data center scale all of this adds up to hundreds of megawatts. On the other hand, different applications generate different loads. For some of them the load is very hard to predict, so hardware is bought with a safety margin; for others it can be determined more precisely. Recently, the problem of overcapacity has been addressed with virtualization, by running dozens of VMs on a single server. But this, again, raises the price and makes administration more complex. As you can see, there is no ideal solution. What we need are compact, economical systems that can be switched on as needed.

That is why, in 2013, the new concept of micro-servers attracted general interest. Micro-servers are power-saving modules built on system-on-a-chip (SoC) technology: a ready-made single-chip solution with low latency for many operations, which only needs RAM and a hard drive added and a network connection. As a rule, a SoC has several Ethernet ports of a few Gbit/s each, plus PCIe.

Micro-server

Such servers are in high demand for simple niche tasks that do not need much computing power or whose requirements are known in advance (web applications, entry-level hosting, Memcached, network management, data storage, etc.). For example, instead of carving out a VDS, a provider can simply plug a new server of the required capacity into a slot. A micro-server is a precise instrument designed for specific loads, and in this it differs from the traditional approach that Intel advocates today. Compactness and low power consumption make it possible to reduce the number of power supplies, cooling fans and cables. For example, 1,600 Calxeda EnergyCore micro-servers take up half of a standard server rack and cost 63% less than a typical server rack of the same capacity. If more capacity is needed, such miniature servers can be grouped into clusters that parallelize the computation. This is exactly how the data centers of the future are envisioned. Note, however, that there is no universal system design in the micro-server category, so you can encounter configurations that differ widely in power and capabilities.

Industry experts offer differing forecasts. IC Insights research concluded that micro-server sales will grow by an average of 70% annually from 2014 to 2017 (in 2014, they increased by 139%). Analysts at Intel (which accounts for 92% of all server processor sales) believe that micro-servers will occupy only a small segment of the market (up to 6%), while Gartner estimates this figure at 15%.
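To get a feel for what a 70% average annual increase means, the compounding can be worked out directly. The sketch below is only illustrative arithmetic: the function name `growth_multiplier` is made up for this example, and the number of compounding periods is an assumption, since the forecast does not say exactly which years are counted.

```python
def growth_multiplier(annual_rate: float, years: int) -> float:
    """Return the overall multiplier after compounding annual_rate for `years` years."""
    return (1.0 + annual_rate) ** years


if __name__ == "__main__":
    # 0.70 is the IC Insights growth figure quoted in the text.
    # Counting three periods (2014 -> 2017) is an assumption.
    for years in (1, 2, 3):
        print(f"after {years} year(s): x{growth_multiplier(0.70, years):.2f}")
```

Under that assumption, three years of 70% growth multiply sales roughly fivefold.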

CPU for micro-servers

The new idea calls for different hardware. The standard AMD Opteron and Intel Xeon server processors are not suitable for micro-servers, so both companies have released dedicated products. Intel has introduced the single-chip Intel Atom S1200 systems (formerly known as Centerton), designed for micro-servers, data storage systems and network equipment. The series is manufactured on a 32 nm process and comprises three processor models with clock speeds from 1.6 to 2.0 GHz. Among the key features of the new SoC, Intel lists support for 64-bit applications with the corresponding instruction set, as well as ECC and VT-d support. The chip has two physical cores and, thanks to Intel Hyper-Threading technology, handles four threads. It also has a memory controller that supports up to 8 GB of DDR3 memory (per CPU) and eight PCI Express 2.0 lanes. So far, it is not quite a SoC: for example, there is no Ethernet or USB on the chip. They will appear only in the next generation, the 22 nm Atom Avoton chips (Edisonville platform), which are scheduled to be presented next year. Avoton will contain from 2 to 8 cores, with each pair of cores sharing 1 MB of L2 cache. The maximum CPU frequency is expected to be 2.4 GHz, rising to 2.7 GHz with Turbo Boost technology. The power consumption of the new chips will range from 5 to 20 W.

But the most important feature of the S1200 series is its low power consumption. TDP ranges from 6.1 to 8.5 W depending on the model. The most affordable chip in the series is the Atom S1220, with an announced wholesale price of USD 54; the most expensive is the Atom S1260, priced at USD 64. For comparison, the TDP of the Intel Xeon E5 based on the Sandy Bridge-EP design ranges from 60 to 150 W, at prices from USD 202 to USD 2,614. According to Intel, the Atom S1200 series is used in micro-servers and network equipment made by Accusys, CETC, Dell, HP, Huawei, Inspur, Microsan, Qsan, Quanta, Supermicro and Wiwynn. It is no secret that Atom was considered a poor fit for embedded devices, primarily because of software. Servers have no such problem, so Intel has good reason to expect the S1200 series to be very popular. However, the competitors are not asleep either.
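The TDP figures invite a rough back-of-the-envelope comparison: how many Atom S1200 chips fit into the thermal budget of a single Xeon E5? The following sketch uses only the TDP values quoted in this section. The function name is hypothetical, and TDP says nothing about delivered performance, so this is not a benchmark.

```python
import math


def chips_within_budget(budget_watts: float, chip_tdp_watts: float) -> int:
    """Return how many chips of a given TDP fit into a power budget."""
    if chip_tdp_watts <= 0:
        raise ValueError("chip TDP must be positive")
    return math.floor(budget_watts / chip_tdp_watts)


if __name__ == "__main__":
    # TDP values taken from the text: Atom S1200 series 6.1-8.5 W,
    # Xeon E5 (Sandy Bridge-EP) 60-150 W.
    print(chips_within_budget(150, 8.5))  # top Xeon vs. hottest Atom
    print(chips_within_budget(60, 6.1))   # entry Xeon vs. coolest Atom
```

Even the hottest chip of the S1200 series fits many times over into the thermal envelope of one top-end Xeon E5, which is the whole point of the micro-server approach.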

AMD has introduced its AMD Opteron X-series processors (codenamed Kyoto) with x86 architecture, which offer very high density and power efficiency. They come in two versions. The AMD Opteron X2150 is a single-chip hybrid server processor that combines a CPU and a GPU (based on AMD Radeon HD 8000, with up to 128 graphics cores), while the X1150 has no GPU. The X2150 is geared towards multimedia processing, the X1150 towards load distribution and SoC servers. Both are built on the x86 core known as Jaguar: four cores running at 2 GHz, 2 MB of L2 cache, an integrated SATA port (which the S1200 lacks) and support for up to 32 GB of DRAM per CPU. According to CPU tests, the Opteron X delivers twice the performance of the Atom S. Its TDP, meanwhile, is 22 W. The chips are priced at USD 99 and USD 64.

The new trend has also revived MIPS and RISC processors. For example, about half a year ago AMD announced the development of the A-series Opteron (Seattle) with very promising characteristics: four or eight 64-bit cores, a clock frequency above 2 GHz and support for 128 GB of DRAM. In early 2014, the AMD Opteron A1100 was presented to the public. It is based on the first implementation of the 64-bit ARM Cortex-A57 core, is manufactured on a 28 nm process and runs at a frequency above 2 GHz. The top model will have eight cores and 8 MB of L3 cache and will support up to 128 GB of RAM with ECC. In addition, AMD has developed a new memory controller specifically for the Opteron A1100 that supports both DDR3 and DDR4. Each pair of cores shares 1 MB of L2 cache, for a total of up to 4 MB of L2 cache per chip. The chips will also have integrated support for two 10 Gb/s Ethernet ports and eight 6 Gb/s SATA ports (the SoC can supply full bandwidth to all eight SATA ports). As you can see, these chips have no graphics cores, and none are planned any time soon. The design is intended for highly integrated SoCs and is optimized for dense, power-efficient servers. Thanks to the 28 nm process, the Opteron A has an excellent ratio of performance to power consumption: its TDP is 25 W. Its price is expected to be around USD 100. According to the manufacturer, the eight cores of the Opteron A1100 deliver 2 to 4 times the performance of the four-core Opteron X2150 at a comparable price and TDP.
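The L2 figure follows from the cache layout described above: one 1 MB slice per pair of cores. A small sketch of that arithmetic is given here; the function name is invented for illustration, and the result for the four-core variant is derived from the stated layout rather than quoted from AMD.

```python
def total_l2_mb(cores: int, l2_per_pair_mb: int = 1) -> int:
    """Total L2 cache when every pair of cores shares one L2 slice."""
    if cores < 2 or cores % 2 != 0:
        raise ValueError("cores must be an even number of at least 2")
    return (cores // 2) * l2_per_pair_mb


if __name__ == "__main__":
    # Eight cores -> four pairs -> 4 MB, matching the figure in the text.
    print(total_l2_mb(8))
    # Four cores -> two pairs -> 2 MB (derived, not an official number).
    print(total_l2_mb(4))
```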

Compare AMD Opteron X2150 and A1100

Servers based on the new chip are expected to be announced in the fourth quarter of 2014. At the same time, AMD has contributed a specification for a new micro-server, AMD Open CS 1.0 Common Slot, to the Open Compute Project, and is working with industry leaders to create 64-bit software for ARM-based systems: compilers, emulators, hypervisors, operating environments and applications for every task that data center servers handle. This step should encourage other developers to support the platform. So AMD processors with ARM cores are unlikely to show up in tablets any time soon, but they are perfectly capable of running a cloud service.

By the way, a few years ago AMD was planning to acquire MIPS Technologies, which holds the license for the MIPS architecture, but in 2013 most of its patents were reassigned to Bridge Crossing (of which ARM is a member).

Three magic letters ARM

The first ARM chips appeared in April 1985 thanks to the efforts of Acorn Computers, a UK company (its processor business later became ARM Limited). For a long time, however, they were less popular than the standard x86 chips, although they were energy efficient and performed better on certain tasks. For example, Citrix believes that Xen works better on ARM than on Intel. ARM chips are also popular in embedded systems, network equipment, payment terminals and all sorts of measuring equipment. But the situation changed dramatically with the mass arrival of mobile gadgets, 90% of which now use ARM designs. ARM is based on a 32-bit RISC architecture (hence the acronym: Advanced, or originally Acorn, RISC Machine), which has been both simplified and extended over time but essentially relies on a set of simple instructions that are cheap to execute. Unlike a traditional CPU, an ARM chip is not just a processor but a SoC that can contain all the required elements: RAM controller, graphics accelerator, video and audio decoders, network modules and USB. It is important to understand that the ARM instruction set itself has no particular effect on power consumption. The fundamental principle behind the low TDP is the use of the SoC architecture.

ARM Limited is now mostly engaged in developing and licensing the processor architecture, having stopped building specific chip models or individual components and left that work to third-party companies. For instance, Qualcomm and Apple have created proprietary ARMv7-based designs called Scorpion, Krait and Swift. The most advanced recent development is the Cortex-A57 core, licensed by AMD, Broadcom, HiSilicon, STMicroelectronics, Samsung, MediaTek and Huawei.

Licensing Cortex-A57

Starting with ARMv7, three profiles have been defined, each supporting a particular set of instructions, which makes it easy to tell what a chip is intended for:

A (Application): for high-performance devices that run complex applications;

R (Real Time): for applications that operate in real time;

M (Microcontroller): for embedded devices and microcontrollers.

Over almost 30 years, several ARM generations and extended instruction sets have been developed. Apart from the standard 32-bit instructions there is, for instance, the Thumb (T32) mode, which executes a set of 16-bit instructions. In 2003 it was extended to Thumb-2 with the addition of 32-bit instructions, making it possible to reach conventional ARM performance while still using 16-bit instructions. The Jazelle technology (chips with the J suffix) allows Java bytecode to be executed directly on the ARM architecture. In addition, the instruction set can be extended with coprocessors. Here are some of the extensions:

– NEON technology: a combined 64- and 128-bit set of SIMD (Single Instruction, Multiple Data) instructions that accelerates media applications and signal processing. In particular, it can help decode MP3 and run the GSM AMR speech codec;
– the VFP (Vector Floating Point) extension provides low-cost computation on single- and double-precision floating-point numbers in accordance with the ANSI/IEEE Std 754-1985 standard. It is used in a broad range of applications such as sound processing, 3D graphics, etc.;
– TrustZone technology: a security extension (from ARMv6KZ upwards) that offers a simple alternative to adding a dedicated security core by providing two virtual processors with hardware-enforced access control. Application cores can switch between the two states (called worlds) to keep information secure.

Later, processors with two or more cores emerged. The Cortex-A9 has two cores, the Cortex-A15 has four. As a result, Cortex-A15 chips are already as fast as Intel Atom.

Although x86 adopted a 64-bit architecture long ago, ARM was in no hurry to take this step. Bit width by itself does not affect performance, and there was no need to address more than 4 GB of memory. However, in 2010 ARMv7 (Cortex-A15 and Cortex-A7) introduced LPAE (Large Physical Address Extension), designed to make hypervisor management and data partitioning more effective. It made it possible to address more than 4 GB of memory, but there were practically no applications that needed it. Today the situation has changed: more and more operating systems and applications are available only as 64-bit builds. That is why the ARMv8-A architecture was unveiled in October 2011. It includes the AArch64 definition, which allows 64-bit instructions to be executed. It is possible to run 32-bit applications on a 64-bit OS and to launch a virtualized 32-bit OS under a 64-bit hypervisor. In addition, ARMv8-A added cryptographic instructions for AES, SHA-1 and SHA-256.
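The 4 GB limit follows directly from the width of a 32-bit address, which can be checked with a couple of lines of arithmetic. In the sketch below, the helper name is hypothetical. The 40-bit width used for LPAE is not stated in the article; it is the physical address size the ARMv7 extension is generally documented as providing, so treat it as an assumption.

```python
GIB = 2 ** 30  # bytes in one gibibyte


def addressable_gib(address_bits: int) -> int:
    """Return the byte-addressable memory, in GiB, for a given address width."""
    if address_bits < 30:
        raise ValueError("this sketch only handles widths of 30 bits or more")
    return 2 ** address_bits // GIB


if __name__ == "__main__":
    print(addressable_gib(32))  # classic 32-bit addressing: the 4 GB limit
    print(addressable_gib(40))  # assumed LPAE physical address width
```

With 40-bit physical addresses the ceiling moves from 4 GiB to 1024 GiB, which is why LPAE was enough for hypervisors even before the 64-bit AArch64 arrived.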

ARM chip generations

Currently, ARMv8-A is implemented in the high-performance cores of the Cortex-A50 series, which come in two varieties: Cortex-A53 and Cortex-A57. The first is energy efficient and the second is high-performance, but both can work with large amounts of RAM.

Processors with ARM Cortex-A57 cores conform to the Open Compute Project and Common Slot Architecture standards promoted by Facebook. This makes it possible to use them alongside traditional x86 processors with existing operating systems even now, before optimized software appears. In addition, in January 2014 ARM released the Server Base System Architecture (SBSA) specification, developed jointly with OS and software vendors (Canonical, Citrix, Linaro, Microsoft, Red Hat and SUSE) and hardware manufacturers (Dell, HP, Broadcom). The SBSA defines minimum standards for better compatibility, portability and integration into existing data centers.

Cortex-A57 chip

As usual, the big initial problem for adoption is that the software is not ready. Efforts are underway to improve the situation: drivers can be written even now, SDKs are being adapted and applications are being ported. Clearly, this will take months. In any case, cross-platform solutions, which can run on any hardware platform, are a great help to ARM.
The ARM architecture is supported by many operating systems. For example, ARMv8 has been supported in the Linux kernel since version 3.7. Ubuntu 14.04 LTS and recent Red Hat releases support 64-bit ARM. ARM is also supported by BSD (FreeBSD, NetBSD, OpenBSD), QNX, Android and others.

Ubuntu Server 14.04 LTS supports 64-bit ARM
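On a running system, one quick way to confirm that the OS really is a 64-bit ARM build is to look at the machine string it reports. A minimal Python sketch follows. `platform.machine()` is part of the standard library, while the helper name `is_arm64` and the set of accepted strings (`aarch64` as typically reported by Linux, `arm64` by some other systems) are assumptions made for illustration.

```python
import platform

# Machine strings treated as 64-bit ARM in this sketch (an assumption,
# not an exhaustive list).
ARM64_NAMES = {"aarch64", "arm64"}


def is_arm64(machine: str) -> bool:
    """Return True if the machine string looks like a 64-bit ARM platform."""
    return machine.strip().lower() in ARM64_NAMES


if __name__ == "__main__":
    machine = platform.machine()
    print(f"{machine}: 64-bit ARM = {is_arm64(machine)}")
```

Because such a check runs unchanged on x86 and ARM, it also illustrates why cross-platform software makes the move to ARM servers easier.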


Conclusion

In 2013, 30 million x86 server processors and over 8 billion ARM processors were sold. According to AMD, in five years 25% of all servers worldwide will use the ARM platform. Nevertheless, no expert is willing to forecast the global market situation. So far, in trying to break into this market, ARM has faced the same problems Intel did when it tried to break into the mobile processor market: it has to take on a strong competitor with an established presence and deep-rooted customer relationships. However, the involvement of AMD, which has its own customer base, may well help stimulate demand. So it is highly likely that an efficient, energy-saving 64-bit chip will find loyal buyers. In any event, ARM hopes to capture a 40% share of the market for chips for scalable web servers within five years (around USD 10 billion).