It seems only yesterday that we saw Intel introduce their 22nm FinFET technology, and now we are going all the way down to 5nm. This is obviously an exaggeration. The march of process technology has been more than a little challenging for the past 5+ years for everyone in the industry. Intel has made it look a little easier by being able to finance these advances a little better than the pure-play foundries. That does not mean they have not experienced challenges of their own.

We have seen some breakthroughs these past years, with everyone jumping onto FinFETs and TSMC, Samsung, and GLOBALFOUNDRIES introducing their own processes. GLOBALFOUNDRIES initially had set out on their own, but that particular endeavor did not pan out. They ended up licensing Samsung’s 14nm processes (LPE and LPP) to start producing chips of their own, primarily for AMD’s graphics chips and the latest generation of Ryzen CPUs.

These advances have not been easy. While FinFETs are needed at these lower nodes to continue to provide performance and power efficiency at these transistor densities, the technology will not last forever. 10nm and 7nm lines will continue to use them, but many believe that while we will see the densities improve, the power characteristics will start to lag behind. The theory is that past the 7nm node, traditional FinFETs will no longer work as desired. This is very reminiscent of the sub-28nm processes that attempted to use planar structures on bulk silicon. In that case the chips could be made, but power issues plagued the designs and eventually support for those process lines was dropped.

IBM and its research partners Samsung and GLOBALFOUNDRIES, working at the SUNY Polytechnic Institute Colleges of Nanoscale Science and Engineering’s NanoTech Complex in Albany, NY, have announced a breakthrough in a new “Gate-All-Around” architecture made on a 5nm process. FinFETs are essentially a rectangle surrounded on three sides by gates, giving it the “fin” physical characteristics. This new technology now covers the fourth side and embeds these channels in nanosheets of silicon.

The problem with FinFETs is that power will eventually stop scaling as transistors get packed closer and closer together. While density scales, power and performance will get worse compared to previous nodes. The 5nm silicon nanosheet technology gives a significant boost to power and efficiency, thereby doing to FinFETs what they did to planar structures at the 20/22nm nodes.

One of the working EUV litho machines at SUNY Albany.

IBM asserts that the average chip the size of a fingernail can contain up to 30 billion transistors and continue to see the density, power, and efficiency improvements that we would expect with a normal process shrink. The company expects these process nodes to start rolling out in a 2019 time frame if all goes as planned.

There are few details on how IBM was able to achieve this result. We do know a couple of things about it. EUV lithography was used extensively to avoid the multi-patterning nightmare that this would otherwise entail. For the past two years ASML has been installing 100 watt EUV litho machines throughout the world for select clients. One of these is located on the SUNY Albany campus where this research was done. We also know that deposition was done layer by layer with silicon and the other materials.

What we don’t know is how long it takes to create a complete wafer. Usually these test wafers are packed full of SRAM and very little logic. It is a useful test and creates a baseline for many structures that will eventually be applied to this process. Considering how the layers look to be deposited, producing such a wafer likely takes a long, long time with current tools and machinery. Cutting edge wafers in production can take upwards of 16 weeks to complete. I hesitate to even guess how long each test wafer takes. Because of the very 3D nature of the design, I am curious as to how the litho stages work and how many passes are still needed to complete the design.

This looks to be a very significant advancement in process technology that should be mass produced in the timeline suggested by IBM. It is a significant jump, but it seems to borrow a lot of previous FinFET structures. It does not encompass anything exotic like “quantum wells”, but is able to go lower than the currently specified 7nm processes that TSMC, Samsung, and Intel have hinted at (and yes, process node names should be taken with a grain of salt from all parties at this time). IBM does appear to be comparing this to what Samsung calls its 7nm process in terms of dimensions and transistor density.

Cross section of a 5nm transistor showing the embedded channels and silicon nanosheets.

While Moore’s Law has been stretched thin as of late, we are still seeing these scientists and engineers pushing against the laws of physics to achieve better performance and scaling at incredibly small dimensions. The silicon nanosheet technology looks to be an effective and relatively affordable path towards smaller sizes without requiring exotic materials. IBM and its partners look to have produced a process node that will continue the march towards smaller, more efficient, and more powerful devices. It is not exactly around the corner, but 2019 is close enough to start planning designs that could potentially utilize this node.

Extreme Ultraviolet Lithography has been the hope for shrinking process sizes below where they are today, but it had not been used to create a successful 5nm chip until now. IBM, Samsung and GLOBALFOUNDRIES have succeeded in producing a chip using IBM's gate-all-around transistors, which will be known as GAAFETs and will likely replace the tri-gate FinFETs used today. A GAAFET resembles a FinFET rotated 90 degrees so that the channels run horizontally, stacked three layers high with gates filling in the gaps, hence the name chosen.

"IBM, working with Samsung and GlobalFoundries, has unveiled the world's first 5nm silicon chip. Beyond the usual power, performance, and density improvement from moving to smaller transistors, the 5nm IBM chip is notable for being one of the first to use horizontal gate-all-around (GAA) transistors, and the first real use of extreme ultraviolet (EUV) lithography."

Earlier this month at the Hot Chips symposium, IBM revealed details on its upcoming Power9 processors and architecture. The new chips are aimed squarely at the data center and will be used for massive number crunching in big data and scientific applications in servers and supercomputer nodes.

Power9 is a big play from Big Blue, and will help the company expand its presence in the Intel-ruled datacenter market. Power9 processors are due out in 2018 and will be fabricated at GLOBALFOUNDRIES on a 14nm HP FinFET process. The chips feature eight billion transistors and utilize an “execution slice microarchitecture” that lets IBM combine “slices” of fixed point, floating point, and SIMD hardware into cores that support various levels of threading. Specifically, 2 slices make an SMT4 core and 4 slices make an SMT8 core. IBM will have Power9 processors with 24 SMT4 cores or 12 SMT8 cores (more on that later). Further, Power9 is IBM’s first processor to support version 3.0 of its Power instruction set.

According to IBM, its Power9 processors are between 50% and 125% faster than the previous generation Power8 CPUs depending on the application tested. The performance improvement is thanks to a doubling of the number of cores as well as a number of other smaller improvements including:

Important for finance and security markets, massive databases and money math.

IEEE 754

CAPI 2.0 and NVLink support

Hardware accelerators for encryption and compression

The Power9 processor features 120 MB of direct-attached eDRAM that acts as an L3 cache (256 GB/s). The chips offer up 7TB/s of aggregate fabric bandwidth, which certainly sounds impressive, but that is a number with everything added together. With that said, there is a lot going on under the hood. Power9 supports 48 lanes of PCI-E 4.0 (2 GB/s per lane per direction), 48 proprietary 25Gbps accelerator lanes, and four 16Gbps SMP links (NUMA) used to combine four quad-socket Power9 boards into a single 16-socket “cluster.” The accelerator lanes will be used for NVLink 2.0 to connect to NVIDIA GPUs, as well as to connect to FPGAs, ASICs, and other accelerators or new memory technologies using CAPI 2.0 (Coherent Accelerator Processor Interface).
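To put the lane counts above in perspective, here is a rough back-of-the-envelope sketch in Python. It only multiplies the figures quoted in the paragraph; the function names are made up for illustration, and the accelerator-lane number assumes 8 bits per byte with no encoding or protocol overhead, which IBM did not specify.

```python
# Rough I/O bandwidth arithmetic using only the figures quoted in the article.
# Function names are illustrative, not from IBM.

PCIE4_LANES = 48
PCIE4_GB_S_PER_LANE = 2.0     # GB/s per lane per direction (article's figure)

ACCEL_LANES = 48
ACCEL_GBIT_S_PER_LANE = 25.0  # Gbps per lane, raw signalling rate


def pcie_bandwidth_gb_s(lanes=PCIE4_LANES, per_lane=PCIE4_GB_S_PER_LANE):
    """Total PCI-E 4.0 bandwidth in GB/s, per direction."""
    return lanes * per_lane


def accel_bandwidth_gb_s(lanes=ACCEL_LANES, gbit_per_lane=ACCEL_GBIT_S_PER_LANE):
    """Raw accelerator-lane bandwidth in GB/s.

    Assumption: 8 bits per byte and zero encoding/protocol overhead.
    """
    return lanes * gbit_per_lane / 8


if __name__ == "__main__":
    print(f"PCI-E 4.0:         {pcie_bandwidth_gb_s():.0f} GB/s per direction")
    print(f"Accelerator lanes: {accel_bandwidth_gb_s():.0f} GB/s raw")
```

Even with generous assumptions these two interfaces add up to a few hundred GB/s, which shows how much of the 7TB/s aggregate figure has to come from on-chip fabric rather than external I/O.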

These are processors that are built to scale and tackle big data problems. In fact, not only is Google interested in Power9 to power its services, but the US Department of Energy will be building two supercomputers using IBM’s Power9 CPUs and NVIDIA’s Volta GPUs. Summit and Sierra will offer between 100 and 300 petaflops of compute power and will be installed at Oak Ridge National Laboratory and Lawrence Livermore National Laboratory respectively. There, some of the projects they will tackle are enabling researchers to visualize the internals of a virtual light water reactor, researching methods to improve fuel economy, and delving further into bioinformatics research.

The Power9 processors will be available in four variants that differ in the number of cores and the number of threads each core supports. The chips are broken down into Power9 SO (Scale Out) and Power9 SU (Scale Up), and each group has two processors depending on whether you need a greater number of weaker cores or a smaller number of more powerful cores. Power9 SO chips are intended for multi-core systems and will be used in servers with one or two sockets, while Power9 SU chips are for multi-processor systems with up to four sockets per board and up to 16 total sockets per cluster when four four-socket boards are linked together. Power9 SO uses DDR4 memory and supports a theoretical maximum of 4TB of memory (1TB with today’s 64GB DIMMs) and 120 GB/s of bandwidth, while Power9 SU uses IBM’s buffered “Centaur” memory scheme that allows the systems to address a theoretical maximum of 8TB of memory (2TB with 64GB DIMMs) at 230 GB/s. In other words, the SU series is Big Blue’s “big guns.”
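The memory figures above imply a DIMM slot count that IBM did not state outright. The small Python sketch below works it out from the quoted capacities; the slot counts and DIMM sizes it returns are inferences from the article's numbers (assuming 1TB = 1024GB), not published specifications, and the function names are made up for illustration.

```python
# Infer DIMM slot counts and required DIMM sizes from the quoted capacities.
# Assumption: 1 TB = 1024 GB. Results are inferences, not IBM specifications.

GB_PER_TB = 1024


def implied_dimm_slots(capacity_tb, dimm_gb=64):
    """Slots needed to reach capacity_tb using DIMMs of dimm_gb each."""
    return capacity_tb * GB_PER_TB // dimm_gb


def dimm_gb_for_max(max_capacity_tb, slots):
    """DIMM size (GB) needed to hit the theoretical maximum with given slots."""
    return max_capacity_tb * GB_PER_TB // slots


if __name__ == "__main__":
    for name, today_tb, max_tb in (("Power9 SO", 1, 4), ("Power9 SU", 2, 8)):
        slots = implied_dimm_slots(today_tb)
        print(f"{name}: {slots} slots implied, "
              f"{dimm_gb_for_max(max_tb, slots)}GB DIMMs needed for {max_tb}TB")
```

Under those assumptions both families would need DIMMs four times larger than today's 64GB parts to reach their theoretical maximums.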

A photo of the 24 core SMT4 Power9 SO die.

Here is where it gets a bit muddy. The processors are further broken down into SMT4 and SMT8 versions, and both Power9 SO and Power9 SU offer both options. There are Power9 CPUs with 24 SMT4 cores and there are CPUs with 12 SMT8 cores. IBM indicated that SMT4 (four threads per core) was suited to systems running Linux and virtualization with emphasis on high core counts. Meanwhile SMT8 (eight threads per core) is a better option for large logical partitions (one big system versus partitioning out the compute cluster into smaller VMs as above) and running IBM’s Hypervisor. In either case (24 SMT4 or 12 SMT8) there is the same number of total threads, but you are able to choose whether you want fewer “stronger” threads on each core or more (albeit weaker) threads per core depending on which your workloads are optimized for.
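The claim that both configurations end up with the same resources is easy to check. This short Python sketch uses only numbers quoted in the article (2 slices per SMT4 core, 4 slices per SMT8 core, 24 or 12 cores); the dictionary and function names are made up for illustration.

```python
# Compare the two Power9 core configurations using the article's figures.

THREADS_PER_CORE = {"SMT4": 4, "SMT8": 8}
SLICES_PER_CORE = {"SMT4": 2, "SMT8": 4}  # "2 slices make an SMT4 core..."


def chip_totals(cores, mode):
    """Return (total hardware threads, total execution slices) for a chip."""
    return cores * THREADS_PER_CORE[mode], cores * SLICES_PER_CORE[mode]


if __name__ == "__main__":
    for cores, mode in ((24, "SMT4"), (12, "SMT8")):
        threads, slices = chip_totals(cores, mode)
        print(f"{cores} x {mode}: {threads} threads, {slices} slices")
```

Both variants come out to 96 threads and 48 slices, so the choice is really about how those slices are grouped into cores.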

Servers supporting Power9 are already under development by Google and Rackspace and blueprints are even available from the OpenPower Foundation. Currently, it appears that Power9 SO will emerge as soon as the second half of next year (2H 2017) with Power9 SU following in 2018 which would line up with the expected date for the Summit and Sierra supercomputer launches.

This is not a chip that will be showing up in your desktop any time soon, but it is an interesting high performance processor! I will be keeping an eye on updates from Oak Ridge lab hehe.

This video, about floppy disks, is a little bit longer and more in-depth than their previous one about cassette tapes. The 8-Bit Guy and friends (I'm pretty sure they don't call themselves that...) go through how many tracks each floppy has, how many sectors they have, and how that varies per manufacturer (including the technical reasons how and why they are formatted incompatibly).

The 8-Bit Guy likes to go through a bunch of hardware, spanning the gamut of Atari, Commodore, Apple, IBM PC, and others, and explain their history. The most interesting part of this video, to me, was his explanation of why the Commodore floppy drive was so much larger than its competitors, and what it meant for performance.

IBM's Power9 processor is scheduled to appear on the scene just over a year from now, and finally we have some details about what it will be. Firstly, the core count is 24, two higher than Intel's, and the chip is optimized for use in two socket servers. The chips are 14nm FinFETs fabbed by GLOBALFOUNDRIES and will be compatible with modern industry standards including DDR4, PCIe 4.0 and NVLink 2.0, so you can even take advantage of Jen-Hsun's latest products.

The list of customers is quite impressive. Google has moved to Power8 already and described changing to the infrastructure as being as simple as flipping a switch. The US Department of Energy will build their next HPCs using Power9, and Rackspace is currently working with Google to develop Power9 server blueprints for the Open Compute Project.

Several Chinese companies will take advantage of those OpenPower blueprints to develop their own 'partner chips', based on the Power8 and Power9 architectures, which will be using 10nm gates in 2018 to 2020. This is somewhat amusing considering the shipping of Xeon processors to China has been banned by the US Government. Check out more of the slides from IBM's presentation at The Register.

"IBM's Power9 processor, due to arrive in the second half of next year, will have 24 cores, double that of today's Power8 chips, it emerged today.

Meanwhile, Google has gone public with its Power work – confirming it has ported many of its big-name web services to the architecture, and that rebuilding its stack for non-Intel gear is a simple switch flip."

IBM will be making its Spectrum Scale software available on Seagate's ClusterStor HPC products, which are due out towards the end of the year. This marks a turning point in Seagate's HPC business, as previously their products were only useful to a small group of companies which used the Lustre file system; moving to IBM's product grows the available pool of customers significantly. HP will be adding their Apollo software suite into the deal, making this even more attractive for potential clients. As The Inquirer points out, this is part of the shift of international companies moving their data out of US borders, which is good news for ISPs and data providers in the rest of the world but not such good news for those looking for employment in the industry within the USA.

"SEAGATE HAS JOINED FORCES with HP and IBM in a bid to boost its position in the high-performance computing (HPC) market."

The heavy hitting partnership of IBM, Samsung and GLOBALFOUNDRIES has designed and created the first chip built on a 7nm process using Silicon Germanium channel transistors and EUV lithography. Even more impressive is their claim of 50% area scaling improvements over 10nm, a very large step in such small processes. IBM told PC World that they will be able to fit 20 billion transistors on a 7nm chip, which is a tenfold increase over Braswell as an example of current technology. The Inquirer reports that this project also cements the deal between GLOFO and IBM; GLOFO will be the exclusive provider of chips for IBM for the next decade.

"IBM'S RESEARCH TEAM has manufactured functional test chips using a 7nm production process, making it the first in the industry to produce chips with working transistors of this size."