Immersion Server Liquid Cooling: ZTE Makes a Splash at MWC

Big data centers are often cooled by air, using large HVAC/air-conditioning machines. Those near the Arctic Circle can rely on the outside air to help. If a center is planned with a specific design and layout in mind, water cooling is another investment that can be made. If a designer really wants to go off the deep end, then full immersion liquid cooling is a possibility.

Immersion liquid cooling is ultimately not that new, and is based on non-conductive liquids. It allows the full system to be cooled: all of the components, all of the time. It removes the need for large cooling apparatus and encourages energy recycling, which is a major metric for data center owners. For data centers limited by space, it also offers better density of server nodes, ideal for deployments on the edge of communication networks.

There are two approaches to immersion cooling: non-phase change and phase change. The first, non-phase change, involves using a liquid with a high heat capacity and cycling it through a heat exchange system. The downside is that these liquids often have a high viscosity (mineral oil, for example), requiring a lot of energy to forcibly circulate. By contrast, the phase-change variety is, for most purposes, self-convecting.

The idea here is that the liquid changes into a gas as it is warmed by the component. The gas then rises to a cool surface (like a cold radiator), condenses, and falls back as a cooler liquid. The energy transferred into the radiator can then be fed into an energy recovery system. The low viscosity of the phase change material aids significantly in the convection, as does the large volume of low density gas that forms and displaces the liquid.
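To put rough numbers on the difference between the two approaches, here is a minimal sketch comparing how much liquid has to move past a component per second in each case. The heat load, specific heat, temperature rise and latent heat below are illustrative assumptions, not figures from ZTE or from any fluid datasheet, and the function names are hypothetical.

```python
def single_phase_flow_kg_per_s(heat_load_w, specific_heat_j_per_kg_k, delta_t_k):
    """Mass flow needed when the liquid only warms up: P = m_dot * c_p * dT."""
    return heat_load_w / (specific_heat_j_per_kg_k * delta_t_k)


def two_phase_boil_off_kg_per_s(heat_load_w, latent_heat_j_per_kg):
    """Mass of liquid boiled per second when heat goes into vaporization: P = m_dot * h_fg."""
    return heat_load_w / latent_heat_j_per_kg


# All values below are assumptions chosen for illustration only.
HEAT_LOAD_W = 150.0              # assumed heat output of one processor
SPECIFIC_HEAT_J_PER_KG_K = 1100.0  # assumed specific heat of the liquid
DELTA_T_K = 10.0                 # assumed allowed temperature rise of the liquid
LATENT_HEAT_J_PER_KG = 100_000.0   # assumed latent heat of vaporization

single_phase = single_phase_flow_kg_per_s(HEAT_LOAD_W, SPECIFIC_HEAT_J_PER_KG_K, DELTA_T_K)
two_phase = two_phase_boil_off_kg_per_s(HEAT_LOAD_W, LATENT_HEAT_J_PER_KG)

print(f"single-phase flow needed: {single_phase * 1000:.2f} g/s")
print(f"two-phase boil-off:       {two_phase * 1000:.2f} g/s")
```

With these assumed values, the phase-change case needs far less liquid to pass the hot surface for the same heat load, and none of it has to be pumped. The real ratio depends entirely on the properties of the liquid in use.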

However, the gas that forms also displaces liquid in contact with the hot surfaces, such as heatsinks or, as we'll discuss in a bit, bare processors. Less liquid in contact with the heat spreader restricts the overall cooling ability. Over the last 10 years, this phase-change immersion implementation has evolved, with liquids developed that have a suitably low viscosity but a good boiling point to be able to cool hardware easily in excess of 150 W. If you have ever seen us utter the words '3M Novec' or 'Fluorinert', these are the families of liquids we are talking about - low viscosity, medium sized organic molecules engineered with specific chemical groups or halogens to fit the properties needed, or combinations of liquids that can be adjusted to fit the mold needed. Bonus points for being completely non-toxic as well.

As mentioned, this is not a new concept. We have seen companies display this technology at events for years, but no matter when it happens, when a non-tech journalist writes about it, it seems to spread like wildfire. In the world of cool demonstrations at trade shows, this seems to fare better than liquid nitrogen overclocking. However, making it a commercial product is another thing entirely. We have seen GIGABYTE's server division demonstrate a customer layout back at Supercomputing 2015, and then the PEZY group showed a super-high-density implementation with their custom core designs at Supercomputing 2017, both showing what is possible with tight cooperation. ZTE's demonstration at Mobile World Congress was specifically designed to show potential customers its ability to offer dense computing with more efficient cooling methods, should anyone want to buy it.

A few things made ZTE's demonstration a little different from those we have seen before. Much to my amazement, they wanted to talk about it! On display was a model using dual processor E5 v4 nodes, although the next generation will use Xeon Scalable. I was told that due to the design, fiber network connections do not work properly when immersed: the distortion created by the liquid, even when a cable is in place, causes a higher than acceptable error rate, so most connections are copper, which is not affected. ZTE also said that it does not have a problem with the thermal capacity of the liquid, and that supporting the next generation of CPUs would be no problem.

One of the marked problems with these immersion designs is cost - the liquid used ranges from $100 to $300 per gallon. Admittedly the liquid, like the hardware, is a one-time purchase, and it can also be recycled for new products when the system is updated. Our contact at ZTE mentioned that they are working with a chemical company in China to develop new liquids that have similar features but are a tenth of the cost. It was not known, however, if those materials would be licensed and exclusive to ZTE. As a chemist, I'd love to see the breakdown of these chemicals, although most of them remain proprietary. We did get a slight hint when GIGABYTE's demo a few years ago mentioned that the Novec 72DA it used is a solution of 70% 1,2-trans-dichloroethylene, 4-16% ethyl nonafluorobutyl ether, 4-6% ethyl nonafluoroisobutyl ether and trace other similar methyl variants.
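As a back-of-the-envelope illustration of what those prices mean for a single tank, the sketch below works out the fill cost at the quoted $100 to $300 per gallon and at a tenth of that. The 50 gallon tank volume is purely an assumption for the arithmetic; ZTE did not give a figure, and the function name is hypothetical.

```python
# Price range quoted in the article, in US dollars per gallon.
PRICE_RANGE_USD_PER_GALLON = (100.0, 300.0)

# Assumed tank volume for illustration only; not a ZTE figure.
ASSUMED_TANK_VOLUME_GALLONS = 50.0


def fill_cost_range_usd(volume_gallons, price_range, cost_divisor=1.0):
    """Return the (low, high) cost of filling a tank of the given volume.

    cost_divisor models a cheaper liquid: 10.0 means a tenth of the price.
    """
    low_price, high_price = price_range
    return (
        volume_gallons * low_price / cost_divisor,
        volume_gallons * high_price / cost_divisor,
    )


current = fill_cost_range_usd(ASSUMED_TANK_VOLUME_GALLONS, PRICE_RANGE_USD_PER_GALLON)
cheaper = fill_cost_range_usd(ASSUMED_TANK_VOLUME_GALLONS, PRICE_RANGE_USD_PER_GALLON, cost_divisor=10.0)

print(f"current liquids:   ${current[0]:,.0f} to ${current[1]:,.0f}")
print(f"tenth of the cost: ${cheaper[0]:,.0f} to ${cheaper[1]:,.0f}")
```

The volume scales the result linearly, so a tank twice the assumed size simply doubles both ends of each range.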

One topic that came up was the processors. As noted in the images, the tops of the heatspreaders are copper colored, indicating that an engineer has taken sandpaper to them and rubbed off the markings. Normally with a heatspreader, the goal is for it to be as flat and perfect as possible, to provide the best contact through paste to the heatsink. With immersion cooling, the opposite is true: it needs to be as rough as possible. This creates a large surface area, and more importantly creates nucleation sites that allow the liquid to boil more easily. This avoids cavitation boiling, which occurs when the surface is limited and the liquid boils a lot more violently.

A roughed up processor

Of course, the downside to an immersion setup is the difficulty of repairs and upgrades. If possible, the owner does not want to have to go in and replace a part. It ends up messy and potentially damaging, or requires a full set of servers to be powered down. There is ultimately no way around this, and while the issue exists with standard data center water cooling, it is more significant here. ZTE stated that this setup was aimed at edge computing, where systems might be embedded for five years or so. Assuming the components all last that long, five years is probably a good expectation for an upgrade cycle too.