Systems – HPCwire
https://www.hpcwire.com
Since 1987 - Covering the Fastest Computers in the World and the People Who Run Them

Scientists Estimate North American Snowfall with NASA’s Pleiades Supercomputer
https://www.hpcwire.com/off-the-wire/scientists-estimate-north-american-snowfall-with-nasas-pleiades-supercomputer/
Thu, 15 Mar 2018

COLUMBUS, Ohio, March 15, 2018 — There’s a lot more snow piling up in the mountains of North America than anyone knew, according to a first-of-its-kind study.

Scientists have revised an estimate of snow volume for the entire continent, and they’ve discovered that snow accumulation in a typical year is 50 percent higher than previously thought.

In the journal Geophysical Research Letters, researchers at The Ohio State University place the yearly estimate at about 1,200 cubic miles of snow accumulation. If spread evenly across the surface of the continent from Canada to Mexico, the snow would measure a little over 7.5 inches deep. If confined to Ohio, it would bury the state under 150 feet of snow.

Most of the snow accumulates atop the Canadian Rockies and 10 other mountain ranges. And while these mountains compose only a quarter of the continent’s land area, they hold 60 percent of the snow, the researchers determined.

The research represents an important step toward understanding the true extent of fresh water sources on the continent, explained doctoral student Melissa Wrzesien, lead author on the paper.

“Our big result was that there’s a lot more snow in the mountains than we previously thought,” she said. “That suggests that mountain snow plays a much larger role in the continental water budget than we knew.”

It’s currently impossible to directly measure how much water is on the planet, said Michael Durand, associate professor of earth sciences at Ohio State. “It’s extremely important to know—not just so we can make estimates of available fresh water, but also because we don’t fully understand Earth’s water cycle.”

The fundamentals are known, Durand explained. Water evaporates, condenses over mountains and falls to earth as rain or snow. From there, snow melts, and water runs into rivers and lakes and ultimately into the ocean.

But exactly how much water there is—or what proportion of it falls as snow or rain—isn’t precisely known. Satellites make reasonable measurements of snow on the plains where the ground is flat, though uncertainties persist even there. But mountain terrain is too unpredictable for current satellites. That’s why researchers have to construct regional climate computer models to get a handle on snow accumulation at the continental scale.

For her doctoral thesis, Wrzesien is combining different regional climate models to make a more precise estimate of annual snow accumulation on 11 North American mountain ranges, including the Canadian Rockies, the Cascades, the Sierra Nevada and the Appalachian Mountains. She stitches those results together with snow accumulation data from the plains.

So far, the project has consumed 1.8 million core-hours on NASA’s Pleiades supercomputer and produced about 16 terabytes of data. On a typical laptop, the calculations would have taken about 50 years to complete.
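As a rough cross-check of that 50-year figure, here is a minimal sketch; the four-core laptop and continuous operation are assumptions, not details from the study:

```python
# Back-of-the-envelope check of the "about 50 years" claim, assuming a
# four-core laptop running continuously (the core count is an assumption).
core_hours = 1.8e6          # core-hours consumed on Pleiades
laptop_cores = 4            # assumed
hours_per_year = 24 * 365.25

years = core_hours / laptop_cores / hours_per_year
print(f"~{years:.0f} years on a {laptop_cores}-core laptop")  # roughly 51 years
```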

Whereas scientists previously thought the continent held a little more than 750 cubic miles of snow each year, the Ohio State researchers found the total to be closer to 1,200 cubic miles.

They actually measure snow-water equivalent, the amount of water that would form if the snow melted—at about a 3-to-1 ratio. For North America, the snow-water equivalent would be around 400 cubic miles of water—enough to flood the entire continent 2.5 inches deep, or the state of Ohio 50 feet deep.
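Those depth figures can be sanity-checked with simple volume-over-area arithmetic; the surface areas below are rough assumptions (not from the article), which is why the results land near, rather than exactly on, the quoted numbers:

```python
# Rough sanity check of the quoted depths; the surface areas are assumptions.
SQ_MI_NORTH_AMERICA = 9.5e6   # assumed approximate land area of North America
SQ_MI_OHIO = 4.5e4            # assumed approximate area of Ohio
FEET_PER_MILE = 5280

def depth_feet(volume_cubic_miles, area_sq_miles):
    """Depth in feet if the volume were spread evenly over the area."""
    return volume_cubic_miles / area_sq_miles * FEET_PER_MILE

for label, volume in [("snow (~1,200 cubic miles)", 1200),
                      ("water equivalent (~400 cubic miles)", 400)]:
    print(f"{label}: continent {depth_feet(volume, SQ_MI_NORTH_AMERICA) * 12:.1f} in, "
          f"Ohio {depth_feet(volume, SQ_MI_OHIO):.0f} ft")
```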

And while previous estimates placed one-third of North American snow accumulation in the mountains and two-thirds on the plains, the exact opposite turned out to be true: Around 60 percent of North American snow accumulation happens in the mountains, with the Canadian Rockies holding as much snow as the other 10 mountain ranges in the study combined.

“Each of these ranges is a huge part of the climate system,” Durand said, “but I don’t think we realized how important the Canadian Rockies really are. We hope that by drawing attention to the importance of the mountains, this work will help spur development in understanding how mountains fit into the large-scale picture.”

What scientists really need, he said, is a dedicated satellite capable of measuring snow depth in both complex terrain and in the plains. He and his colleagues are part of a collaboration that is proposing just such a satellite.

European LEGaTO Project Seeks to Develop Energy Efficient Stack
https://www.hpcwire.com/2018/03/14/european-legato-project-seeks-to-develop-energy-efficient-stack/
Wed, 14 Mar 2018

A new European project – Low Energy Toolset for Heterogeneous Computing (LEGaTO) – seeks to develop a software stack that improves energy management in support of heterogeneous computing.

The idea is a familiar one. The limitation of transistor size reduction brought about by the decline of Moore’s Law has prompted growing use of heterogeneous computing architectures to squeeze more performance out of systems. One stumbling block has been relatively slow progress in advancing power management technology to manage systems that use multiple types of processing.

LEGaTO is a three-year project funded by the European Commission with a budget of more than €5 million. The effort will be based at the Barcelona Supercomputing Center. According to the announcement, “The project will strive to achieve this objective by employing a task-based programming model which is energy efficient by design coupled to a dataflow runtime, while simultaneously ensuring security, resilience and programmability.”

Specifically, the LEGaTO project aims to:

- Improve the energy efficiency of heterogeneous hardware by an order of magnitude through the use of the energy-optimized programming model and runtime;

- Reduce the size of the trusted computing base by at least an order of magnitude;

Osman Unsal and Adrian Cristal, coordinators of the LEGaTO project, are quoted in the announcement: “Moore’s Law is slowing down, and as a consequence hardware is becoming more heterogeneous. In the LEGaTO project, we will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. Our aim is one order of magnitude energy savings from the edge to the converged cloud/high-performance computing (HPC).”

Quantum computing – exciting and off-putting all at once – is a kaleidoscope of technology and market questions whose shapes and positions are far from settled. Hyperion Research (formerly IDC’s HPC group) is now ramping up efforts to aggressively track this emerging sector. Led by Bob Sorensen, who recently added chief analyst for quantum computing to his title of VP of research and technology, a big chunk of next month’s HPC User Forum agenda (April 16-18), organized and run by Hyperion, is devoted to quantum computing and will serve as a kind of introduction to the new practice.

Hyperion, of course, is hardly alone. Quantum computing is garnering attention from all quarters including market watchers, technology developers, industrial users, and governments alike. Notably, additional funding for quantum computing was included in the Trump administration’s latest budget request for the U.S. exascale computing program (see HPCwire article: And So It Begins…Again – The FY19 Exascale Budget Rollout). That quantum computing is (or will be) important seems agreed upon. Among organizations with quantum presentations at the next HPC User Forum are Google, Microsoft, Intel, Rigetti Computing, NASA, NIST, MIT, IBM, and D-Wave.

Bob Sorensen, VP of research and technology and chief analyst for quantum computing, Hyperion

In a wide-ranging conversation with HPCwire, Sorensen discussed the triggers for Hyperion’s increased attention to quantum computing, emerging geopolitical rivalries in QC, the need for algorithm and application development, and outlined some of the goals for Hyperion’s quantum computing tracking plans; the breadth of some of the latter may surprise you.

“We have been tracking quantum computing development for a number of years, almost in a maintenance mode kind of a way where if there was an interesting development we kept in touch,” said Sorensen. “What happened in the last few years is we have moved beyond the stage where we are tracking interesting physical qubit development – the idea that Google or IBM or Microsoft or Rigetti has an interesting new design for a qubit piece of hardware. It’s moved now more into what we consider to be the stages of a more complete quantum computing (QC) ecosystem that encompasses hardware, software, application development and perhaps most importantly QC algorithms.”

Fundamentally and for the longest time, said Sorensen, quantum computing was a solution looking for a problem. Now, instead of just chasing qubit technology – and lots of that is still going on – it’s about identifying use cases and building tools and including domain specialists.

“IBM is a company that gets it. It’s not only developing this quantum computing hardware, but also putting out an ecosystem that allows people to play with a quantum computing simulator in a relatively low barrier-to-entry kind of way, accessible through the cloud, that encourages people to start to think about how you develop QC algorithms that matter. To me that was the final tick in the check list that said this is going to take off now,” said Sorensen. “Once we start to see some sophisticated algorithms coming out then the next step is when can we start to pump those into applications.”

Ever hear of Shor’s algorithm? Everyone else has too! That was a problem, said Sorensen. In spite of the hype there really haven’t been many new and interesting quantum computing algorithms. A handful of use cases have dominated the discussion. Acquiring a better understanding of nitrogen fixation processes is a favorite. And it’s important. But others are needed and the new tools will foster their development. There are many problems whose solutions are largely intractable with classical von Neumann computing architecture but which seem well suited for quantum computing.

“It’s not just about building the hardware anymore. So IBM has a quantum simulator. You’ve got Atos which is making a simulator of a quantum system available – they are going to sell an appliance which in some sense is a nice little supercomputer in a box that has all the outward appearances of a quantum computer. Again, that’s trying to foster the ecosystem around algorithms and applications in addition to the hardware. Look at Rigetti. I love the fact they call themselves a full stack QC company, which means they are going all the way from essentially building the QC hardware all the way up to offering a programming language and an application development environment.”

“One of the things I am hoping this program draws attention to is the idea of collecting quantum computing grand challenges as we used to have in the HPC world. What were the big problems of the day that we needed HPC for,” said Sorensen. “The compelling issue of grand challenges is they were accessible to someone who didn’t have to know what HPC was or MPI was or multicore programming. They just knew that this machine could solve a compelling use case in pharmaceuticals, in oil and gas, in medical technology and that was enough. What I am hoping for now is to gather counterpart quantum computing grand challenges.”

Sorensen has broad goals for the new practice. He has also reached out to the quantum computing community seeking participation on an expert panel to help guide Hyperion efforts and stimulate the quantum computing conversation more generally. That said, there are the usual market analysis questions.

“How big of a market will there be for quantum computing and how will it break out? How much hardware? How much software? How many of these [quantum] simulators? How will this affect the HPC sector? Where will the revenue streams come from? If we could wave the magic wand and come up with a nice market forecast for the next five or six years out, that was highly credible and based on insight, at this point that would be a Holy Grail. That’s not going to happen, I think, in the very near future,” said Sorensen.

“This entire field is in a huge state of flux right now. I wouldn’t even call it the Wild West. It may be another decade before it is the Wild West. All we are trying to do is get a sense of what the experts, the people who are involved in the day to day issues of pushing this technology forward, what do they think about where this is all going. [At least initially] I think what we are going to find is there is no right answer. We are just staking out the landscape boundaries.”

An important element in understanding quantum computing’s boundaries is figuring out how ubiquitous it will be. Sorensen currently inclines towards the special purpose machine camp.

“I see quantum as an always-somewhat-esoteric branch of computing and I’d liken it to, would you buy a Cray to check your email? Because of competing price performance issues you will always go towards the system that does the job at the best level of price-performance and there are huge quantities of things that quantum computing will never do that will justify that kind of price performance model. You know, simple transaction processing. Making sure that when you order a sweater from Amazon it gets shipped the next day. I don’t see quantum becoming the end-all-be-all when it comes to the computational platform of say 2030 or 2040. At least for the next 20 or 30 years I think it is going to be…I don’t want to say a niche technology, but perhaps a tactical technology, one that has a very specific set of use cases,” said Sorensen.

Those use cases, however, may turn out to be disproportionately pivotal. Solving the nitrogen fixation problem could be game-changing in fighting world hunger. VW is already experimenting with D-Wave’s adiabatic annealing quantum computer – an admittedly special purpose machine – for understanding traffic management patterns. A fair amount of early global jockeying for the lead in quantum computing has emerged.

Sorensen, a long-time technology analyst in the U.S. government before joining Hyperion, said, “Technology development at the pointy end of the spear can move in fits and starts where country X or Y can have one- or two-year lead. The implications of those technology leads can take longer to play out and require a myriad of other factors to come into play before it can realize either a geopolitical or economic or military sense. I am not deeply concerned about the impact about say a China or perhaps even a pariah nation pulling ahead in quantum computing.”

The U.S., Europe, and China are steaming ahead with vigorous quantum programs. Less is known about what some other rival geopolitical regions are doing in QC.

“Europe and China are putting a lot of interest in this, and the U.S. has a long history. To me there’s the issue of the balance between centralized government programs and market-based activity. You’ve got an $11 billion quantum project in China right now where they are literally putting all of their eggs in one large facility to develop QC. Then you’ve got the U.S. model in which there certainly is a government element but also much more of an entrepreneurial spirit. You can’t argue that companies like Google and IBM and Rigetti and Microsoft don’t have their own innovative capabilities here that are independent of what is going on at U.S. government sites. So there’s a certain vitality there,” said Sorensen.

“The EU has large government programs, which may not have the full breadth of commercial development activities [the U.S. has], but I think they view this as one of the most level playing fields in advanced technologies that they have seen in a long time and that their history and current capabilities stack up with anybody else in the world. I think they want to build on that.”

Clearly much remains unclear about quantum computing’s future. That’s probably good news for Hyperion and other analysts – the murky waters need some clearing. Sorensen has high hopes for the growth of Hyperion’s quantum expertise. Moreover, just as the HPC User Forum has provided a platform for stimulating discussion and influencing HPC policy, Sorensen thinks Hyperion can play a role in QC.

“One of the things I’m hoping we can at least play a role in is the idea of thinking about quantum computing benchmarks. Right now, if you read the popular press and I say ‘IBM,’ the first thing you think of is, yes, they have a 50-qubit system. That doesn’t mean much to anybody other than it’s one more qubit than a 49-qubit system. What I am thinking about is asking these people how we can start to characterize across a number of different abstractions and implementations to gain a sense of how we can measure progress,” said Sorensen.

“What is a benchmark that says it’s progressing the state of quantum computing forward based on these [agreed upon] kinds of performance parameters or metrics. As I’ve said to people in the past, I don’t want to end up with a Q-impact where we have a one-size-fits-all benchmark that forces us in some sense or strongly encourages development in a direction that may not be the best and only relies on something the way some people say LINPACK does.”

Grand goals aside, near-term progress is needed on many specific fronts, error correction and algorithm development among them, said Sorensen. He worries quantum computing may fall victim to the “trough of despair” as AI once did.

“There are pitfalls ahead in the next few years. If quantum computing is not done right we could see the trough of despair lead to the AI nuclear winter that we saw in the ’80s and ’90s when there was a huge promise for AI and everyone thought it was going to save the world and it got difficult. It was overpromised and it disappeared for almost 20 years. I know that because I am looking at all of my graduate school text books on AI and many of them mention convolutional neural networks. I’m worried that quantum computing may be oversold from the investment perspective. We want to scope out the landscape instead of just coming at it and saying here we are. We want a good understanding of that, and that’s the first step,” said Sorensen.

ALBUQUERQUE, March 8, 2018 – Aquila today announced delivery and installation of the first ever fixed cold plate liquid cooled Aquarius HPC system. Sandia National Laboratories has deployed the system at the National Renewable Energy Laboratory (NREL) in order to fully study the benefits of Aquila’s fixed cold plate warm water cooling technology.

This patented technology, licensed and designed in conjunction with Clustered Systems, promises to nearly meet immersion technology’s cooling efficiency, while taking advantage of OCP-inspired 12VDC power conversion efficiencies. Other benefits of Aquarius’ HPC rack technology include: extreme density without trade-offs, enhanced reliability and robustness, near silent operation, unparalleled ease-of-use, ease of service, and the industry’s best ROI and TCO.

Aquarius racks are designed for long-term reliability and re-use, allowing customers to launch their next-gen servers with only minimal re-engineering. The racks feature a forward-looking OCPV2 form factor, and are capable of attaching directly to existing data center facility water infrastructure in conjunction with a rack mount Cooling Distribution Unit (CDU) from Aquila’s preferred CDU supplier, Motivair.

The Aquarius design is an innovative departure from other non-immersion liquid cooled solutions that employ DLC CPU heat sink technologies. The Aquarius manifold and fixed cold plate design eliminate the possibility of leakage during servicing, as there are no plastic tubes or quick disconnects anywhere in the system. The fixed cold plate and robust manifold design uses only electrochemically compatible metals, eliminating the potential for water contamination due to corrosion.

“Sandia maintains a constant drive to reduce energy use in HPC and make our centers as energy efficient as possible,” said David Martinez, Engineering Program Project Lead for Infrastructure Computing Services at Sandia National Laboratories. “Sandia addresses the problem from a systems process viewpoint. Liquid cooling systems extract heat directly from server board components and can prevent speed throttling due to overheating. Our New Mexico climate permits use of non-mechanical cooling, which, when combined with warm water inlet temperatures, saves considerable energy. We can now capture energy from the elevated return water to support indirect energy needs, such as process water, domestic hot water, and absorption cooling processes. Aquila’s fixed cold plate liquid-cooled HPC system lowers the cost of cooling. The Sandia-named “Yacumama” cluster we’ve deployed in Colorado will be returned to service at Sandia after rigorous testing at NREL’s water cooled HPC center.”

“Compute power requirements for HPC, mobility, OpenStack, and SDN data center applications continue to escalate the need for solutions that address density. And data centers cooled with air have fundamental restrictions on power density,” explains Phil Hughes, CEO of Clustered Systems. “Sandia and NREL get it. Warm water liquid cooling has none of these restrictions and can be packed very densely, without a need for a specialized forced-air driven building.”

“We are excited to be working with NREL and Sandia National Laboratories, as these DOE labs provide a leadership position in innovative liquid cooling and other cutting-edge efficiency-driven data center technologies. We feel their leadership will help shape the future of HPC and influence the modern data center designer towards adoption of liquid cooling. Our shared vision holds the promise of improving data center energy efficiency by as much as 50%,” said Judy Beckes Talcott, President of Aquila.

Benefits cited for the Aquarius platform include:

- A nearly 50% energy savings on the overall server power envelope, allowing the additional cost of the rack to be recovered within the first year of power savings.

- A near-zero failure rate due to the elimination of fan vibration and enhanced thermal stability, which reduces sudden expansion/contraction component stress.

- Cooling for all of the server’s major heat sources, not just the CPU (as with most providers of DLC heat removal technologies). Aquarius is capable of cooling over 75kW per standard rack footprint.

- A 5x increase in server density, housing up to two 2.5” HDDs or SSDs at each node, and equally supporting fully virtualized storage and hybridized storage approaches.

- Savings of up to 30% of a server’s power budget, due to the elimination of all server fans. This normally stranded energy can be used to power more servers, further adding to overall energy efficiency.

About Aquila

Headquartered in Albuquerque, NM, Aquila is a 35-year-old employee-owned New Mexico corporation with an ISO 9001:2015 certified manufacturing practice. Aquila is a provider of networking hardware and network security solutions, and is an award-winning manufacturer and technology transfer partner of Sandia National Laboratories. In 2015 Aquila and Clustered Systems teamed to co-develop Aquarius, a third-generation TouchCooling rack platform geared towards high density HPC and dense data center computing applications. For more information visit: www.aquilagroup.com/aquarius.

About Clustered Systems

Headquartered in Silicon Valley, CA, Clustered Systems Company was founded to solve infrastructure problems associated with deploying large HPC systems. Their unique pumped refrigerant Touch Cooling technology, developed in 2008 with the support of the California Energy Commission, proved to be the most energy efficient cooling technology available at that time. Their second-generation HPC blade system, the ExaBlade, demonstrated several significant breakthroughs in energy efficiency, reliability, ease-of-use, and rack density during an 18-month trial period at SLAC. For more visit: www.clusteredsystems.com.

University of Toronto to Push Limits of New Supercomputer with Heroic Ocean Modeling Problem
https://www.hpcwire.com/off-the-wire/university-of-toronto-to-test-limits-of-new-supercomputer-with-ocean-modeling/
Tue, 06 Mar 2018

March 5 — To break in Canada’s newest, most powerful research supercomputer, the University of Toronto’s Richard Peltier is running a “heroic calculation” – one that is expected to shed new light on how the world’s oceans physically function.

It’s unknown how long it will take the more than $18-million machine known as Niagara to crunch the millions of gigabytes of real-time data streaming to it now from the ocean bottom of the Pacific.

“It’s never been done before,” said Peltier, a globally renowned climate change expert. “It could be days or even a week depending on the spatial resolution we decide to work at.”

Unveiled today, Niagara is a massive network of 60,000 cores – the equivalent of roughly 60,000 powerful desktop computers – that can be tasked to work together simultaneously on a single, humungous problem.

This type of setup, known as a large parallel system, is the only one of its kind in Canada and is housed in a secure, nondescript location in Vaughan, Ont. It’s open to all Canadian university researchers and is part of a national network of research computing infrastructure.

With Niagara up and running for a week, the SciNet team has started feeding Peltier’s data into the machine. He came up with the idea of running a heroic calculation on it after discussing with colleagues how best to strenuously test the power of the large parallel system.

“By devoting the entire machine, not only a portion of it, to this one calculation – that’s why it’s ‘heroic,’” said Peltier, a U of T University Professor of physics and scientific director of SciNet. “This is pure, curiosity-driven research. We hope the results will warrant publication and be a major coup for Niagara.”

Running a similar calculation on the old SciNet supercomputer would have taken roughly 20 times longer.

The calculation, done in partnership with researchers at the University of Michigan and the Jet Propulsion Lab at Caltech, is attempting to answer a fundamental research question that holds great interest for researchers in a number of fields.

In the 1970s, oceanographers Chris Garrett and Walter Munk famously theorized the world’s oceans are filled with internal waves ricocheting back and forth from the ocean bottom to the surface and predicted the shape of the power spectrum that should be observed in these waves.

The waves are generated by the barotropic tide causing the water in the oceans to slosh back and forth horizontally in response to the gravitational pull of the sun and moon. Their intensity is magnified by bumps along the ocean floor – the bumpier the bottom, the stronger the wave, Peltier explained. When waves break, turbulence is generated and causes friction, which makes the ocean dissipative and “sticky.”

But for more than four decades, scientists have lacked an accurate, high resolution model of the detailed physics of this interaction to actually see whether the theoretical arguments are correct, he said.

To conduct the Niagara calculation, his team of collaborators are using data from ocean sensors called McLane profilers in selected patches of the Pacific Ocean – one near the Hawaiian islands, which has a very bumpy ocean bottom, and one in the open ocean of the central-west Pacific, which has a smoother ocean floor.

This information will then be coupled with atmospheric data to model the formation, intensity and life span of these waves as they dissipate over time.

“I’d like to think that we’ll be able to verify at very high spatial resolution the internal wave spectrum,” said Peltier. “Hopefully we’ll be able to shout, ‘Eureka, we’ve not only seen wiggles, we’ve seen wiggles of the right set of [wave] phase speeds that the ocean should be filled with” – as predicted by Garrett and Munk.

This calculation will “assure us that when we do put an ocean model to work in the context of a global warming calculation, for example, that we can feel secure that the physical process is properly represented,” he added.

University of Michigan oceanographer Brian Arbic said understanding the actions of internal waves more fully will also have a profound impact on the study of ocean temperatures, salinity, circulation and marine biology, which are “crucial for Earth’s climate, marine resources and uptake of carbon and heat by the Earth’s oceans.”

“This is a first for our community and implies that we have the potential for modelling internal gravity waves more realistically than ever before,” he said.

“The U of T supercomputer is extremely important to this work. It is a very large and cutting-edge machine. We would not be able to do this calculation right now without access to it.”

For Peltier, breaking internal waves caused by flow over bumps – whether mountain tops on the surface of the continent or on the ocean floor at great depth beneath the ocean surface – has been an area of intense interest since the beginning of his career.

He’s already planning how he’ll apply Niagara’s heroic calculation results to Ice Age conditions, when the level of water in the oceans was much lower and waves broke further offshore away from the continental slopes.

“The spectrum of waves in the oceans and the dissipation of waves should be dramatically different,” he said. “I’m expecting the stickiness of the ocean will change dramatically.”

“There isn’t a single field of research that can happen without high-performance computing,” said SciNet’s CTO Daniel Gruner in a video presentation. “All Canadians will benefit from Niagara because basic research, fundamental research, is so important to the development of what we now call applied research. Without the basic research nothing can happen.”

Each of the system’s 1,500 nodes comes outfitted with two 20-core Intel Xeon Gold 6148 (2.4GHz) CPUs and 200 GB (188 GiB) of DDR4 memory. Filling up 21 racks, nodes are interconnected using Mellanox InfiniBand in a DragonFly+ topology and supported by more than 12 petabytes of Lenovo DSS-G high performance storage, based on IBM’s SpectrumScale file system. There is also a 256TB burst buffer (NVMe fabric, up to 160 GB/s) provided by Excelero for fast I/O.
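Those node specs are consistent with the 4.61-petaflop theoretical peak quoted later in this article, assuming the Xeon Gold 6148's two AVX-512 FMA units deliver 32 double-precision FLOPs per core per cycle; a minimal sketch, with that throughput figure as the only assumption not taken from the article:

```python
# Rough check of Niagara's theoretical peak from the published node specs,
# assuming 32 double-precision FLOPs per core per cycle (AVX-512 FMA).
nodes = 1500
cores_per_node = 2 * 20        # two 20-core Xeon Gold 6148 CPUs per node
clock_hz = 2.4e9               # 2.4 GHz base clock
flops_per_core_cycle = 32      # assumed AVX-512 FMA throughput

peak = nodes * cores_per_node * clock_hz * flops_per_core_cycle
print(f"{peak / 1e15:.2f} PFLOPS theoretical peak")   # ~4.61 PFLOPS
```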

Installation commenced at the end of November and acceptance testing was completed at the end of February. To break in Canada’s powerful research supercomputer, University of Toronto’s renowned climate scientist Richard Peltier is running a heroic ocean modeling calculation across the entirety of the machine. The work, already underway, is expected to lead to a greater understanding of the impact of ocean floor topologies on wave physics. The job is expected to take between one day and one week depending on resolution parameters; running an equivalent workload on the previous SciNet infrastructure would have taken roughly 20 times longer, according to the research team.

Niagara is replacing SciNet’s aging General Purpose Cluster (GPC) and Tightly Coupled Cluster (TCS), which have been providing Canada’s largest supercomputing center with the bulk of its compute cycles since 2009. TCS was SciNet’s first supercomputer; the 102-node IBM Power6 system was decommissioned and removed last year to make room for Niagara. GPC, SciNet’s 3,780-node IBM iDataPlex cluster, has been reduced by half and will be fully decommissioned when Niagara enters full production, which is expected to take place next month.

Niagara is one of four new systems being deployed to Compute Canada host sites. The others are Arbutus, a Lenovo-built OpenStack cloud system being deployed at the University of Victoria; Cedar, a 3.7-petaflop (peak) Dell system installed at Simon Fraser University; and Graham, a 2.6-petaflop (peak) Huawei-made cluster located at the University of Waterloo.

With a peak theoretical speed of 4.61 petaflops and a Linpack Rmax of 3 petaflops, Niagara ranks among the top 10 percent of fastest publicly benchmarked computers in the world today. System maker Lenovo has made both HPC and supercomputing a priority since it acquired IBM’s x86 business for $2.1 billion in 2014 and its ambitions aren’t stopping. It’s deployed the fastest systems in Spain, Italy, Denmark, Norway, Australia, Canada, and soon in Germany with Leibniz Supercomputing Center, and it counts 87 Top500 systems (inclusive of all Top500 Lenovo classifications, i.e., Lenovo (81), Lenovo/IBM (3), IBM/Lenovo (2) and this collaboration). This puts it in second place by system share, behind HPE, and in third place after Cray and HPE based on aggregate list performance.

These system wins are part of an aggressive strategy that Lenovo is pursuing to become number one in HPC and supercomputing. “We want to be not just the fastest growing, which we are today both in HPC and Top500, but we want to be the largest by 2020,” Kirk Skaugen, executive vice president and president of the Data Center Group, told HPCwire in an interview at Supercomputing in Denver.

Lenovo’s third-quarter datacenter business exceeded Wall Street’s expectations, coming in at $1.2 billion in revenue, the highest in two years. This is 17 percent over the same period last year, and a significant increase of 26 percent over the previous quarter.

Speakers at this morning’s launch included (from left to right): Nizar Ladak, President and CEO, Compute Ontario; Dr. Richard Peltier, Earth, Atmospheric and Planetary Physicist, University of Toronto; Minister Reza Moridi, Ministry of Research Innovation and Science; Dr. Roseann O’Reilly Runte, President and CEO, Canada Foundation for Innovation; and Dr. Vivek Goel, Vice President, Research and Innovation, University of Toronto.

Sandia-Developed Benchmark Re-ranks Top Computers
https://www.hpcwire.com/off-the-wire/sandia-developed-benchmark-re-ranks-top-computers/
Wed, 28 Feb 2018

ALBUQUERQUE, N.M., Feb. 27 — A Sandia National Laboratories software program now installed as an additional test for the TOP500 supercomputer challenge has become increasingly prominent. The program’s full name — High Performance Conjugate Gradients, or HPCG — doesn’t come trippingly to the tongue, but word is seeping out that this relatively new benchmarking program is becoming as valuable as its venerable partner — the High Performance LINPACK program — which some say has become less than satisfactory in measuring many of today’s computational challenges.
TOP500 LINPACK and HPCG charts of the fastest supercomputers of 2017. The rearranged order and drastic reduction in estimated speed for the HPCG benchmarks are the result of a different method of testing modern supercomputer programs. (Image courtesy of Sandia National Laboratories)

“The LINPACK program used to represent a broad spectrum of the core computations that needed to be performed, but things have changed,” said Sandia researcher Mike Heroux, who created and developed the HPCG program. “The LINPACK program performs compute-rich algorithms on dense data structures to identify the theoretical maximum speed of a supercomputer. Today’s applications often use sparse data structures, and computations are leaner.”

The term “sparse” means that a matrix under consideration has mostly zero values. “The world is really sparse at large sizes,” said Heroux. “Think about your social media connections: there may be millions of people represented in a matrix, but your row — the people who influence you — are few. So, the effective matrix is sparse. Do other people on the planet still influence you? Yes, but through people close to you.”

Similarly, for a scientific problem whose solution requires billions of equations, most of the matrix coefficients are zero. For example, when measuring pressure differentials in a 3-D mesh, the pressure on each node is directly dependent on its neighbors’ pressures. The pressure in faraway places is represented through the node’s near neighbors. “The cost of storing all matrix terms, as the LINPACK program does, becomes prohibitive, and the computational cost even more so,” said Heroux. A computer may be very fast in computing with dense matrices, and thus score highly on the LINPACK test, but in practical terms the HPCG test is more realistic.
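A back-of-the-envelope sketch makes the storage argument concrete; the one-billion-unknown problem size and 27 nonzeros per row (a typical 3-D stencil) are illustrative assumptions, not figures from Heroux:

```python
# Why storing every matrix term is prohibitive for large sparse problems.
# Problem size and nonzeros per row are assumptions chosen for illustration.
n = 1_000_000_000            # one billion equations and unknowns (assumed)
nnz_per_row = 27             # e.g. a 3-D 27-point stencil (assumed)
value_bytes = 8              # double precision value
index_bytes = 4              # column index per stored nonzero

dense_bytes = n * n * value_bytes
sparse_bytes = n * nnz_per_row * (value_bytes + index_bytes)

print(f"dense storage:  {dense_bytes / 1e18:.0f} exabytes")
print(f"sparse storage: {sparse_bytes / 1e12:.2f} terabytes")
```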

To better reflect the practical elements of current supercomputing application programs, Heroux developed HPCG’s preconditioned iterative method for solving systems containing billions of linear equations and billions of unknowns. “Iterative” means the program starts with an initial guess to the solution, and then computes a sequence of improved answers. Preconditioning uses other properties of the problem to quickly converge to an acceptably close answer.

“To solve the problems we need to for our mission, which might range from a full weapons simulation to a wind farm, we need to describe physical phenomena to high fidelity, such as the pressure differential of a fluid flow simulation,” said Heroux. “For a mesh in a 3-D domain, you need to know at each node on the grid the relations to values at all the other nodes. A preconditioner makes the iterative method converge more quickly, so a multigrid preconditioner is applied to the method at each iteration.”
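As an illustrative sketch only (not the HPCG reference code), the snippet below assembles a sparse 3-D Laplacian, in which each grid node couples only to its nearest neighbors, and solves it with an unpreconditioned conjugate-gradient iteration in SciPy; HPCG layers a multigrid preconditioner on top of this basic pattern, and the grid size and right-hand side here are arbitrary assumptions:

```python
# Illustrative sparse iterative solve in SciPy; this is not the HPCG benchmark.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import cg

n = 32  # grid points per dimension (assumed, kept small for illustration)

# 1-D second-difference operator; Kronecker products assemble the 3-D Laplacian,
# so each row couples a node only to its neighbors and is almost entirely zeros.
one_d = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
eye = sp.identity(n)
A = (sp.kron(sp.kron(one_d, eye), eye)
     + sp.kron(sp.kron(eye, one_d), eye)
     + sp.kron(sp.kron(eye, eye), one_d)).tocsr()

b = np.ones(A.shape[0])   # arbitrary right-hand side (assumed)
x, info = cg(A, b)        # iterative solve: start from a guess, refine it
residual = np.linalg.norm(b - A @ x)
print("converged" if info == 0 else f"cg exit code {info}", "| residual:", residual)
```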

Supercomputer vendors like NVIDIA Corp., Fujitsu Ltd., IBM, Intel Corp. and Chinese companies write versions of HPCG’s program that are optimal for their platform. While it might seem odd for students to modify a test to suit themselves, it’s clearly desirable for supercomputers of various designs to personalize the test, as long as each competitor touches all the agreed-upon calculation bases.

“We have checks in the code to detect optimizations that are not permitted under published benchmark policy,” said Heroux.

On the HPCG TOP500 list, the Sandia and Los Alamos National Laboratory supercomputer Trinity has risen to No. 3, and is the top Department of Energy system. Trinity is No. 7 overall in the LINPACK ranking. HPCG better reflects the Trinity design choices.

Sandia National Laboratories computational researcher Mike Heroux created the HPCG program that re-arranges supercomputer rankings. (Photo courtesy of Sandia National Laboratories)

Heroux says he wrote the base HPCG code 15 years ago, originally as a teaching code for students and colleagues who wanted to learn the anatomy of an application that uses scalable sparse solvers. Jack Dongarra and Piotr Luszczek of the University of Tennessee have been essential collaborators on the HPCG project. In particular, Dongarra, whose visibility in the high-performance computing community is unrivaled, has been a strong promoter of HPCG.

“His promotional contributions are essential,” said Heroux. “People respect Jack’s knowledge and it helped immensely in spreading the word. But if the program wasn’t solid, promotion alone wouldn’t be enough.”

Heroux invested his time in developing HPCG because he had a strong desire to better assure the U.S. stockpile’s safety and effectiveness. The supercomputing community needed a new benchmark that better reflected the needs of the national security scientific computing community.

“I had worked at Cray Inc. for 10 years before joining Sandia in ’98,” he says, “when I saw the algorithmic work I cared about moving to the labs for the Accelerated Strategic Computing Initiative (ASCI). When the US decided to observe the Comprehensive Nuclear Test Ban Treaty, we needed high-end computing to better ensure the nuclear stockpile’s safety and effectiveness. I thought it was a noble thing, that I would be happy to be part of it, and that my expertise could be applied to develop next-generation simulation capabilities. ASCI was the big new project in the late 1990s if I wanted to do something meaningful in my area of research and development.”

Heroux is now director of software technology for the Department of Energy’s Exascale Computing Project. There, he works to harmonize the computing work of the DOE national labs — Oak Ridge, Argonne, Lawrence Berkeley, Pacific Northwest, Brookhaven and Fermi, along with the three National Nuclear Security Administration labs.

“Today, we have an opportunity to create an integrated effort among the national labs,” said Heroux. “We now have daily forums at the project level, and the people I work with most closely are people from the other labs. Because the Exascale Computing Project is integrated, we have to deliver software to the applications and the hardware at all labs. The Department of Energy’s attempt at a multi-lab, multi-university project gives an organizational structure for us to work together as a cohesive unit so that software is delivered to fit the key applications.”

Among Heroux’s achievements, he served for six years as editor-in-chief of ACM’s Transactions on Mathematical Software. He is a senior scientist at Sandia.

About Sandia National Laboratories

Sandia National Laboratories is a multimission laboratory operated by National Technology and Engineering Solutions of Sandia LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration. Sandia Labs has major research and development responsibilities in nuclear deterrence, global security, defense, energy technologies and economic competitiveness, with main facilities in Albuquerque, New Mexico, and Livermore, California.

And So It Begins…Again – The FY19 Exascale Budget Rollout (and things look good)
https://www.hpcwire.com/2018/02/23/begins-fy19-exascale-budget-rollout-things-look-good/
Fri, 23 Feb 2018

On February 12, 2018, the Trump administration submitted its Fiscal Year 2019 (FY-19) budget to Congress. The good news for the U.S. exascale program is that the numbers look very good and the support appears to be strong. There is also the interesting addition of a funding request to support research in quantum computing. One of the challenges with the FY-19 budget request is that it is being made before the FY-18 budget has been fully resolved. The good news is that it looks like all the pieces are in place to make that happen no later than March.

The President’s proposed FY-19 federal budget request identifies a total of $636M in the Department of Energy’s (DOE) budget for exascale activities. This is $376M above the FY-17 enacted level. Those funds are spread across two parts of the DOE that, in combination, make up the Exascale Computing Initiative (ECI). One element of the funding for exascale comes from the National Nuclear Security Administration (NNSA) Advanced Simulation and Computing (ASC) program with a request of $163M. The other part of the exascale computing request is for the Office of Science (SC) Advanced Scientific Computing Research (ASCR) program at $473M.

At this point, the only FY-19 budget documentation available on the DOE web site is the Budget Fact Sheet and the Budget in Brief, so details are limited. However, there are a few interesting facts. Starting at the high level, the NNSA ASC program request goes up by $40M to $703M from an annualized FY-18 Continuing Resolution (CR) of $659M. For the SC ASCR program, their requested funding goes up by a whopping $257M to a total of $899M over their annualized FY-18 CR funding of $642M. This number is impressive given that many other parts of the Office of Science saw significant reductions.

The introductory sections of the Budget in Brief document provide some interesting clues about the procurement and installation of the U.S. exascale computing systems. On page one, in the section labeled Accelerating Progress On National Priorities, there is a statement that the requested funds will be used at Argonne and Oak Ridge National Laboratories for R&D and design of exascale-capable systems. The section goes on to say that the first exascale system will be deployed in 2021 and the second system, with a different architecture, deployed in 2022. This aligns with the previously announced Aurora system built by Intel at Argonne and the yet to be competed Frontier system at Oak Ridge. That procurement will use a similar process to CORAL and is expected to begin in early 2018.

Page two of the introduction provides more details about the NNSA element of the ECI. It states that, out of the $163M NNSA request for ECI, $24M goes towards the Exascale Class Computer Cooling Equipment (EC3E) project at the Los Alamos National Laboratory (LANL), and $23M for the Exascale Computing Facility Modernization (ECFM) project at the Lawrence Livermore National Laboratory (LLNL). In the detailed NNSA budget breakdown, we learned that in FY-17 the first ASC exascale system passed its first mission needs Critical Decision point (CD-0) and is expected to be delivered in 2023. The system is code named “El Capitan” and will be installed at LLNL. This system will also be part of the follow-on CORAL procurement that will start in early 2018.

Another interesting statement in the Budget in Brief introduction is about DOE quantum computing activities. The statement is a paragraph about the $105M request to support “the emerging urgency of building U.S. competency and competitiveness in the developing area of quantum information science, including quantum computing and quantum sensor technology.” The Office of Science will administer these funds and both the ASCR program and the High Energy Physics (HEP) program budget descriptions mention something about quantum computing.

In summary – the FY-19 budget request looks very encouraging for the U.S. exascale program. The request is just the beginning of the budgetary process, but a strong start is always very important. Also, other developments in Congress with plans to resolve the FY-18 budget situation seem to indicate that Congress should be able to support (and perhaps improve upon) the President’s request numbers.

When we last left things with the FY-18 budget, the federal government and DOE were under a Continuing Resolution (CR) and they still are today. For ECI, that limited funding to the lower of the levels provided by the House or Senate. For the ASC program, that was not a problem because those levels were $161M and were the same for both the House and Senate marks. The situation was a bit more complicated for the ASCR program because of their divergent marks. The House provided a total of $282M ($170M for ECP and $112M for facilities) and the Senate a total of $434M ($184M for ECP and $250M for facilities). The CR requires ASCR to spend at the lower House level, but to plan for and be ready to use the funds at the higher Senate level.

The passage by the House and Senate of their individual appropriations bills towards the end of the fiscal year was encouraging, suggesting that they could resolve their differences (in what is known as a conference committee) and pass the bill, and thus complete the “regular order” process. Unfortunately, other issues outside of the budget process got in the way and that did not happen. As a result, the first CR was passed to run until December 8th, then the second one until December 22nd, then a third until January 19th, and a fourth that expired on February 8th. The federal government is currently under its fifth CR and that will expire on March 23rd (or almost six months after the start of the fiscal year). The good news is that the last CR included several important agreements that provide some hope that we may see actual signed FY-18 appropriation laws.

One of those agreements is to provide relief from the budget caps established by the Budget Control Act of 2011. Those limits (also known as sequestration) were applied (somewhat equally) to both the defense and discretionary elements of the federal budgets. The sequestration limits were causing major challenges for the Congressional appropriators as they distributed those limits on budget growth. With those limits eased, the appropriators will have a much easier time coming to an agreement. Another important element of the last CR was the decision to put off any discussion of the limit on the national debt until March of 2019. The debt limit puts a cap on how much more money the country can borrow to fund its operations. The U.S. was expected to reach its debt limit by early in 2018, which would have caused major problems with the FY-18 appropriations. Putting off that problem should make it easier to pass FY-18 appropriations. Finally, a third element of the latest CR was that it included the top-line numbers for the FY-19 appropriations. These are normally set in a Budget Resolution and are important to provide funding guidelines for the appropriations subcommittees.

Sorry for the budget “wonk” stuff, but the bottom line is that by March 23rd, the representatives of the 12 House and Senate appropriations subcommittees hopefully will have come together to resolve the differences between their bills. This will be much easier with the additional funding flexibility provided by the last CR. Once the differences are resolved, each of the appropriations bills will then have to be passed by the House and Senate and then sent to the President for his signature.

Things continue to look good for the U.S. exascale program. The program has survived the transition of Presidential administrations and is getting strong support from the new Secretary of Energy and the Director of the Office of Management and Budget. The program has previously received support from the Senate and House appropriators, but there were challenges to balancing that support against other funding priorities. The flexibility provided by the latest CR should lead to a successful Senate and House conference on the appropriations bill and full Congressional passage and then Presidential signature. That will mean the return of “regular order” — although taking a bit more time than expected. As the saying goes . . . “Better late than never!”

About the Author

Alex Larzelere is a senior fellow at the U.S. Council on Competitiveness, the president of Larzelere & Associates Consulting and HPCwire’s policy editor. He is currently a technologist, speaker and author on a number of disruptive technologies that include: advanced modeling and simulation; high performance computing; artificial intelligence; the Internet of Things; and additive manufacturing. Alex’s career has included time in federal service (working closely with DOE national labs), private industry, and as founder of a small business. Throughout that time, he led programs that implemented the use of cutting edge advanced computing technologies to enable high resolution, multi-physics simulations of complex physical systems. Alex is the author of “Delivering Insight: The History of the Accelerated Strategic Computing Initiative (ASCI).”

OCF Deploys Petascale Lenovo Supercomputer at University of Southampton
https://www.hpcwire.com/off-the-wire/ocf-deploys-petascale-lenovo-supercomputer-university-southampton/
Fri, 23 Feb 2018

Feb. 22 — Researchers from across the University of Southampton are benefitting from a new high performance computing (HPC) machine named Iridis, which has entered the Top500, debuting at 251 on the list. The new 1,300 teraflops system was designed, integrated and configured by high performance compute, storage and data analytics integrator, OCF, and will support research demanding traditional HPC as well as projects requiring large scale deep storage, big data analytics, web platforms for bioinformatics, and AI services.

Over the past decade, the University has seen a 425 per cent increase in the number of research projects using HPC services, from across multiple disciplines such as engineering, chemistry, physics, medicine and computer science. In addition, the new HPC system is also supporting the University’s Wolfson Unit. Best known for ship model testing, sailing yacht performance and ship design software, the Unit was founded in 1967 to enable industry to benefit from the facilities, academic excellence and research activities at the University of Southampton.

“We have a worldwide customer base and have worked with the British Cycling Team for the last three Olympic games, as well as working with teams involved in the America’s Cup yacht race,” comments Sandy Wright, Principal Research Engineer, Wolfson Unit at the University of Southampton. “In the past 10 years, Computational Fluid Dynamics (CFD) has become a perfectly valid commercial activity, reducing the need for physical experimentation. CFD gives as good an answer as the wind tunnel, without the need to build models, so you can speed up research whilst reducing costs. Iridis 5 will enable the Wolfson Unit to get more accurate results, whilst looking at more parameters and asking more questions of computational models.”

It’s a sentiment echoed by Syma Khalid, Professor of Computational Biophysics at the University: “Our research focuses on understanding how biological membranes function – we use HPC to develop models to predict how membranes protect bacteria. These membranes control how molecules move in and out of bacteria. We aim to understand how they do this at the molecular level. The new insights we gain from our HPC studies have the potential to inform the development of novel antibiotics. We’ve had early access to Iridis 5 and it’s substantially bigger and faster than its previous iteration – it’s well ahead of any other in use at any university across the UK for the types of calculations we’re doing.”

Four times more powerful than the University’s previous HPC system, Iridis comprises more than 20,000 Intel Skylake cores on next-generation Lenovo ThinkSystem SD530 servers – the first UK installation of the hardware. In addition, it uses 10 Gigabyte servers containing a total of 40 NVIDIA GTX 1080 Ti GPUs for projects requiring high single-precision performance, and OCF has committed to delivering 20 Volta GPUs when they become available. OCF’s xCAT-based software manages the main HPC resources, with Bright Computing’s Advanced Linux Cluster Management software chosen to provide the research cloud and data analytics portions of the system.
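
A rough back-of-envelope sketch helps put those numbers in context (a sketch only: the per-GPU figure below is the commonly quoted vendor peak for the GTX 1080 Ti, and reading the 1,300 teraflops headline as double precision is also an assumption):

# Back-of-envelope peaks for Iridis 5 (illustrative assumptions marked below).
gpus = 40
fp32_per_gpu_tflops = 11.3                # assumed GTX 1080 Ti single-precision peak
print(f"GPU partition FP32 peak: ~{gpus * fp32_per_gpu_tflops:.0f} TFLOPS")   # ~452

cpu_cores = 20_000                        # "more than 20,000 Intel Skylake cores"
system_peak_gflops = 1_300_000            # 1,300 teraflops, presumably double precision
print(f"Per-core peak: ~{system_peak_gflops / cpu_cores:.0f} GFLOPS")         # ~65
# ~65 GFLOPS/core is roughly consistent with Skylake AVX-512 (32 FLOPs/cycle at ~2 GHz).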

“We’re purposefully embracing more researchers and disciplines than ever before at the University, which brings a lot of competing demands, so we need a more agile way to provision systems,” says Oz Parchment, Director of iSolutions at the University of Southampton. “Users need an infrastructure that’s flexible and easily managed, which is why Bright Computing is the ideal solution, particularly as we’re now embracing more complex research disciplines.”

Iridis has two petabytes of storage provided by the Lenovo DSS Spectrum Scale Appliance connected via Mellanox EDR InfiniBand, and more than five petabytes of research data management storage using IBM tape, taking advantage of the latest 15TB-capable drives and media. The University has committed to a proof of concept of StrongBox’s StrongLink data and tape management solution, a unique approach to managing data environments that automates data classification for life-cycle data management of any data, on any storage, anywhere.

Parchment continues: “The University of Southampton has a long tradition in the use of computational techniques to generate new knowledge and insight, stretching back to 1959 when our researchers first used modelling techniques on the design of Sydney Opera House. Data, and the analysis of that data using computational methods, is at the heart of modern science and technology, and in order to attract the best world-class researchers we need world-class research facilities.”

Julian Fielden, Managing Director of OCF, comments: “Academia really is feeling the pressure in attracting new researchers, groups and grants. Competition has never been fiercer. Throughout our 13-year relationship with the University of Southampton, it has had the determination and ambition to compete not just nationally but internationally and, critically, to provide the HPC, cloud and data analytics services that world-class researchers desire.”

On working with OCF, Parchment concludes: “We’ve been working with OCF since 2004. The team has always delivered to our needs and gone the extra mile providing services, support and consultancy in addition to the hardware and software solutions. OCF listens and understands our needs, putting forward ideas that we haven’t even thought about. The team are all technical innovators.”

About the University of Southampton

The University of Southampton drives original thinking, turns knowledge into action and impact, and creates solutions to the world’s challenges. We are among the top one per cent of institutions globally. Our academics are leaders in their fields, forging links with high-profile international businesses and organisations, and inspiring a 24,000-strong community of exceptional students, from over 135 countries worldwide. Through our high-quality education, the University helps students on a journey of discovery to realise their potential and join our global network of over 200,000 alumni. www.southampton.ac.uk

About OCF

OCF specialises in supporting the significant big data challenges of private and public UK organisations. Our in-house team and extensive partner network can design, integrate, manage or host the high performance compute, storage hardware and analytics software necessary for customers to extract value from their data. With a heritage of over 15 years in HPC and managing big data challenges, OCF now works with over 20 per cent of the UK’s Universities and Research Councils, as well as commercial clients from the automotive, aerospace, financial, manufacturing, media, oil & gas, pharmaceutical and utilities industries. www.ocf.co.uk

]]>https://www.hpcwire.com/off-the-wire/ocf-deploys-petascale-lenovo-supercomputer-university-southampton/feed/047711Lenovo Unveils Warm Water Cooled ThinkSystem SD650 in Rampup to LRZ Installhttps://www.hpcwire.com/2018/02/22/lenovo-unveils-warm-water-cooled-thinksystem-sd650-rampup-lrz-install/?utm_source=rss&utm_medium=rss&utm_campaign=lenovo-unveils-warm-water-cooled-thinksystem-sd650-rampup-lrz-install
https://www.hpcwire.com/2018/02/22/lenovo-unveils-warm-water-cooled-thinksystem-sd650-rampup-lrz-install/#respondFri, 23 Feb 2018 05:19:30 +0000https://www.hpcwire.com/?p=47689This week Lenovo took the wraps off the ThinkSystem SD650 high-density server with third-generation direct water cooling technology developed in tandem with partner Leibniz Supercomputing Center (LRZ) in Germany. The servers are designed to operate using warm water, up to 45°C for general deployments and for special bid projects up to 50°C, lowering datacenter power […]

]]>This week Lenovo took the wraps off the ThinkSystem SD650 high-density server with third-generation direct water cooling technology developed in tandem with partner Leibniz Supercomputing Center (LRZ) in Germany. The servers are designed to operate using warm water, up to 45°C for general deployments and up to 50°C for special bid projects, lowering datacenter power consumption by 30-40 percent compared to traditional cooling methods, according to Lenovo.

Nearly 6,500 of the ThinkSystem SD650 servers, featuring Intel Xeon Platinum (Skylake) processors interconnected with Intel Omni-Path Architecture, will be put into production at LRZ this year, providing the supercomputing center with 26.7 petaflops of peak performance housed in a little over 100 racks.
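
As a sanity check on that headline number (a sketch only; the per-node configuration below is assumed, since the article gives just the node count and the total):

# Rough peak-performance estimate for SuperMUC-NG from assumed node specs.
nodes = 6500                 # "nearly 6,500" ThinkSystem SD650 servers
sockets_per_node = 2         # assumption: dual-socket compute trays
cores_per_socket = 24        # assumption: 24-core Xeon Platinum (Skylake)
flops_per_cycle = 32         # FP64 with two AVX-512 FMA units per core
avx512_clock_ghz = 2.7       # assumed sustained AVX-512 frequency
peak_pflops = (nodes * sockets_per_node * cores_per_socket
               * flops_per_cycle * avx512_clock_ghz) / 1e6   # GFLOPS -> PFLOPS
print(f"Estimated peak: ~{peak_pflops:.1f} PFLOPS")          # ~27 PFLOPS, close to the quoted 26.7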

The SuperMUC-NG supercomputer will be deployed with the new Lenovo Intelligent Computing Orchestrator (LiCO) and the Lenovo Energy Aware Runtime (EAR) software, a technology that dynamically optimizes system infrastructure power while applications are running.

Lenovo’s Scott Tease holding a ThinkSystem SD650 server

“Pretty much all the investments that we made to get to exascale LRZ is taking advantage of in this bid we won with them,” said Scott Tease, executive director, HPC and AI at Lenovo in an on-site briefing at Lenovo’s headquarters in Morrisville, North Carolina, last week. “We will start building systems and start shipping them in March; the floor will be ready by the end of April, and move-in starts in early May. We’ll be ready to do acceptance in September with final customer acceptance in November.”

The direct-water cooled design of the SD650 enables 85-90 percent heat recovery; the rest can easily be managed by a standard computer room air conditioner. The hot water coming off the servers can be recycled to warm buildings in the winter, as LRZ does with its petascale SuperMUC cluster, but the technology developed by Lenovo for SuperMUC-NG actually transforms that heat energy back into cooling for networking and storage components.

The endothermic magic trick only works with “high quality heat,” Lenovo thermal engineer Vinod Kamath told us, so LRZ’s SD650 servers were designed to accept inlet water temperatures of up to 50°C. Water is piped out of the servers at 58-60°C, depending on workload, and sent through an adsorption chiller, where it is converted to chilled 20°C water suitable for cooling storage and networking components.

If you’re using chilled water to cool the servers, you can’t really take advantage of the economics of the adsorption chiller. With 60°C inlet water, the efficiency of Lenovo’s adsorption chiller is about 60 percent. If your heat source has a higher temperature, say 80-90°C, the extraction is even more efficient, but 60°C is good enough to realize significant savings.

Adsorption chilling will be applied to half the nodes of the next-gen LRZ install, generating about 600 kilowatts of chilled-water capacity. This translates into more than 100,000 Euros a year in saved energy costs at the European site, where electricity runs about 16-18 Eurocents per kilowatt-hour (roughly two to three times the cost at similar sites in the United States). Lenovo claims 45-50 percent energy savings with the endothermic process versus a traditional compressor, dropping the datacenter PUE from 1.6 to less than 1.1.
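
As a rough consistency check on those savings (a sketch only; the 17-cent price point and year-round operation are assumptions, not Lenovo figures):

# Convert the quoted annual savings back into avoided electricity.
price_eur_per_kwh = 0.17              # midpoint of the quoted 16-18 Eurocents
savings_eur_per_year = 100_000        # "more than 100,000 Euros a year"
kwh_avoided = savings_eur_per_year / price_eur_per_kwh
avg_kw_avoided = kwh_avoided / 8760   # continuous-equivalent electrical load
print(f"~{kwh_avoided / 1e6:.2f} GWh/yr avoided, ~{avg_kw_avoided:.0f} kW on average")
# ~0.59 GWh/yr, or roughly 67 kW of avoided electrical draw, set against the
# 600 kW of chilled-water capacity the adsorption chillers provide.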


The cooling solution can be traced back to 2012, when IBM (Lenovo acquired IBM’s x86 server business in 2014) was approached by LRZ to develop a system that was both powerful and extremely energy efficient. The first production implementation to come out of the partnership was the 9,200-node SuperMUC at LRZ, which achieved a number four ranking on the June 2012 Top500 list. The custom motherboard, developed with Intel, was cooled by water piped over the compute and memory components and back out of the system. LRZ used the hot water coming out of the system to heat parts of its building, which offset some of its overall energy costs.

The partnership also led to the deployment of the CooLMUC-2 cluster at LRZ in 2016. That system was the prototype for the next-gen LRZ cooling solution; it uses hot outlet water to drive adsorption chillers that generate refrigerated water, which is then used to cool some of the cluster’s disk storage systems.

“When we started doing this it was all about power cost,” said Tease. “It was all about datacenter optimization. Those things are still important, but we’re starting to see people recognize that water will allow them to do things that air can’t. I can do special processors that I can’t do with air; I can achieve densities that in the future I can’t do with air. We are really excited that we’ve got such a unique design, what we believe is an industry-leading design point as the market is coming to where we’ve been.”

The SD650 HPC servers have no system fans (except on the power supplies at the back of the rack), and operate at lower temperatures when compared to standard air-cooled systems. Chillers are not needed for most customers, which translates into further savings and a lower total cost of ownership. The new server supports high-speed EDR InfiniBand and Omni-Path fabrics as well as standard SSDs, NVMe SSDs, and M.2 boot SSDs.

In demoing the SD650, Kamath showed how the water supply comes in through the 6U NeXtScale n1200 chassis and flows into the servers. “We have a calibrated flow split between the processor and the memory to tune the heat transfer,” he said. “We recognize that networking devices are power hungry now and will be more so in the future, so the water that splits to the memory is coupled to a drive, an NVMe or SSD, and coupled to a network device, like ConnectX-5 or OPA, and then the water flows and connects back to conduction point.”

Two Lenovo ThinkSystem SD650 servers on the compute tray that provides water cooling. Source: Lenovo

Lenovo designed the system with special attention to next-generation memory technologies. Each server has 12 DIMM slots for TruDDR4 memory, but there are actually 16 slots in total; four have been reserved for 3D XPoint (also known as Apache Pass or AEP) memory. The cooling system can extract 10 watts from standard DIMMs, and for 3D XPoint and other higher-power memory, future designs will route two water lines across each DIMM so that parts consuming 18 watts can be handled. Lenovo also provides a handy DIMM removal tool that makes it easy to swap out memory.

Lenovo has been picking up major HPC system awards in Europe and beyond since acquiring IBM’s x86 business three and a half years ago. It has the fastest supercomputer in Spain, Italy, Denmark, Norway, Australia and Canada, and soon in Germany with LRZ. It has also been making inroads with its warm water cooling solutions. In addition to its systems at LRZ, it has warm water HPC installations at Peking University (the first in China), the India Space Administration (the first in India), and a multi-university system in Norway.

Liquid cooling is becoming mainstream in HPC, especially in environments where space constraints drive up density requirements or where energy is expensive. Lenovo tells customers that, when it comes to electricity prices, anything over 15 cents per kilowatt-hour will provide a return on investment within one year. Another benefit of removing more heat is that CPUs can run in “turbo” mode nonstop, which can squeeze an additional 10 percent of performance out of them.
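
To see how that 15-cent threshold can work out, here is a minimal payback sketch; the per-rack premium for the water loop and the rack IT load are purely illustrative assumptions, while the PUE figures and the price threshold come from the article:

def payback_years(capex_premium_eur, it_kw, pue_air, pue_water, price_eur_per_kwh):
    """Simple payback: extra hardware cost divided by annual energy-cost savings."""
    kw_saved = it_kw * (pue_air - pue_water)          # facility power avoided
    annual_savings_eur = kw_saved * 8760 * price_eur_per_kwh
    return capex_premium_eur / annual_savings_eur

# Hypothetical inputs: a 30 kW rack, PUE improving from 1.6 to 1.1 (the figures
# quoted earlier), and an assumed 20,000 Euro premium for the water-cooling loop.
print(f"{payback_years(20_000, 30, 1.6, 1.1, 0.15):.1f} years")   # ~1.0 years at 15 cents/kWh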

The SD650 is managed by Lenovo Intelligent Computing Orchestrator (LiCO), a management suite with an intuitive GUI that supports management of large HPC cluster resources and accelerates development of AI applications. LiCO works with the most common AI frameworks, including TensorFlow, Caffe and Microsoft CNTK.