22 June 2018

HPC and HPDA for the Cognitive Journey with OpenPOWER

The high-performance computing landscape is evolving at a furious pace, one that some are describing as an important inflection point: Moore’s Law is delivering diminishing returns even as performance demands increase. Leaders of HPC organizations are grappling with how to embrace recent system-level innovations like acceleration, while simultaneously being challenged to incorporate analytics into their HPC workloads.


On the horizon, even more demanding applications built with machine learning and deep learning are emerging to push system demands to all-new highs. With all of this change in the pipeline, the usual tick-tock of minor code tweaks accompanying nominal hardware performance improvements cannot continue. For many HPC organizations, significant decisions lie ahead.

Realizing that these demands could only be addressed by an open ecosystem, IBM partnered with industry leaders including Google, Mellanox, and NVIDIA to form the OpenPOWER Foundation, dedicated to stewarding the POWER CPU architecture into the next generation.

A data-centric approach to HPC with OpenPOWER

In 2014, this disruptive approach to HPC innovation led to IBM being awarded two contracts to build the next generation of supercomputers as part of the US Department of Energy’s Collaboration of Oak Ridge, Argonne, and Lawrence Livermore (CORAL) program. In partnership with NVIDIA and Mellanox, we demonstrated to CORAL that a “data-centric” approach to systems, an architecture designed to embed compute power everywhere data resides in the system, could position users for a convergence of analytics, modeling, visualization, and simulation, driving new insights at incredible speeds. Now, on the three-year anniversary of that agreement, we’re pleased to announce that we are delivering on our project: our next-generation IBM Power Systems with NVIDIA Volta GPUs are being deployed at Oak Ridge and Lawrence Livermore National Laboratories.

Moving mountains

Both systems, Summit at ORNL and Sierra at LLNL, are being installed as you read this, with completion expected early next year. The numbers are impressive: Summit is expected to deliver 5 to 10 times the individual application performance of Titan, Oak Ridge’s previous supercomputer, and Sierra is expected to provide 4 to 6 times the sustained performance of Sequoia, Lawrence Livermore’s previous system.


With Summit in place, Oak Ridge National Laboratory will advance its stated mission: “Be able to address, with greater complexity and higher fidelity, questions concerning who we are, our place on earth, and in our universe.” Most importantly, the clusters will position the lab to push the boundaries of one of the most important technological developments of our generation, artificial intelligence (AI).


Built for AI, built for the future

However, emerging AI workloads are vastly different from traditional HPC workloads. The performance measurements listed above, while interesting, do not really capture the requirements of deep learning algorithms. With AI workloads, bottlenecks shift away from compute and networking to data movement at the CPU level. IBM POWER9 systems are specifically designed for these emerging challenges.


“We’re excited to see accelerating progress as the Oak Ridge National Laboratory Summit supercomputer continues to take shape. The infrastructure is now complete and we’re beginning to deploy the IBM POWER9 compute nodes. We’re still targeting early 2018 for the final build-out of the Summit machine, which we expect will be among the world’s fastest supercomputers. The advanced capabilities of the IBM POWER9 CPUs coupled with the NVIDIA Volta GPUs will significantly advance the computational performance of DOE’s mission critical applications,” says Buddy Bland, Oak Ridge Leadership Computing Facility Director.


POWER9 leverages PCIe Gen4, next-generation NVIDIA NVLink interconnect technology, memory coherency, and other features designed to maximize throughput for AI workloads. This should translate into higher overall performance at larger scale, while curbing the space creep and potentially out-of-control power consumption that come with excessive node counts. Competitors’ projections anticipate node counts exceeding 50,000 to break into exascale territory, and not until 2021. Already this year, IBM leveraged distributed deep learning to cut model training time from 16 days to 7 hours by successfully scaling TensorFlow and Caffe across 256 NVIDIA Tesla GPUs. These new systems feature 100 times more GPUs spread across thousands of nodes, meaning the only theoretical limit to the deep learning benchmarks we can set with these new supercomputers is our own imagination.
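The distributed deep learning result mentioned above rests on synchronous data parallelism: each worker computes gradients on its own shard of the data, the gradients are averaged across workers, and every replica applies the same update. A minimal toy sketch of that scheme (plain Python, 1-D least squares standing in for a real model; not IBM's actual DDL implementation):

```python
# Toy sketch of synchronous data-parallel training: each "worker" computes
# a gradient on its own data shard, the gradients are averaged (the
# all-reduce step), and every replica applies the identical update.
# The "model" here is 1-D least squares, for brevity.

def grad(w, shard):
    # d/dw of the mean squared error 0.5*(w*x - y)^2 over one shard
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def train_step(w, shards, lr=0.1):
    grads = [grad(w, s) for s in shards]   # computed in parallel in practice
    avg = sum(grads) / len(grads)          # all-reduce: average across workers
    return w - lr * avg                    # same update on every replica

# Data drawn from y = 2x, split across 4 "workers"
shards = [[(x, 2 * x)] for x in (1.0, 2.0, 3.0, 4.0)]
w = 0.0
for _ in range(200):
    w = train_step(w, shards)
# w converges to the true slope, 2.0
```

Because every replica sees the averaged gradient, the model stays identical everywhere; scaling to 256 GPUs (or thousands of nodes) changes only how many shards contribute to the average.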

Start with the data

All machine learning and deep learning models train on large amounts of data. Fortunately (and unfortunately), organizations are swimming in data sitting in structured and unstructured forms, and beyond the data they have under their control, organizations also have access to data, free or for a fee, from a variety of sources.
Often, little of this data is in the proper place or form for training a new AI model. To date, we have found that this problem has largely been solved by manual methods: miles and miles of Python scripting, often run inside Spark clusters for speed of execution, along with a lot of orphan code.

To help shorten transformation time, PowerAI Enterprise integrates a structured, template-based approach to building and transforming data sets. It starts with common output formats (LMDB, TFRecords, images for vector output) and lets users define the input format and structure of the raw data, along with the key characteristics of what is needed in the transform step.
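The template idea can be sketched in a few lines: a template declares the expected input layout and how to encode each record, and a generic runner applies it to the raw data. All names here are hypothetical illustrations, not the actual PowerAI Enterprise API:

```python
# Hedged sketch of a template-based transform step (names are illustrative,
# not the real product API): the template declares the raw input layout and
# the encoding rule, and a generic runner produces uniform training records.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class TransformTemplate:
    input_fields: List[str]   # keys every raw record must carry
    label_field: str          # which field is the training label
    encode: Callable          # how to turn a raw record into model input

def run_transform(template, raw_records):
    out = []
    for rec in raw_records:
        if not all(f in rec for f in template.input_fields):
            continue          # drop malformed rows instead of failing the job
        out.append({
            "features": template.encode(rec),
            "label": rec[template.label_field],
        })
    return out

tmpl = TransformTemplate(
    input_fields=["pixels", "class"],
    label_field="class",
    encode=lambda r: [p / 255.0 for p in r["pixels"]],  # normalize to [0, 1]
)
records = run_transform(tmpl, [
    {"pixels": [0, 128, 255], "class": 1},
    {"pixels": [255], "class": 0},
    {"bad": True},            # missing required fields -> skipped
])
```

The point of the template is reuse: the same runner handles any data set once its input fields and encoding rule are declared, replacing one-off scripts.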

The data import tools in PowerAI Enterprise are aware of the size and complexity of the data and of the resources available to transform it. As a result, the integrated resource manager can intelligently manage job execution: optimizing either for low cost (running across the fewest nodes/cores) or for the fastest execution of the transform job (running across more nodes/cores).
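The cost-versus-speed trade-off the resource manager makes can be illustrated with a toy planner (purely illustrative arithmetic, not the product's scheduler): given the job size and per-node throughput, either take every available node for the fastest finish, or pick the fewest nodes that still meet a deadline.

```python
# Toy sketch of a cost-vs-speed scheduling decision (illustrative only):
# "fastest" grabs all available nodes; "cheapest" picks the minimum node
# count that still meets an optional deadline.
import math

def plan(job_units, units_per_node_hour, nodes_available,
         objective="cheapest", deadline_hours=None):
    if objective == "fastest":
        nodes = nodes_available
    else:
        nodes = 1
        if deadline_hours is not None:
            # Fewest nodes whose combined throughput finishes in time
            nodes = math.ceil(job_units / (units_per_node_hour * deadline_hours))
            nodes = min(max(nodes, 1), nodes_available)
    hours = job_units / (units_per_node_hour * nodes)
    return {"nodes": nodes, "hours": hours}

fast  = plan(1000, 10, 16, objective="fastest")                    # 16 nodes
cheap = plan(1000, 10, 16, objective="cheapest", deadline_hours=25)  # 4 nodes
```

A real resource manager also weighs queue contention and data locality, but the core decision is this same throughput arithmetic.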

Integrated into the data preparation step is a quality-check function that lets a data engineer or data scientist check the clarity of the signal in the data, running a simplified model and sample training from within the data import tool. Although not as sophisticated as a fully developed model, this “gut check” lets a data scientist discover early whether there are obvious issues or deficiencies in the training data set, before investing significant time in the model development phase.
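One way to picture such a gut check (a hypothetical stand-in for the tool's simplified model): fit the cheapest classifier imaginable, nearest-centroid, on a small sample and compare its accuracy against chance. Accuracy well above chance suggests the labels carry a learnable signal.

```python
# Illustrative "gut check" on a training sample: a nearest-centroid
# classifier, the simplest possible model. Accuracy far above chance
# hints that the data set has a clear signal worth full training.

def centroid(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def gut_check(samples):
    # samples: list of (feature_vector, label) pairs with labels 0/1
    by_label = {0: [], 1: []}
    for x, y in samples:
        by_label[y].append(x)
    c0, c1 = centroid(by_label[0]), centroid(by_label[1])
    dist = lambda a, b: sum((u - v) ** 2 for u, v in zip(a, b))
    correct = sum((dist(x, c1) < dist(x, c0)) == bool(y) for x, y in samples)
    return correct / len(samples)

# Clean signal: class 1 clusters near 1.0, class 0 near 0.0
acc = gut_check([([0.1], 0), ([0.2], 0), ([0.9], 1), ([0.8], 1)])
```

If this kind of cheap probe scores near 50% on balanced binary labels, the data (or the labels) likely need attention before any serious model development.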

Cognitive Computing: From Data to Analytics to Learning

The majority of data doesn’t offer much value unless it is iteratively and progressively analyzed by the user and the system to produce powerful insights with recommended actions for the best outcomes. In fact, IBM Watson (IBM’s leadership cognitive system) constantly sifts through data, discovers insights, learns, and determines the best course of action.

Learning systems (cognitive and deep machine learning) are interactive analytics systems that continuously build knowledge over time by processing natural language and data. These systems learn a domain by experience, just as humans do, and can discover and suggest the “best course of action,” providing highly time-critical, valuable guidance to humans or simply executing that “next best action” themselves. IBM Watson is the premier cognitive system in the market.


The underlying technologies for deep learning include artificial neural networks (ANNs): networks inspired by and designed to mimic the function of the cortex, the thinking matter of the brain. Driverless autonomous cars, robotics, and personalized medical therapies are some key disruptive innovations enabled by deep learning.
A performance-optimized infrastructure is critical for the Cognitive Computing journey.

Speed up the model development process

PowerAI Enterprise includes powerful model setup tools designed to address the earliest “dead end” training runs. Integrated hyperparameter optimization automates the process of characterizing new models by sampling from the training data set and instantiating multiple small training jobs across cluster resources (sometimes tens, sometimes thousands of jobs, depending on the complexity of the model). The tool is designed to select the most promising combinations of hyperparameters to return to the data science teams. The outcome: fewer non-productive early runs and more time to focus on refining models for greater organizational value.
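The search idea can be sketched simply (hypothetical names, and a synthetic scoring function standing in for the short sample-training jobs): draw candidate configurations, score each with a small, cheap job, and hand back only the most promising few for full runs.

```python
# Sketch of hyperparameter search by random sampling (illustrative only):
# many cheap trial jobs score candidate configurations, and the top few
# are returned for full-scale training.
import random

def small_job_score(cfg):
    # Stand-in for a short training run on a sample of the data; this
    # synthetic objective prefers lr near 0.1 and more hidden units.
    return -abs(cfg["lr"] - 0.1) + 0.001 * cfg["units"]

def search(n_trials, top_k, seed=0):
    rng = random.Random(seed)
    trials = [{"lr": 10 ** rng.uniform(-4, 0),            # log-uniform lr
               "units": rng.choice([64, 128, 256, 512])}
              for _ in range(n_trials)]
    trials.sort(key=small_job_score, reverse=True)        # best first
    return trials[:top_k]

best = search(n_trials=50, top_k=3)
```

In practice the trial jobs run in parallel across the cluster, which is exactly why a resource-orchestration layer matters: fifty ten-minute jobs cost the wall-clock time of one when there is capacity to spread them out.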

Once you have selected hyperparameters, you can begin bringing together all of the different elements of a deep learning model training run.


This next phase of the development process is extremely iterative. Even with assistance in selecting the right hyperparameters, ensuring that your data is clean and carries a clear signal, and operating at the appropriate scale, chances are you will still be repeating training runs. By instrumenting the training process, PowerAI Enterprise lets a data scientist see real-time feedback on the training cycle.

PowerAI Enterprise provides the ability to visualize the current progress and status of your training job, including iteration, loss, accuracy, and histograms of the weights, activations, and gradients of the neural network.
With this feedback, data scientists and model developers are alerted when the training process begins to go awry. These early warnings allow them to stop training runs that will eventually go nowhere and adjust parameters.
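The early-warning idea amounts to watching the loss stream for trouble. A minimal sketch (illustrative thresholds, not the product's actual monitoring logic): flag a run whose loss has gone non-finite or whose recent average is rising.

```python
# Sketch of a training-run early-warning check (illustrative only):
# stop on NaN/inf loss, or when the recent average loss is trending up
# relative to the previous window.
import math

def should_stop(losses, window=5, tolerance=1.05):
    if any(not math.isfinite(l) for l in losses):
        return True                      # numerical blow-up: stop immediately
    if len(losses) < 2 * window:
        return False                     # not enough history to judge
    prev = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    return recent > prev * tolerance     # loss rising -> run is going awry

healthy  = [1.0, 0.8, 0.7, 0.6, 0.55, 0.5, 0.45, 0.42, 0.40, 0.39]
diverged = [1.0, 0.9, 0.8, 0.7, 0.7, 0.9, 1.2, 1.6, 2.2, 3.0]
```

On long GPU runs, a check this cheap pays for itself the first time it stops a diverging job hours before a human would have looked at the curves.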

These workflow tools run on top of IBM’s scalable, distributed deep learning platforms. They take the best of open source frameworks and augment them for both large model support and better cluster performance, both of which open up the potential to take applied artificial intelligence into areas and use cases which were not previously feasible.

Bringing all these capabilities together accelerates development for data scientists, and the combination of automating workflow and extending the capabilities of open source frameworks unlocks the hidden value in organizational data.

As Gurinder Grewal, Senior Director, Architecture at PayPal said at IBM’s Think conference: “How do you take all these technologies and marry them together to build end to end platform that we can hand over to a data scientist and the business policy owners so they can extract most value out of the data? I think that’s what excites us most about things your company is working on in terms of the PowerAI platform… I think that’s one of the things we actually really appreciate the openness of the platform, extracting the most value out of the compute power we have and the power from the data.”

A foundation for data science as a service

At the core of the platform is an enterprise-class management software system for running compute- and data-intensive distributed applications on a scalable, shared infrastructure.

IBM PowerAI Enterprise supports multiple users and lines of business with multi-tenancy and end-to-end security, including role-based access controls. Organizational leaders are looking to deploy AI infrastructure at scale. The combination of integrated security (including role-based access and encryption of workload and data), support for service-level agreements, and extremely scalable resource orchestration designed for very large compute infrastructure means it is now possible to share data science environments across the organization.


One customer that has successfully navigated the new world of AI is Wells Fargo. They use deep learning models to comply with a critical financial validation process. Their data scientists build, enhance, and validate hundreds of models each day. Speed is critical, as is scalability, as they deal with greater amounts of data and more complicated models. As Richard Liu, Quantitative Analytics manager at Wells Fargo, said at IBM Think, “Academically, people talk about fancy algorithms. But in real life, how efficiently the models run in distributed environments is critical.” Wells Fargo uses the IBM AI Enterprise software platform for the speed and the resource scheduling and management functionality it provides. “IBM is a very good partner and we are very pleased with their solution,” adds Liu.

Each part of the platform is designed to remove both time and pain from the process of developing a new applied artificial intelligence service. By automating highly repetitive and manual steps, time is saved for improving and refining models, which can lead to a higher-quality result.

We’ve introduced a lot of new functionality in IBM PowerAI Enterprise 1.1, and I’ll be sharing more detail on these new capabilities in future posts. I also welcome your input as we continue to add new capabilities moving forward.