Intel® HPC Developer Conference 2017

Artificial intelligence (AI) is unlocking tremendous economic value across various sectors. Data scientists can use several open-source frameworks and basic hardware resources during the initial investigative phases. This talk details progress to date and future plans for democratizing AI.

Inside the Altair HyperWorks* computer-aided engineering (CAE) suite, RADIOSS is the tool primarily used for crash and safety simulations. This presentation explains how the code has been optimized for new Intel® Xeon® Scalable processors and Intel® Xeon Phi™ processors, leveraging the Intel® AVX-512 instruction set to improve solver speed.

Walk through some of the real-life software engineering challenges encountered while refactoring an aging HPC codebase. Learn about the obstacles encountered and the strategies used to resolve them.

In this talk, we review different approaches to designing linear solvers—the workhorse in many simulators—that are robust, performant, and scalable on new manycore architectures, such as Intel® Xeon Phi™ processors, while taking into account the specifics of the hardware.

Federated learning approaches have recently made AI on edge devices desirable and necessary for the evolution of ubiquitous artificial intelligence. This session presents emerging AI technologies, their feasibility on edge, and their role in federated learning.

TensorFlow* is a leading machine learning and deep learning framework that enables data scientists to address problems on a variety of devices ranging from multicore CPUs to custom ASICs (TPUs). We share how optimizing TensorFlow has resulted in speedups of up to 85 times on common neural network models.

Deep learning and AI are revolutionizing how we work and interact with technology. As model sizes grow, power efficiency will increasingly limit performance scalability. We explore the technical and practical feasibility of low-precision deep neural networks.
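The core idea behind low-precision networks can be illustrated with symmetric int8 quantization of a matrix multiply: weights and activations are mapped to 8-bit integers, the products are accumulated in 32-bit, and the result is rescaled to floating point. The sketch below is illustrative only; the helper names and the specific scheme are assumptions, not the method presented in the talk.

```python
import numpy as np

np.random.seed(0)

def quantize_int8(x):
    # Symmetric linear quantization: map max |x| onto [-127, 127].
    scale = np.max(np.abs(x)) / 127.0
    q = np.round(x / scale).astype(np.int8)
    return q, scale

x = np.random.randn(4, 8).astype(np.float32)   # activations
w = np.random.randn(8, 3).astype(np.float32)   # weights

qx, sx = quantize_int8(x)
qw, sw = quantize_int8(w)

# Integer matmul accumulated in int32, then rescaled to float32.
y_q = (qx.astype(np.int32) @ qw.astype(np.int32)).astype(np.float32) * (sx * sw)
y = x @ w

# The low-precision result tracks the full-precision one closely.
print(np.max(np.abs(y - y_q)))
```

The arithmetic runs entirely on 8-bit operands with 32-bit accumulation, which is what makes low-precision inference attractive on wide-SIMD CPUs.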

Cray Urika-XC* brings a suite of analytics and deep learning software to Cray XC supercomputer users. This talk presents an overview of Urika-XC and its productivity and scaling benefits. We demonstrate these benefits with a full deep learning workflow that uses Apache Spark* and BigDL to predict rainfall.

We present the first 15 petaFLOPS deep learning system for solving supervised and semi-supervised scientific pattern classification problems, optimized for Intel® Xeon Phi™ processors. We use a hybrid of synchronous and asynchronous training to scale to approximately 9,600 nodes of the Cori supercomputer, training convolutional neural networks (CNNs) and autoencoder networks.

We present a novel deep learning pipeline for using genetic variant data to predict patient risk for several clinical phenotypes. We compare our approach to standard methods in the literature, and discuss performance and optimization of our approach with TensorFlow on Intel® architecture.

Learn how to accelerate deep learning inference and training on manycore and multicore CPUs using stand-alone frameworks like Caffe or TensorFlow. Eliminate the need for a GPU by achieving superior inference performance with an 8-core Intel® Xeon® processor, demonstrated with real-world automotive and neuroscience examples.

Modern parallel computing techniques and optimizations for Intel® Xeon Phi™ processors have allowed dramatic acceleration of computations to characterize neural circuits of the brain. We show how HPC and AI can impact clinical care.

As a scientist at Los Alamos National Laboratory, the speaker created a near-linear scalable mapping during the 1980s that has run on most leadership class supercomputers using tens of thousands of nodes and delivers PF/s training performance.

See how Descartes Labs created a cloud-based supercomputing platform for the application of machine intelligence to massive data sets. Capitalizing on the confluence of advances in AI and HPC in the cloud, they created an enterprise data refinery for satellite imagery on a global scale.

Although TensorFlow supports multicore CPUs, evaluation of the default CPU backend reveals suboptimal performance. In this talk, we describe a collaborative effort between Intel and Google engineers to optimize TensorFlow for modern x86 systems, resulting in speedups of up to 85 times on common neural network models over the default CPU backend. We demonstrate the capability of Intel® Optimization for TensorFlow* on the latest Intel® Xeon® and Intel® Xeon Phi™ processors.
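Much of the CPU speedup described here comes from combining MKL-optimized primitives with careful thread placement. The snippet below shows the kind of tuning knobs Intel's guidance recommends for such runs; the specific values are illustrative and should be tuned to the machine's core count, not taken as the settings used in the talk.

```python
import os

# Environment knobs commonly recommended for Intel-optimized TensorFlow
# on Xeon/Xeon Phi systems (values are illustrative).
os.environ["OMP_NUM_THREADS"] = "16"  # threads used by MKL primitives
os.environ["KMP_BLOCKTIME"] = "1"     # ms a thread spins before sleeping
os.environ["KMP_AFFINITY"] = "granularity=fine,compact,1,0"  # pin threads

# In TensorFlow 1.x (current at the time of the talk), graph-level
# parallelism is set through ConfigProto, e.g.:
#   config = tf.ConfigProto(intra_op_parallelism_threads=16,
#                           inter_op_parallelism_threads=2)
#   sess = tf.Session(config=config)
print(os.environ["KMP_AFFINITY"])
```

`intra_op` threads parallelize individual ops such as convolutions, while `inter_op` threads run independent graph nodes concurrently; balancing the two against the physical core count is a recurring theme in CPU tuning.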

Training models with high accuracy on large image datasets can typically take weeks or months. We present scale-out results that suggest one can achieve this much faster, such as training the ResNet-50 architecture on ImageNet-1K using either Intel® Xeon® or Intel® Xeon Phi™ processor nodes.

Deep learning frameworks provide good performance on a single workstation, but scaling across multiple nodes is less well understood and still evolving. This introductory lecture explains the key steps to enable deep learning capabilities in your existing HPC system without additional hardware.

Machine learning has inspired novel data analysis techniques in experiments such as the Large Hadron Collider of the European Organization for Nuclear Research (CERN). In this work, we address the problem of boosted jet classification in high-energy physics using artificial neural networks.

In this presentation, we describe the design of fully scalable fixed-point DSP MACs (multiply-accumulate units) with rounding and saturation. We use these MACs as the neurons within a convolutional neural network (CNN), in both sequential and parallel structures, and evaluate the corresponding performance.
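The behavior of such a MAC can be modeled in a few lines: multiply two fixed-point operands, round the product back to the operand format, and clamp the accumulator to the representable range instead of letting it wrap. This is a minimal software sketch under assumed Q8.8 operands and a 16-bit accumulator, not the hardware design from the talk.

```python
def sat_round_mac(acc, a, b, frac_bits=8, width=16):
    """One fixed-point multiply-accumulate with rounding and saturation.

    a, b, and acc are signed integers in Qm.frac_bits format; the
    accumulator is saturated to the signed `width`-bit range.
    """
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    prod = a * b
    # Round-to-nearest while discarding the extra fractional bits.
    prod = (prod + (1 << (frac_bits - 1))) >> frac_bits
    acc += prod
    return max(lo, min(hi, acc))  # saturate instead of wrapping around

# Q8.8 example: 1.5 * 2.0 = 3.0, i.e. 384 * 512 -> 768.
print(sat_round_mac(0, 384, 512))      # 768
print(sat_round_mac(32000, 384, 512))  # clamps to 32767
```

Saturation matters for CNN neurons because wrap-around on overflow flips the sign of a large activation, which is far more damaging to accuracy than clipping it.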

This session discusses the flow graph extension to the Intel® Threading Building Blocks interface, used as a coordination layer for heterogeneity that retains optimization opportunities and composes with existing models. We also discuss expressing complex synchronization and communication patterns, and balancing the load between CPUs, GPUs, and FPGAs.