To enable characterization of parallel application communication patterns by non-experts, we developed an approach for automatically recognizing and parameterizing communication patterns in MPI-based applications. Beginning with a communication matrix that indicates how much data each process transferred to every other process during the application's run, we use an automated search to recognize communication patterns within this matrix. At each search step, we attempt to recognize patterns from a pattern library in the communication matrix. Using a technique similar to astronomy's "sky subtraction," when we recognize a pattern we remove it from the matrix and apply our recognition approach recursively to the resulting matrix. Because more than one pattern might be recognized at each search step, the search produces a search results tree whose paths between root and leaves represent collections of patterns recognized in the original matrix. The path that accounts for most of the original communication matrix's traffic corresponds to the collection of patterns that best explains the application's communication behavior. We implemented our approach in a tool called AChax, which was highly effective in recognizing the communication patterns in a synthetic communication matrix and the regular communication patterns in matrices obtained from the LAMMPS molecular dynamics and LULESH shock hydrodynamics applications.
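The recognize-and-subtract idea can be sketched in a few lines. This is a simplified, greedy illustration under invented pattern generators (a 1D ring and a broadcast), not AChax's actual pattern library or its full tree search, which explores multiple candidate patterns per step:

```python
import numpy as np

# Hypothetical pattern generators; AChax's real library differs.
def ring_pattern(n, scale):
    """Each rank sends `scale` bytes to its right neighbor (1D ring)."""
    m = np.zeros((n, n))
    for p in range(n):
        m[p, (p + 1) % n] = scale
    return m

def broadcast_pattern(n, scale):
    """Rank 0 sends `scale` bytes to every other rank."""
    m = np.zeros((n, n))
    m[0, 1:] = scale
    return m

def recognize(matrix, library, found=None):
    """Greedy sketch of the recursive search: match a pattern, subtract
    it ("sky subtraction"), and recurse on the residual matrix."""
    if found is None:
        found = []
    n = matrix.shape[0]
    for name, gen in library.items():
        mask = gen(n, 1.0) > 0
        if mask.any() and (matrix[mask] > 0).all():
            scale = matrix[mask].min()      # largest removable multiple
            found.append((name, scale))
            return recognize(matrix - gen(n, scale), library, found)
    return found, matrix                    # residual = unexplained traffic

library = {"broadcast": broadcast_pattern, "ring": ring_pattern}
synthetic = ring_pattern(8, 4096) + broadcast_pattern(8, 1024)
patterns, residual = recognize(synthetic, library)
```

A full implementation would branch on every matching pattern at each step, producing the search results tree described above rather than a single greedy path.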

Funding for this work was provided by the Office of Advanced Scientific Computing Research, U.S. Department of Energy. The work
was performed at ORNL.

Researchers were able to achieve a 13X performance boost by migrating to GPUs. This increase in performance makes it possible to solve kinetic systems in parallel on GPUs.

We demonstrate the first implementation of recently developed fast explicit kinetic integration algorithms on modern graphics processing unit (GPU) accelerators. Taking as a generic test case a Type Ia supernova explosion with an extremely stiff thermonuclear network having 150 isotopic species and 1604 reactions coupled to hydrodynamics using operator splitting, we demonstrate the capability to solve of order 100 realistic kinetic networks in parallel in the same time that standard implicit methods can solve a single such network on a CPU. This orders-of-magnitude decrease in computation time for solving systems of realistic kinetic networks implies that important coupled, multiphysics problems in various scientific and technical fields that were intractable, or could be simulated only with highly schematic kinetic networks, are now computationally feasible.
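The core idea behind explicit asymptotic integration can be illustrated on a single stiff kinetic equation dy/dt = F⁺ − k·y. This sketch uses invented rate values and a single species, not the paper's 150-species network; it only shows why the asymptotic update remains stable at time steps far beyond the explicit stability limit:

```python
# Asymptotic update for dy/dt = F_plus - k*y: explicit arithmetic,
# no matrix solve, yet stable for stiff rate constants k.
def asymptotic_step(y, f_plus, k, dt):
    return (y + dt * f_plus) / (1.0 + dt * k)

def explicit_euler_step(y, f_plus, k, dt):
    return y + dt * (f_plus - k * y)

k, f_plus = 1.0e6, 2.0e6        # stiff: equilibrium y_eq = f_plus/k = 2.0
dt = 1.0                        # far beyond the explicit stability limit ~2/k
y_asy, y_eul = 0.0, 0.0
for _ in range(10):
    y_asy = asymptotic_step(y_asy, f_plus, k, dt)      # relaxes to 2.0
    y_eul = explicit_euler_step(y_eul, f_plus, k, dt)  # blows up
```

Because each species update is independent arithmetic of this form, many networks can be integrated concurrently on a GPU, which is what makes the batch-of-100-networks throughput possible.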

Researchers developed a methodology and tool to classify memory access patterns from traces.

Classifying memory access patterns is paramount to selecting the right set of optimizations and determining the parallelization strategy. Static analyses suffer from ambiguities present in source code, which modern compilation techniques, such as profile-guided optimization, alleviate by observing runtime behavior and feeding back into the compilation flow. We implemented a dynamic analysis technique for recognizing memory access patterns, with application to the stencil domain.
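A minimal form of trace-based pattern classification looks at the distribution of address deltas. This toy sketch (the labels, threshold, and trace format are invented, not the tool's actual taxonomy) classifies a trace by its dominant stride:

```python
from collections import Counter

def classify_accesses(addrs, element_size=8):
    """Toy classifier: label an address trace by its dominant stride.
    Real tools work on instrumented traces per instruction or loop nest."""
    strides = Counter(b - a for a, b in zip(addrs, addrs[1:]))
    stride, hits = strides.most_common(1)[0]
    coverage = hits / (len(addrs) - 1)
    if coverage < 0.9:                       # no dominant stride
        return "irregular"
    if stride == element_size:
        return "unit-stride (streaming)"
    if stride == 0:
        return "temporal reuse"
    return f"strided (stride={stride // element_size} elements)"

base = 0x1000
stream = [base + 8 * i for i in range(100)]          # A[i] sweep
column = [base + 8 * 64 * i for i in range(100)]     # column walk, 64-wide rows
```

Stencil codes typically show a small set of fixed strides per loop nest, which is what makes this kind of dynamic classification effective in that domain.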

The Bellerophon Environment for Analysis of Materials (BEAM) has added the capability for instrument scientists at IFIM and CNMS to perform and monitor near real-time data analysis by dynamically generating and executing HPC workflows on Titan at OLCF, Edison and Hopper at NERSC, and the Pileus compute cluster at CADES.

This work will accelerate scientific discovery by enabling IFIM and CNMS users to perform robust data analysis in only a fraction of the time required using today's techniques. BEAM users will be able to directly utilize DOE HPC compute and data resources to efficiently integrate and analyze complex multi-modal data with no previous experience as a computer programmer or HPC user.

Work was performed at ORNL, IFIM, CNMS, OLCF, and NERSC and is supported by the LDRD program of Oak Ridge National Laboratory.

In a suite of experiments forced with a range of black carbon (BC) aerosol concentrations within estimated uncertainty bounds, we have analyzed the impact of BC on the expansion of the Tropics using a variety of metrics that quantify tropical extent. These experiments suggest that tropical expansion increases nearly linearly with increasing BC forcing, due to the relative warming of the mid-latitudes by increased absorption of solar radiation by BC.

Global Climate Models (GCMs) underestimate the observed trend in tropical expansion. Recent studies partly attribute it to black carbon aerosols (BC), which are poorly represented in GCMs. We conduct a suite of idealized experiments with the Community Atmosphere Model (CAM4) coupled to a slab ocean model forced with increasing BC concentrations covering a large swath of the estimated range of current BC radiative forcing while maintaining their spatial distribution. The Northern Hemisphere (NH) tropics expand polewards nearly linearly as BC radiative forcing increases (0.70° W⁻¹ m²), indicating that a realistic representation of BC could reduce GCM biases. We find support for the mechanism whereby BC-induced midlatitude tropospheric heating shifts the maximum meridional tropospheric temperature gradient polewards, resulting in tropical expansion. We also find that the NH poleward tropical edge is nearly linearly correlated with the location of the intertropical convergence zone (ITCZ), which shifts northwards in response to increasing BC.

Researchers developed a discretization for the phase-field crystal equation that guarantees mass and energy conservation while maintaining second-order accuracy in space and time. They were able to prove that conservation is achieved discretely and demonstrate this in simulations in two and three dimensions.

The phase-field crystal equation, a parabolic, sixth-order and nonlinear partial differential equation, has generated considerable interest as a possible solution to problems arising in molecular dynamics. Nonetheless, solving this equation is not a trivial task, as energy dissipation and mass conservation need to be verified for the numerical solution to be valid. This work addresses these issues, and proposes a novel algorithm that guarantees mass conservation, unconditional energy stability and second-order accuracy in time. Numerical results validating our proofs are presented, and two- and three-dimensional simulations involving crystal growth are shown, highlighting the robustness of the method.
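The discrete mass-conservation property can be illustrated on a much simpler conservative scheme. This sketch is not the phase-field crystal algorithm itself; it only demonstrates the general principle that an update u ← u + Δt·L(u), where the discrete operator L has vanishing column sums under periodic boundaries, conserves total mass to rounding error:

```python
import numpy as np

def periodic_laplacian(u):
    """Second-order centered Laplacian on a periodic 1D grid (h = 1)."""
    return np.roll(u, 1) - 2 * u + np.roll(u, -1)

rng = np.random.default_rng(0)
u = rng.random(128)
mass0 = u.sum()
for _ in range(1000):                  # dt = 1e-3, well inside stability
    u = u + 1e-3 * periodic_laplacian(u)
drift = abs(u.sum() - mass0)           # should be near machine precision
```

Verifying a bound like this on every run is the kind of check that validates a claimed conservation proof numerically, as done in the paper's two- and three-dimensional experiments.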

Researchers designed a multilevel Monte Carlo algorithm for stochastic differential equations in which the number of levels, the grid resolutions, and the number of samples per level are determined so as to minimize computational cost while satisfying constraints on the bias and statistical error. The parameters of the numerical models used to estimate costs and errors are dynamically calibrated using information obtained as the computation proceeds. We demonstrate that our algorithm runs anywhere from 2 to 12 times faster than standard techniques.
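The multilevel telescoping estimator at the heart of such algorithms can be sketched for a simple geometric-Brownian-motion SDE. Here the levels and sample counts are fixed for illustration, whereas the paper's algorithm chooses them adaptively from calibrated cost and error models:

```python
import numpy as np

# MLMC sketch for E[X_T] of dX = mu*X dt + sig*X dW, Euler-Maruyama.
# Level l uses 2**l steps; coarse/fine paths share Brownian increments.
def level_estimator(l, n_samples, mu=0.05, sig=0.2, T=1.0, rng=None):
    if rng is None:
        rng = np.random.default_rng(l)
    nf = 2 ** l
    dt = T / nf
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_samples, nf))
    Xf = np.ones(n_samples)
    for i in range(nf):                          # fine path
        Xf *= 1.0 + mu * dt + sig * dW[:, i]
    if l == 0:
        return Xf.mean()
    Xc = np.ones(n_samples)
    for i in range(nf // 2):                     # coarse path: paired increments
        Xc *= 1.0 + mu * 2 * dt + sig * (dW[:, 2 * i] + dW[:, 2 * i + 1])
    return (Xf - Xc).mean()                      # low-variance correction term

estimate = sum(level_estimator(l, n_samples=20000) for l in range(5))
```

Because the level-l correction has small variance, most samples can be taken on the cheap coarse grids, which is the source of the speedup over single-level Monte Carlo.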

The researcher was able to reduce (by a factor of 1,000) the number of meta-data records needed to track stored Climate Science simulation data. As Climate data represents a significant portion of Titan computing resources, this reduction in records can have a significant impact on HPSS operations.

The ACME project is a major consumer of cycles on world-class capability computing systems such as Titan. In the process of doing our science on these large-scale machines, simulation data are generated and must be archived for future analysis. The system that satisfies this archival need is the High Performance Storage System (HPSS). Among the many details of HPSS operation, one of the factors limiting how much data the system can handle is the number of files it must track, the so-called meta-data operations. Storing data to HPSS can be a time-consuming process for a user, and a complicated one given the interaction between project requirements and HPSS system requirements. To alleviate these issues, we have codified the archiving of simulation files into a tar file along with the logic of reliable storage to the HPSS system. These operations are performed by submitting jobs to the queuing system of the Data Transfer Nodes, allowing very large file sets to be processed with automatic recording of outcomes and potential faults. Because the queuing system limits how many queued processes run concurrently, large numbers of jobs can be submitted without risk of overloading the system, in contrast to running the processes manually. This program will be used as part of a larger whole to provide efficient, reliable, low-effort, and low-cost storage of critical simulation files for the ACME project, while remaining adaptable to other projects. It will also serve as the base for the data-handling portion of a larger Automated Workflow effort within ACME.
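The bundling step that collapses many meta-data records into one can be sketched as follows. This is an illustrative sketch only: file names are invented, and the actual HPSS transfer and Data Transfer Node job submission are omitted:

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path

def bundle(files, archive_path):
    """Aggregate many simulation files into one tar archive (N HPSS
    meta-data records become 1) and record checksums so the stored
    archive can be verified after transfer."""
    manifest = {}
    with tarfile.open(archive_path, "w") as tar:
        for f in files:
            manifest[f.name] = hashlib.sha256(f.read_bytes()).hexdigest()
            tar.add(f, arcname=f.name)
    return manifest

work = Path(tempfile.mkdtemp())
for i in range(3):                      # stand-ins for model output files
    (work / f"out.{i}.nc").write_bytes(b"data" * (i + 1))
manifest = bundle(sorted(work.glob("out.*.nc")), work / "bundle.tar")
```

In the real workflow this bundling, the transfer to HPSS, and the recording of outcomes would each run as a queued job on the Data Transfer Nodes.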

Researchers at ORNL developed a regionalization framework to quantify extreme events. Using the framework, they demonstrated that simulations with the high-resolution global climate model better capture stationary and non-stationary precipitation extremes than the low-resolution version of the model, as represented by Generalized Extreme Value (GEV) distributions. The regionalization framework is implemented in parallel, allowing for quick analysis of global climate extremes in ultra-high-resolution global climate models and speeding up data analysis by several orders of magnitude. We find that the parameters of the GEV distribution models of precipitation extremes are better represented in the high-resolution simulations.

Precipitation extremes have tangible societal impacts. Here, we assess whether current state-of-the-art global climate model simulations at high spatial resolution (0.35°×0.35°) capture the observed behavior of precipitation extremes in the past few decades over the continental US. We design a correlation-based regionalization framework to quantify precipitation extremes, where samples of extreme events for a grid box may also be drawn from neighboring grid boxes with statistically equal means and statistically significant temporal correlations. We model precipitation extremes with Generalized Extreme Value (GEV) distribution fits to time series of annual maximum precipitation. Non-stationarity of extremes is captured by including a time-dependent parameter in the GEV distribution. Our analysis reveals that the high-resolution model substantially improves the simulation of stationary precipitation extreme statistics, particularly over the Northwest Pacific coastal region and the Southeast US. Observational data exhibit significant non-stationary behavior of extremes only over some parts of the Western US, with declining trends in the extremes. While the high-resolution simulations improve upon the low-resolution model in simulating this non-stationary behavior, the trends are statistically significant only over some of those regions.
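Once GEV parameters are fitted to annual-maximum series, return levels follow from a closed-form expression. The parameter values below are purely illustrative, not taken from the study:

```python
import math

def gev_return_level(mu, sigma, xi, p):
    """Value exceeded with probability p in any one year:
    z_p = mu + (sigma/xi) * ((-ln(1-p))**(-xi) - 1) for xi != 0."""
    y = -math.log(1.0 - p)
    if abs(xi) < 1e-9:                 # Gumbel limit as xi -> 0
        return mu - sigma * math.log(y)
    return mu + (sigma / xi) * (y ** (-xi) - 1.0)

# Hypothetical stationary fit for one grid box (mm/day):
z100 = gev_return_level(mu=45.0, sigma=12.0, xi=0.1, p=0.01)  # 100-yr level
# Non-stationarity enters by letting the location parameter depend on
# time, mu(t) = mu0 + mu1*t, so z_p itself acquires a trend.
```

Comparing such return levels between model resolutions and observations, grid box by grid box, is the kind of diagnostic the regionalization framework parallelizes.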

Researchers provided a numerical weather prediction perspective on the development of phenomenological models focused on the prediction of tumor growth. They also provided examples of potential benefits, including using available patient data as an initial state to predict outcomes and to test the sensitivity of those outcomes to candidate therapies.

Researchers propose that the quantitative cancer biology community make a concerted effort to apply lessons from weather forecasting to develop an analogous methodology for predicting and evaluating tumor growth and treatment response. Currently, the time course of tumor response is not predicted; instead, response is only assessed post hoc by physical examination or imaging methods. This fundamental practice within clinical oncology limits optimization of a treatment regimen for an individual patient, as well as the ability to determine in real time whether the choice was in fact appropriate. This is especially frustrating at a time when a panoply of molecularly targeted therapies is available, and precision genetic or proteomic analyses of tumors are an established reality. By learning from the methods of weather and climate modeling, we submit that the forecasting power of biophysical and biomathematical modeling can be harnessed to hasten the arrival of a field of predictive oncology. With a successful methodology toward tumor forecasting, it should be possible to integrate large tumor-specific datasets of varied types and effectively defeat cancer one patient at a time.

Batteries are highly complex electrochemical systems, with performance and safety governed by coupled nonlinear electrochemical-electrical-thermal-mechanical processes over a range of spatiotemporal scales. We describe a new, open source computational environment for battery simulation known as VIBE - the Virtual Integrated Battery Environment. VIBE includes homogenized and pseudo-2D electrochemistry models such as those by Newman-Tiedemann-Gu (NTG) and Doyle-Fuller-Newman (DFN, a.k.a. DualFoil) as well as a new advanced capability known as AMPERES (Advanced MultiPhysics for Electrochemical and Renewable Energy Storage). AMPERES provides a 3D model for electrochemistry and full coupling with 3D electrical and thermal models on the same grid. VIBE/AMPERES has been used to create three-dimensional battery cell and pack models that explicitly simulate all the battery components (current collectors, electrodes, and separator). The models are used to predict battery performance under normal operations and to study thermal and mechanical response under adverse conditions.
This work was performed at ORNL and sponsored by DOE's EERE office.

Scalable and Fault Tolerant Failure Detection and Consensus

A. Katti, G. Di Fatta, T. Naughton, and C. Engelmann

Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum's User Level Failure Mitigation proposal has introduced an operation (MPI_Comm_shrink) to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. The MPI_Comm_shrink operation requires a fault tolerant failure detection and consensus algorithm. This work developed two novel failure detection and consensus algorithms to support this operation. The algorithms are based on Gossip protocols and are inherently fault-tolerant and scalable. The first algorithm is based on global knowledge: each process maintains a local view of the entire system state to achieve consensus on failed processes. A Gossip protocol is used to detect failures and to exponentially propagate them in the system until the local views converge. The second algorithm does not rely on global knowledge and adopts a heuristic method to achieve consensus on failures. The algorithms were implemented and tested using the Extreme-scale Simulator (xSim), ORNL's performance/resilience investigation toolkit for simulating future-generation extreme-scale high-performance computing systems. The results show that in both algorithms the number of Gossip cycles to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory usage and network bandwidth costs and a perfect synchronization in achieving global consensus.
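The behavior of the global-knowledge variant can be illustrated with a small simulation. This is a toy push-gossip model with invented parameters, not the paper's algorithm or the xSim environment; it only shows the logarithmic scaling of convergence with system size:

```python
import random

def cycles_to_consensus(n, failed, seed=1):
    """Every alive process keeps a local view (set of known-failed ranks)
    and pushes it to one random peer per cycle; count cycles until all
    local views agree on the failed set."""
    rng = random.Random(seed)
    alive = [p for p in range(n) if p not in failed]
    views = {p: set() for p in alive}
    views[alive[0]] = set(failed)          # one process detects the failures
    cycles = 0
    while any(v != failed for v in views.values()):
        cycles += 1
        for p in alive:                    # each alive process gossips once
            q = rng.choice(alive)
            views[q] |= views[p]           # push local view to a random peer
        if cycles > 10 * n:                # safety valve for the sketch
            break
    return cycles

failed = {3, 7}
c64 = cycles_to_consensus(64, failed)
c1024 = cycles_to_consensus(1024, failed)   # ~log(N) more cycles, not 16x
```

The exponential spread of information per cycle is what yields the logarithmic scaling reported in the xSim experiments.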

This work was performed at the University of Reading, UK and ORNL. The work at ORNL was funded by DOE's Advanced Scientific Computing Research office's Exascale Operating System and Runtime (ExaOS/R) program.

On July 15, 2015, developers from Oak Ridge National Laboratory (ORNL) and Los Alamos National Laboratory (LANL) released the Land Ice Verification and Validation Software kit (LIVVkit), which was developed within the DOE ASCR/BER SciDAC project titled "Prediction of Ice Sheet and Climate Evolution at Extreme Scales" (PISCEES). With their collaborators at LANL, the ORNL team created the first capability to systematically evaluate the continental-scale Community Ice Sheet Model (CISM) using an advanced Python environment with a fully interactive website.

Since 2012, a team of BER and ASCR researchers has been developing a tool that allows ice sheet researchers to quickly track model changes (both desirable and undesirable) to aid the rapid development of ice sheet models that simulate the full Greenland Ice Sheet. The toolkit has several parts that work together to create a robust V&V capability. It provides comprehensive comparisons for a suite of benchmark tests of the Community Ice Sheet Model, run on desktop systems as well as Leadership Class Computing Facilities (LCFs), including Titan at OLCF and Edison at NERSC. It generates a wealth of data and plots for a suite of test cases on a hierarchical webpage, so that if there are discrepancies, the user can delve deeply into the results and model inputs to quickly diagnose the source of change.

In addition to model verification of the ice sheet simulation, LIVVkit provides a novel, performance-based V&V capability. When using an LCF system, LIVVkit can automatically run large-scale test cases and compare their computational performance to a suite of baselines to detect performance changes outside an expected range. It enables users to detect bugs that affect performance and optimize performance parameters by providing hierarchical and interactive information. Performance validation is also provided via weak and strong scaling information, and this provides a cost-benefit analysis that can inform developers about the quantitative gain in code improvements. With complete performance information, code developers can decide quantitatively what code improvements are worth the extra expense and under what circumstances.
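The baseline comparison at the core of performance validation can be sketched simply. The threshold rule and timing values here are illustrative assumptions, not LIVVkit's actual criteria:

```python
import statistics

def check_performance(baselines, new_time, k=3.0):
    """Flag a new timing that falls outside an expected range derived
    from prior baseline runs (mean +/- k standard deviations)."""
    mean = statistics.mean(baselines)
    stdev = statistics.stdev(baselines)
    lo, hi = mean - k * stdev, mean + k * stdev
    status = "ok" if lo <= new_time <= hi else "regression suspected"
    return status, (lo, hi)

baseline_runs = [102.1, 99.8, 101.3, 100.6, 98.9]   # seconds, prior builds
ok_status, _ = check_performance(baseline_runs, 101.0)
bad_status, _ = check_performance(baseline_runs, 140.0)
```

Running such a check automatically after every large-scale test case is what lets developers catch performance-affecting bugs before they reach production simulations.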

Currently, LIVVkit supports the DOE BER Accelerated Climate Model for Energy (ACME) land ice and coupled model development effort. Release 1.0 can be found on GitHub: https://github.com/LIVVkit/LIVVkit.

Quantifying Scheduling Challenges for Exascale System Software

Oscar Mondragon, University of New Mexico, Patrick Bridges, University of New Mexico, Terry Jones, ORNL

The move towards high-performance computing (HPC) applications comprised of coupled codes and the need to dramatically reduce data movement is leading to a reexamination of time-sharing vs. space-sharing in HPC systems. In this paper, we discuss and begin to quantify the performance impact of a move away from strict space-sharing of nodes for HPC applications. Specifically, we examine the potential performance cost of time-sharing nodes between application components, we determine whether a simple coordinated scheduling mechanism can address these problems, and we research how suitable simple constraint-based optimization techniques are for solving scheduling challenges in this regime. Our results demonstrate that current general-purpose HPC system software scheduling and resource allocation systems are subject to significant performance deficiencies which we quantify for six representative applications. Based on these results, we discuss areas in which additional research is needed to meet the scheduling challenges of next-generation HPC systems.

In this paper, we introduce a scalable preconditioner within the Community Atmosphere Model (CAM) that is designed to improve the efficiency of the linear system solves in the implicit dynamics solver. Performing accurate and efficient numerical simulation of global atmospheric climate models is challenging due to the disparate length and time scales over which physical processes interact. Implicit solvers enable the physical system to be integrated with a time step commensurate with the processes being studied rather than one chosen to maintain stability. The dominant cost of an implicit time step is the ancillary linear system solves, so the preconditioner, which is based on an approximate block factorization of the linearized shallow-water equations, has been implemented within the spectral element dynamical core of CAM to minimize this expense. In this paper, we discuss the development and scalability of the preconditioner for a suite of test cases with the implicit shallow-water solver within CAM, and show how the choice of solver parameter settings affects the behavior of both the solver and preconditioner. We also present the remaining steps to gain efficiency using this solver strategy.
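The general shape of an approximate block factorization preconditioner can be sketched as follows; the block symbols are generic placeholders, not the paper's actual discretized operators:

```latex
% After linearization, the coupled velocity/height system has a
% 2x2 block Jacobian, which factors exactly via its Schur complement:
J = \begin{pmatrix} A & B \\ C & D \end{pmatrix}
  = \begin{pmatrix} A & 0 \\ C & S \end{pmatrix}
    \begin{pmatrix} I & A^{-1}B \\ 0 & I \end{pmatrix},
\qquad S = D - C A^{-1} B .
% A practical preconditioner replaces A^{-1} and S with cheap
% approximations, so each application costs far less than an exact solve
% while still clustering the eigenvalues of the preconditioned system.
```

Multiplying the two block factors back together recovers J exactly (the (2,2) block gives C A⁻¹ B + S = D), which is why the quality of the preconditioner rests entirely on how well A⁻¹ and S are approximated.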

This work is funded by DOE BER/ASCR and is part of the Multiscale BER SciDAC project.

Identifying correlations and relationships between entities within and across different data sets (or databases) is of great importance in many domains. Data warehouse-based integration, the most widely practiced approach, is inadequate to achieve this goal. Instead, we explored an alternate solution that turns multiple disparate data sources into a single heterogeneous graph model, so that matching between entities across different source data is expedited by examining their linkages in the graph. We found, however, that while a graph-based model provides outstanding capabilities for this purpose, constructing such a model from relational source databases was time-consuming and primarily left to ad hoc proprietary scripts. This led us to develop a reconfigurable and reusable graph construction tool that is designed to work at scale.
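The basic transformation from relational rows to a heterogeneous graph can be sketched in a few lines. The table schemas and field names below are made up for illustration; the actual tool is configurable and operates at scale:

```python
from collections import defaultdict

def build_graph(tables):
    """tables: {table_name: [row dicts]}. Every row becomes an entity
    node and every attribute value becomes a linking node, so entities
    from different sources that share a value become two hops apart."""
    adj = defaultdict(set)
    for tname, rows in tables.items():
        for i, row in enumerate(rows):
            node = f"{tname}:{i}"
            for field, value in row.items():
                attr = f"{field}={value}"      # shared-value linking node
                adj[node].add(attr)
                adj[attr].add(node)
    return adj

db_a = [{"email": "jdoe@example.com", "name": "J. Doe"}]
db_b = [{"email": "jdoe@example.com", "phone": "555-0100"}]
g = build_graph({"a": db_a, "b": db_b})
shared = g["a:0"] & g["b:0"]   # the shared email links the two records
```

Cross-source matching then reduces to short-path queries in the graph, instead of joins engineered per pair of source schemas.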

Arguably one of the most important effects of climate change is the potential impact on human health. While this is likely to take many forms, the implications for future transmission of vector-borne diseases (VBDs), given their ongoing contribution to global disease burden, are both extremely important and highly uncertain. In part, this is owing not only to data limitations and methodological challenges when integrating climate-driven VBD models and climate change projections, but also, perhaps most crucially, to the multitude of epidemiological, ecological and socio-economic factors that drive VBD transmission. This complexity has generated considerable debate over the past 10–15 years. In this review article, the authors seek to elucidate current knowledge around this topic, identify key themes and uncertainties, evaluate ongoing challenges and open research questions and, crucially, offer some solutions for the field. Although many of these challenges are ubiquitous across multiple VBDs, more specific issues also arise in different vector–pathogen systems.

Analyzing the Interplay of Failures and Workload on a Leadership-Class Supercomputer

Esteban Meneses, University of Pittsburgh, Xiang Ni, University of Illinois at Urbana-Champaign, Terry Jones, ORNL, and Don Maxwell, ORNL

The unprecedented computational power of current supercomputers now makes possible the exploration of complex problems in many scientific fields, from genomic analysis to computational fluid dynamics. Modern machines are powerful because they are massive: they assemble millions of cores and a huge quantity of disks, cards, routers, and other components. But it is precisely the size of these machines that clouds the future of supercomputing. A system that comprises many components has a high chance to fail, and fail often. In order to make the next generation of supercomputers usable, it is imperative to use some type of fault tolerance platform to run applications on large machines. Most fault tolerance strategies can be optimized for the peculiarities of each system and boost efficacy by keeping the system productive. In this paper, we aim to understand how failure characterization can improve resilience in several layers of the software stack: applications, runtime systems, and job schedulers. We examine the Titan supercomputer, one of the fastest systems in the world. We analyze a full year of Titan in production and distill the failure patterns of the machine. By looking into Titan's log files and using the criteria of experts, we provide a detailed description of the types of failures. In addition, we inspect the job submission files and describe how the system is used. Using those two sources, we cross correlate failures in the machine to executing jobs and provide a picture of how failures affect the user experience. We believe such characterization is fundamental in developing appropriate fault tolerance solutions for Cray systems similar to Titan.
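The cross-correlation step between failure logs and job records can be sketched as an interval query. The record formats below are invented stand-ins for the actual log and scheduler file formats:

```python
def jobs_hit_by_failures(jobs, failure_times):
    """jobs: list of (job_id, start, end) epochs; failure_times: list of
    failure epochs. Return the set of jobs running when a failure hit."""
    hit = set()
    for job_id, start, end in jobs:
        if any(start <= t <= end for t in failure_times):
            hit.add(job_id)
    return hit

jobs = [("j1", 0, 100), ("j2", 50, 400), ("j3", 500, 900)]
failures = [75, 450]          # second failure lands between jobs
affected = jobs_hit_by_failures(jobs, failures)
```

Aggregating such hits by failure type, job size, and user is what turns raw logs into the user-experience picture the paper describes.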


In this work it was discovered that layers of two-dimensional (2D) materials with different atomic registries have characteristic Raman spectral fingerprints in the low-frequency spectral range that can be used to characterize stacking patterns of these materials. Stacked monolayers of 2D materials present a new class of hybrid materials with tunable optoelectronic properties determined by their stacking orientation, order, and atomic registry. Fast optical determination of the exact atomic registration between different layers in few-layer 2D stacks is a key factor for rapid development of these materials and their applications. Using two- and three-layer MoSe2 and WSe2 crystals synthesized by chemical vapor deposition, we show that the generally unexplored low-frequency Raman modes (<50 cm⁻¹) that originate from interlayer vibrations can serve as fingerprints to characterize not only the number of layers, but also their stacking configurations. Ab initio calculations and group theory analysis corroborate the experimental assignments and show that the calculated low-frequency mode fingerprints are related to the 2D crystal symmetries.

Collective communication operations, which are widely used in parallel applications for global communication and synchronization, are critical to application performance and scalability. However, how faulty collective communications impact the application, and how errors propagate between application processes, is largely unexplored. One of the critical reasons for this situation is the lack of a fast evaluation method for investigating the impacts of faulty collective operations. Traditional random fault injection methods, which rely on a large number of fault injection tests to ensure statistical significance, require a significant amount of resources and time. These methods result in prohibitive evaluation costs when applied to the collectives.

In this work, we present a novel tool named the Fast Fault Injection and Sensitivity Analysis Tool (FastFIT) to conduct fast fault injection and characterize application sensitivity to faulty collectives. The tool achieves fast exploration by reducing the exploration space and predicting application sensitivity using Machine Learning (ML) techniques. A basis for these techniques is the implicit correlations between MPI semantics, application context, critical application features, and application responses to faulty collective communications. The experimental results show that our approach reduces the fault injection points and tests by 97% for representative benchmarks (NAS Parallel Benchmarks (NPB)) and a realistic application (Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS)) on a production supercomputer. Further, we explore statistically generalizing the application sensitivity to faulty collective communications for these workloads, and present correlations between application features and the sensitivity.

The Lustre 101 web-based course series is focused on administration and monitoring of large-scale deployments of the Lustre parallel file system. Course content is drawn from nearly a decade of experience in deploying and operating leadership-class Lustre file systems at the Oak Ridge Leadership Computing Facility (OLCF) at Oak Ridge National Laboratory (ORNL).
A primary concern in deploying a large system such as Lustre is building the operational experience and insight to triage and resolve intermittent service problems. Although there is no replacement for experience, it is also true that there is no adequate training material for becoming a Lustre administration expert. The overall goal of the Lustre 101 course series is to distill and disseminate to the Lustre community the working knowledge of ORNL administration and technical staff in the hope that others can avoid the trials and tribulations of large-scale Lustre administration and monitoring.

The Lustre Administration Essentials course is targeted at experienced system administrators who are relatively new to Lustre, but may have prior experience with other distributed and parallel file systems. Topics in this course include an introduction to Lustre, hardware selection and benchmarking strategies, Lustre software installation and basic configuration, Lustre tuning and LNet configuration, and basic file system administration and monitoring approaches.

This paper presents a preliminary design study and initial evaluation of an operating system/runtime (OS/R) environment, Hobbes, with explicit support for composing HPC applications from multiple cooperating components. The design is based on our previously presented vision and makes systematic use of both virtualization and lightweight operating system techniques to support multiple communicating application enclaves per node. It also includes efficient inter-enclave communication tools to enable application composition. Furthermore, we show that our Hobbes OS/R supports the composition of applications across multiple isolated enclaves with little to no performance overhead.

Work was performed at University of Pittsburgh, ORNL, Georgia Institute of Technology, Los Alamos National Laboratory, Sandia National Laboratories, and the University of New Mexico. Sponsored by the DOE, Office of Science, Advanced Scientific Computing Research (ASCR) program.


Early calculations of the iron-based superconductors based on a spin fluctuation model of pairing had great success in predicting the superconducting ground state and the qualitative systematics of its variation with doping. Recent proposals, however, have argued that these treatments neglected the true symmetry of the crystalline layer containing Fe, which has pnictogen and chalcogen atoms in buckled positions, providing a strong potential on the electrons in the Fe plane and enforcing a unit cell with 2 Fe atoms. Several recent phenomenological treatments of the implications of this symmetry for pairing have argued that this aspect had been missed in the earlier 1-Fe unit cell calculations and that this potential can force a completely different electronic ground state, in which so-called eta-pairing states with non-zero total momentum and exotic properties, such as odd-parity spin-singlet symmetry and possible time-reversal symmetry breaking, contribute to the superconducting condensate. This work uses concrete and realistic microscopic calculations for 2-Fe and 1-Fe models to demonstrate that the earlier 1-Fe calculations correctly accounted for this glide-plane symmetry and correctly predicted its implications for the observable superconducting gap. It furthermore shows that eta-pairing naturally arises in systems where both orbitals with even and orbitals with odd mirror-reflection symmetry in z contribute to the Fermi surface states. In contrast to the recent proposals, however, this study finds that eta-pairing contributes with the usual even-parity symmetry and that time-reversal symmetry is not broken. This work establishes a clear framework for the study of such questions in other unconventional superconductors, where similar questions have arisen.

ICEE addresses the challenges of building a remote data analysis framework, motivated by real-world scientific applications. ICEE is designed to support data stream processing for near real-time remote analysis over wide-area networks. The solution is based on in-memory stream processing, which reduces the time-to-solution compared with conventional batch-based processing.
Work was performed by ORNL, LBNL, PPPL, Georgia Tech, Rutgers University, KISTI (Korea), and A*STAR Computational Resource Centre (Singapore), with wide-area-network data movement supported by the ICEE SciDAC project.

Memory scalability is an enduring problem and bottleneck that plagues many parallel codes. Parallel codes designed for high-performance systems are typically developed over the span of several, and in some instances more than ten, years. As a result, optimization practices that were appropriate for earlier systems may no longer be valid and must be carefully reconsidered. In particular, parallel codes whose memory footprint is a function of their scale must be carefully examined for future exascale systems.

In this work we present a methodology and tool to study the memory scalability of parallel codes. Using our methodology, we evaluate an application's memory footprint as a function of scale, a quantity we term memory efficiency, and describe our results. In particular, using our in-house tools we can pinpoint the specific components (application data structures, libraries, etc.) that contribute to the application's overall memory footprint.
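One simple way to quantify how a footprint scales, consistent with the idea above but not the paper's actual tool, is to fit measured per-process footprints to a replicated-plus-distributed model; the function and the synthetic measurements below are hypothetical:

```python
def fit_footprint(measurements):
    """Fit per-process footprint m(P) ~= r + d/P by least squares, where r is
    replicated (per-process) memory that does not shrink with scale and d is
    total distributed memory split across P processes."""
    # Normal equations for the 2-parameter model with basis [1, 1/P].
    n = len(measurements)
    s1 = sum(1.0 / p for p, _ in measurements)
    s2 = sum(1.0 / p ** 2 for p, _ in measurements)
    sy = sum(m for _, m in measurements)
    sxy = sum(m / p for p, m in measurements)
    det = n * s2 - s1 * s1
    r = (sy * s2 - sxy * s1) / det
    d = (n * sxy - sy * s1) / det
    return r, d

# Synthetic measurements: 100 MB replicated + 6400 MB total distributed data.
data = [(p, 100.0 + 6400.0 / p) for p in (16, 32, 64, 128)]
r, d = fit_footprint(data)
assert abs(r - 100.0) < 1e-6 and abs(d - 6400.0) < 1e-6
```

A large fitted `r` flags memory that will not scale down on an exascale machine.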

An ensemble of simulations covering the present-day observational period, using forced sea surface temperatures and prescribed sea-ice extent, is configured with a spectral transform dynamical core (T85) within the Community Atmosphere Model (CAM), version 4, and is evaluated relative to observed and model-derived datasets. The spectral option is well established, and its relative computational efficiency on smaller computing platforms allows it to be extended to high-resolution, climate-length simulations.

The simulation quality is equivalent to the standard one-degree finite volume dynamical core. The spectral core, which is computationally efficient for smaller computing platforms, is shown to be a viable option for CAM and fully coupled Community Earth System Model simulations.

Work was performed at ORNL and used OLCF resources as part of the DOE BER Ultra-High Resolution project.

The rational management of oil and gas reservoirs requires understanding their response to existing and planned schemes of exploitation and operation. Such understanding requires analyzing and quantifying the influence of subsurface uncertainty on predictions of oil and gas production. Because subsurface properties are typically heterogeneous, the resulting models have a large number of parameters, so the dimension-independent Monte Carlo (MC) method is usually used for uncertainty quantification (UQ).
By using a multilevel Monte Carlo (MLMC) method to improve the computational efficiency of uncertainty quantification for high-dimensional problems, researchers can significantly reduce the computational cost, helping decision-makers reach economic and management decisions in a reasonable time.
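The MLMC idea can be sketched in a few lines: estimate the coarsest level with many cheap samples and correct it with a few samples of the level-to-level differences, which have small variance. The toy model `q` below is invented for illustration and is unrelated to the reservoir simulations:

```python
import random

def mlmc_estimate(sample_level, n_samples):
    """Multilevel Monte Carlo: E[Q_L] = E[Q_0] + sum_l E[Q_l - Q_{l-1}].
    sample_level(l, omega) evaluates Q_l for a shared random input omega,
    so fine and coarse samples on a level are correlated."""
    total = 0.0
    for level, n in enumerate(n_samples):
        acc = 0.0
        for _ in range(n):
            omega = random.random()              # shared random input
            fine = sample_level(level, omega)
            coarse = sample_level(level - 1, omega) if level > 0 else 0.0
            acc += fine - coarse
        total += acc / n
    return total

# Toy model: Q_l(omega) = omega + 2**-l discretization bias.
random.seed(1)
def q(level, omega):
    return omega + 2.0 ** (-level)

est = mlmc_estimate(q, [4000, 1000, 250])        # many coarse, few fine
assert abs(est - (0.5 + 2.0 ** -2)) < 0.05       # E[Q_2] = 0.5 + 0.25
```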

This work is supported by LDRD program of Oak Ridge National Laboratory.

Researchers advanced understanding and provided results that map out a possible route to successfully manipulating the self-assembly of functional materials, which can deliver improved energy transport, conversion, and storage properties.

This control of interfacial structure and nanostructure (orientation and bonding) impacts a broad range of present and future energy materials, including organic photovoltaics, energy storage, fuel cells, membranes for CO2 capture, gas and water purification, and stronger lightweight materials that can yield energy savings. Additionally, this area of nanoscience continues to provide advances in the design of improved materials for electronic devices and sensors, and to address future challenges in integrating materials with exceptional properties into other application areas.

This work was performed at the Oak Ridge National Laboratory, the Center for Nanophase Material Science and the Oak Ridge Leadership Computing Facility.

The existence of multiple thermodynamically stable isomer states is one of the most fundamental properties of small clusters. This work shows that the conformational dependence of the Coulomb charging energy of a nanocluster leads to a giant electroresistance, where charging-induced conformational distortion changes the blockade voltage. The intricate interplay between charging and conformation change is demonstrated in a Zn3O4 nanocluster by combining a first-principles calculation with a temperature-dependent transport model. The predicted hysteretic Coulomb blockade staircase in the current−voltage curve adds another dimension to the rich phenomena of tunneling electroresistance. The new mechanism provides a better-controlled and repeatable platform to study conformational electroresistance.

Work was performed at University of Florida and the Center for Nanophase Materials Sciences, Materials Sciences and Engineering Division. This work was supported by the US Department of Energy (DOE), Office of Basic Energy Sciences (BES), under Contract No. DE-FG02-02ER45995. A portion of this research was conducted at the Center for Nanophase Materials Sciences, which is sponsored at Oak Ridge National Laboratory by the Division of Scientific User Facilities (X.-G.Z.). The computation was done using the utilities of the National Energy Research Scientific Computing Center (NERSC).

Researchers demonstrated coherent beam combination from an array of high-power broad-area laser diodes with nearly perfect beam quality (1.5–1.6× the diffraction limit), 95–99% visibility, high power-conversion efficiency (around 20%), and narrow linewidth (0.1 nm).

Free-running broad-area laser diodes support multiple spatial modes and consequently emit beams of very poor quality. Demonstrating nearly perfect phase synchronization and coherence from a commercial-quality, non-identical broad-area array is of fundamental importance to our understanding of how large, heterogeneous, multiple-spatial-mode cavities phase-synchronize. This demonstration has direct impact on a variety of applications, including beam combining of high-power lasers for (a) directed energy, (b) laser communication sources, and (c) pump sources for fiber and solid-state lasers.

Unresolved sub-grid processes, those that are too small or dissipate too quickly to be captured within a model's spatial resolution, are not adequately parameterized by conventional numerical climate models. Sub-grid heterogeneity is lost in parameterizations that quantify only the 'bulk effect' of sub-grid dynamics on the resolved scales. A unique solution, one that does not rely on increased grid resolution, is stochastic parameterization of the sub-grid to reintroduce variability. The researchers applied this approach in a coupled land-atmosphere model, one that combines the single-column Community Atmosphere Model (CAM-SC) and the single-point Community Land Model (CLM-SP), by incorporating a stochastic representation of sub-grid latent heat flux to force the distribution of precipitation. Sub-grid differences in surface latent heat flux arise from the mosaic of Plant Functional Types (PFTs) that describe terrestrial land cover. By introducing a stochastic parameterization framework that affects the distribution of sub-grid PFTs, the researchers alter the distribution of convective precipitation over regions with high PFT variability. The stochastically forced precipitation probability density functions (PDFs) show lengthened tails, demonstrating the retrieval of rare events. Through model data analysis they show that the stochastic model increases both the frequency and intensity of rare events in comparison to conventional deterministic parameterization.
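A minimal sketch of the stochastic idea, with an invented PFT mosaic: instead of forcing the atmosphere with the bulk (area-weighted) latent heat flux, each draw samples one PFT's flux, preserving the mean while reintroducing sub-grid variability:

```python
import random

# Hypothetical PFT mosaic for one grid cell: (area fraction, mean latent
# heat flux in W/m^2). A deterministic scheme would use only the bulk mean.
pfts = [(0.5, 60.0), (0.3, 120.0), (0.2, 250.0)]
bulk = sum(f * q for f, q in pfts)   # 0.5*60 + 0.3*120 + 0.2*250 = 116 W/m^2

def stochastic_flux(rng):
    """Sample the sub-grid flux by drawing a PFT with probability equal to
    its area fraction, reintroducing sub-grid variability around the mean."""
    u = rng.random()
    for frac, q in pfts:
        if u < frac:
            return q
        u -= frac
    return pfts[-1][1]

rng = random.Random(42)
draws = [stochastic_flux(rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
# Unbiased on average, but individual draws reach the rare high-flux PFT,
# which is what lengthens the tails of the forced precipitation PDF.
assert abs(mean - bulk) < 3.0
assert max(draws) == 250.0
```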

Coupled land-atmosphere climate calculations were run using Oak Ridge Leadership Computing Facility's (OLCF's) Titan supercomputer. Funding for this work was provided by the US Department of Energy through ORNL's LDRD program.

Understanding How to Manipulate the Nanoscale Assembly of Organic Photovoltaic Donor/Acceptor Films

M. Shao, J. K. Keum, R. Kumar, J. Chen

Researchers reported the results of a comprehensive investigation of the effects of the processing additive diiodooctane (DIO) on the morphology of the established blend of PBDTTT-C-T polymer and the fullerene derivative PC71BM used in organic photovoltaics (OPVs), starting from the casting solution and tracing the effects to spun-cast thin films by using neutron/x-ray scattering, neutron reflectometry, and other characterization techniques corroborated by theory and modeling. The results reveal that DIO has no observable effect on the structures of PBDTTT-C-T and PC71BM in solution; in the spun-cast films, however, it significantly promotes their molecular ordering and phase segregation, resulting in improved power conversion efficiency (PCE). Thermodynamic analysis based on Flory-Huggins theory provides a rationale for the effects of DIO on different characteristics of phase segregation, attributing them to changes in concentration resulting from evaporation of the solvent and additive during film formation. In summary, a comprehensive suite of characterization techniques and theoretical analyses revealed both the lateral and vertical morphological effects of DIO on the formation of bulk heterojunctions and the resulting organic photovoltaic device parameters, from the donor/acceptor polymer blend PBDTTT-C-T:PC71BM in solution to the spin-cast films.
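The Flory-Huggins free energy of mixing underlying the thermodynamic analysis can be written down directly; the parameter values below are illustrative and are not fit to the PBDTTT-C-T:PC71BM system:

```python
import math

def flory_huggins_f(phi, n_a, n_b, chi):
    """Flory-Huggins free energy of mixing per lattice site (units of kT)
    for a blend with volume fraction phi of component A, degrees of
    polymerization n_a and n_b, and interaction parameter chi."""
    return (phi / n_a) * math.log(phi) \
         + ((1 - phi) / n_b) * math.log(1 - phi) \
         + chi * phi * (1 - phi)

# Illustrative values: a long polymer (n_a = 100) mixed with a small
# molecule (n_b = 1). As solvent and additive evaporate, the effective chi
# grows, making the mixed state less favorable and driving phase segregation.
f_low = flory_huggins_f(0.5, 100, 1, 0.2)
f_high = flory_huggins_f(0.5, 100, 1, 1.5)
assert f_high > f_low
```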

This research was conducted at the Center for Nanophase Materials Sciences (CNMS), the High Flux Isotope Reactor (HFIR), and the Spallation Neutron Source (SNS), which are sponsored at Oak Ridge National Laboratory by the Scientific User Facilities Division, U.S. Department of Energy. KX and DBG acknowledge the support provided by a Laboratory Directed Research and Development award from the Oak Ridge National Laboratory (ORNL). M. Shao and J. K. Keum contributed equally to this work.

A Computer Program for Uncertainty Analysis Integrating Regression and Bayesian Methods

D. Lu, M. Ye, M. C. Hill, E. P. Poeter, and G. P. Curtis

This work develops a new capability in UCODE_2014 to evaluate Bayesian credible intervals using the Markov chain Monte Carlo (MCMC) method. The MCMC capability in UCODE_2014 is based on the FORTRAN version of the differential evolution adaptive Metropolis (DREAM) algorithm, which estimates the posterior probability density function of model parameters in high-dimensional and multimodal sampling problems. This paper tests and demonstrates the MCMC capability using a 10-dimensional multimodal mathematical function, a 100-dimensional Gaussian function, and a groundwater reactive transport model. The use of the MCMC capability is made straightforward and flexible. With the new MCMC capability, UCODE_2014 can be used to calculate three types of uncertainty intervals: linear confidence intervals, nonlinear confidence intervals, and MCMC Bayesian credible intervals. Ready access allows users to select the methods best suited to their work, and to compare methods in many circumstances.
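For readers unfamiliar with MCMC, the sketch below shows a minimal random-walk Metropolis sampler and a credible interval read off from chain quantiles; UCODE_2014 actually uses the far more capable DREAM algorithm, so this is an illustration of the concept only:

```python
import math, random

def metropolis(log_post, x0, step, n, rng):
    """Minimal random-walk Metropolis sampler: propose a Gaussian step and
    accept with probability min(1, posterior ratio)."""
    chain, x, lp = [], x0, log_post(x0)
    for _ in range(n):
        cand = x + rng.gauss(0.0, step)
        lp_cand = log_post(cand)
        if math.log(rng.random()) < lp_cand - lp:
            x, lp = cand, lp_cand
        chain.append(x)
    return chain

# Toy posterior: standard normal. The 95% credible interval is read from
# the 2.5% and 97.5% quantiles of the chain after discarding burn-in.
rng = random.Random(0)
chain = metropolis(lambda x: -0.5 * x * x, 0.0, 1.0, 20000, rng)
chain = sorted(chain[2000:])
lo, hi = chain[int(0.025 * len(chain))], chain[int(0.975 * len(chain))]
assert -2.3 < lo < -1.6 and 1.6 < hi < 2.3   # true interval is about +/-1.96
```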

Researchers developed an understanding of the role of conformational asymmetry in self-assembly of ordered multi-block copolymer morphologies. This understanding enhances their ability to effectively utilize self-assembly to generate nanoscale structures (morphologies) over large 3D volumes that are important for improving multiscale functional materials.

This work was performed at the Oak Ridge National Laboratory, the Center for Nanophase Material Science and the Oak Ridge Leadership Computing Facility.

Drawing inspiration from recent work on multilevel Monte Carlo methods, this work proposed a multilevel stochastic collocation method based on a hierarchy of spatial and stochastic approximations. A detailed computational cost analysis showed, in all cases, a significant improvement in cost compared to single-level methods. Furthermore, this work provided a framework for analyzing a multilevel version of any method for SPDEs in which the spatial and stochastic degrees of freedom are decoupled. The numerical results demonstrated in practice this significant decrease in complexity versus single-level methods for each of the problems considered. Likewise, the results for the model problem showed multilevel SC to be superior to multilevel MC even up to N = 20 dimensions (see right).
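The multilevel telescoping idea behind the method can be sketched in one dimension: apply accurate (many-node) collocation to the cheap coarse spatial level and cheap (few-node) collocation to the small fine-level corrections. The spatial model below is a toy with an assumed error decay, not one of the paper's SPDE problems, and real high-dimensional versions use sparse grids rather than plain quadrature:

```python
import math

GL = {  # Gauss-Legendre nodes/weights mapped to [0, 1]
    1: [(0.5, 1.0)],
    2: [(0.5 - 0.5 / math.sqrt(3), 0.5), (0.5 + 0.5 / math.sqrt(3), 0.5)],
    3: [(0.5, 4.0 / 9.0),
        (0.5 - 0.5 * math.sqrt(0.6), 5.0 / 18.0),
        (0.5 + 0.5 * math.sqrt(0.6), 5.0 / 18.0)],
}

def quad(f, n):
    """n-node stochastic collocation = quadrature over the random input y."""
    return sum(w * f(y) for y, w in GL[n])

def u(level, y):
    """Hypothetical spatial approximation: error decays like 4**-level."""
    return math.exp(y) * (1.0 + 4.0 ** (-level - 1))

def ml_collocation(levels):
    """E[u_L] = Q[u_0] + sum_l Q[u_l - u_{l-1}]: many collocation nodes on
    the cheap coarse level, few on the small fine-level corrections."""
    est = quad(lambda y: u(0, y), levels[0])
    for l in range(1, len(levels)):
        est += quad(lambda y, l=l: u(l, y) - u(l - 1, y), levels[l])
    return est

est = ml_collocation([3, 2, 1])
exact_l2 = (math.e - 1.0) * (1.0 + 4.0 ** -3)   # E[u_2] for y ~ U(0,1)
assert abs(est - exact_l2) < 0.005
```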

This work is sponsored by the Department of Energy's Advanced Scientific Computing Research program.

The Eclipse Integrated Computational Environment and its Readiness for High Performance Computing

Jay Jay Billings

The Eclipse Integrated Computational Environment (ICE) provides a common, extensible platform that improves productivity and streamlines the workflow for computational scientists. It has successfully integrated tools and simulation suites from across the DOE complex into a single, unified, cross-platform workbench. It works well on everything from standalone machines to large clusters, including running and managing remote parallel jobs as well as connecting to remote visualization engines and retrieving remote data. Recent work is extending the platform to work on Titan and other Leadership-class resources where launching jobs in the queue and rendering large scale visualizations are a priority.

ICE enhances the productivity of computational scientists by streamlining their workflow. It automates many difficult and tedious tasks and encapsulates confusing details. It increases the accessibility of sophisticated modeling and simulation tools and high-performance computer systems for those with limited experience in such environments.

The development of ICE has been sponsored by DOE Office of Nuclear Energy, Advanced Modeling and Simulation (NEAMS) Program and the DOE Office of Energy Efficiency and Renewable Energy, Computer-Aided Engineering for Batteries (CAEBAT) project.

A Web-based Visual Analytic System for Understanding the Structure of Community Land Model

This work was performed by Oak Ridge National Laboratory, Climate Change Science Institute, and Oak Ridge Leadership Computing Facility.

Publication: Y. Xu, D. Wang, T. Janjusic, and X. Xu, "A Web-based Visual Analytic System for Understanding the Structure of Community Land Model", The 2014 International Conference on Software Engineering Research and Practice, July 21, 2014, Las Vegas, Nevada.

Researchers introduced a static technique to characterize a program using the pattern-driven system HERCULES. This characterization technique not only helps a user understand programs by searching for patterns of interest, but can also be used to build a predictive model that effectively selects the proper compiler optimizations. They formulated 35 loop patterns, then evaluated their characterization technique by comparing the predictive models constructed using HERCULES to three other state-of-the-art characterization methods.

The researchers showed that their models outperform three state-of-the-art program characterization techniques on two multicore systems in selecting the best optimization combination from a given loop transformation space, achieving up to 67% of the best possible speedup within the optimization search space they evaluated.
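A toy version of such a pattern-based predictive model, assuming invented pattern counts and optimization labels (HERCULES' real model is more sophisticated than this nearest-neighbor sketch):

```python
def nearest_best_optimization(training, features):
    """1-nearest-neighbor predictive model: given pattern-occurrence counts
    for a new loop, recommend the optimization that worked best for the
    most similar loop seen during training."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(training, key=lambda t: dist(t[0], features))[1]

# Hypothetical training data: (counts of three loop patterns, best-known
# optimization for that loop). Pattern names and labels are made up.
training = [
    ((3, 0, 1), "unroll"),
    ((0, 4, 0), "tile"),
    ((1, 1, 5), "vectorize"),
]
assert nearest_best_optimization(training, (2, 0, 1)) == "unroll"
assert nearest_best_optimization(training, (0, 3, 1)) == "tile"
```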

Together with Max Gunzburger (Florida State University), Clayton Webster and Guannan Zhang published a review article in the premier applied mathematics book series Acta Numerica, Cambridge University Press, Volume 23, pp. 521–650, 2014. The review article is entitled "Stochastic finite element methods for partial differential equations with random input data." Acta Numerica has been the top-cited mathematics journal for the last two years in MathSciNet. It was established in 1992 to publish widely accessible summaries of recent advances in the field. Its annual volume of review articles (5–8) is by invitation only and includes survey papers by leading researchers in numerical analysis and scientific computing. The papers present overviews of recent advances and provide state-of-the-art techniques and analysis. Covering the breadth of numerical analysis, articles are written in a style accessible to researchers at all levels and can serve as advanced teaching aids.

This work is sponsored by the Department of Energy's Advanced Scientific Computing Research program.

Enzymes are great biocatalysts and have attracted significant interest for industrial applications (including cellulosic ethanol) due to their remarkable catalytic efficiencies. Understanding the factors that enable enzymes to achieve such high catalytic efficiency would have a large impact through the design of new and powerful biocatalysts; unfortunately, these factors have largely remained a mystery. Using a joint computational-experimental methodology, we developed a unique technique named quasi-anharmonic analysis (QAA) for identifying the conformational diversity and conformational sub-states associated with enzyme function. Building on this approach, we developed a novel enzyme engineering approach that shows an ~3000% increase in enzyme activity.

Tracking a Value's Influence on Later Computation

ORNL Team Member: Phil Roth

Understanding how a program behaves is important for effective program development, debugging, and optimization, but obtaining the necessary level of understanding is usually a challenging problem. One facet of this problem is to understand how a value (the content of a variable at a particular moment in time) influences other values as the program runs.

To help developers understand value influence for their programs, we are developing a tool that allows a user to tag a value as being of interest, and then track the influence of that value as it, or values that were derived from it, are used in later computation, communication, and I/O. We believe that understanding how a value's influence propagates will enable algorithm designers to more easily identify optimizations such as the removal of unnecessary computation and communication. Our tool supports tracking value influence in multithreaded programs, and we are currently implementing support for applications that use MPI two-sided and collective operations for communication and synchronization.
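The propagation rule at the heart of value-influence tracking can be illustrated with a small wrapper type; the real tool operates on compiled multithreaded/MPI programs, so this is only a conceptual sketch:

```python
class Tracked:
    """Minimal value-influence sketch: wrap a number and propagate a set of
    tags through arithmetic, so any result derived from a tagged value
    carries that tag."""
    def __init__(self, value, tags=frozenset()):
        self.value, self.tags = value, frozenset(tags)
    def _combine(self, other, op):
        if isinstance(other, Tracked):
            return Tracked(op(self.value, other.value), self.tags | other.tags)
        return Tracked(op(self.value, other), self.tags)
    def __add__(self, other):
        return self._combine(other, lambda a, b: a + b)
    def __mul__(self, other):
        return self._combine(other, lambda a, b: a * b)

x = Tracked(3.0, {"input:x"})     # the value of interest
y = Tracked(4.0)                  # untagged
z = x * y + Tracked(1.0)          # derived from x -> influenced by x
w = y * y                         # never touches x
assert "input:x" in z.tags and z.value == 13.0
assert "input:x" not in w.tags    # w could be computed without x at all
```

A computation whose result carries no tags from a value of interest is exactly the kind of unnecessary work the text suggests developers could remove.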

This work is being done as part of the Institute for Sustained Performance, Energy, and Resilience (SUPER) project of the DOE Office of Science's Scientific Discovery through Advanced Computing program.

CASL Achievements

The Consortium for Advanced Simulation of Light Water Reactors (CASL) was established as the first U.S. Department of Energy (DOE) Innovation Hub, and was created to accelerate the application of advanced modeling and simulation (M&S) to the analysis of nuclear reactors [1]. CASL started operations on July 1, 2010, at its Oak Ridge National Laboratory (ORNL) headquarters in collaboration with ten core partners (ORNL, Idaho National Laboratory (INL), Los Alamos National Laboratory, Sandia National Laboratories, Electric Power Research Institute, Westinghouse Electric Company (WEC), Tennessee Valley Authority (TVA), North Carolina State University, University of Michigan, and Massachusetts Institute of Technology). CASL applies existing M&S capabilities and develops advanced capabilities to create a usable environment for predictive simulation of light water reactors (LWRs). This environment, known as the Virtual Environment for Reactor Applications (VERA), incorporates science-based models, state-of-the-art numerical methods, modern computational science and engineering practices, and uncertainty quantification (UQ) and validation against data from operating pressurized water reactors (PWRs), single-effect experiments, and integral tests.

CSMD staff have been key contributors to CASL since its inception, with Computational Engineering and Energy Science (CEES) Group Leader John Turner serving as Lead of the Virtual Reactor Integration (VRI) Focus Area and staff from the CEES, Computer Science, and Applied Math groups contributing to CASL goals.

Some recent CASL achievements of note include:

June 30, 2013: Deployment of VERA to Westinghouse Nuclear as CASL's first Test Stand. Ross Bartlett (CEES) was recognized for his efforts to ensure the success of this deployment with a Significant Event Award.

As CASL begins to transition from development to deployment of technologies, the VRI Focus Area was renamed Physics Integration (PHI), with Jess Gehin (ORNL/NSTD) as Focus Area Lead. Dr. Gehin was formerly Lead for the Advanced Modeling Applications (AMA) Focus Area. Dr. Turner assumed the role of Chief Computational Scientist for CASL.

Core-collapse supernovae are nature's mechanism for producing the elements heavier than oxygen, which make up our bodies and the world around us. By designing and implementing multi-dimensional, multi-physics codes like CHIMERA, scientists can leverage the power of supercomputing platforms to explore this phenomenon. CHIMERA's complexity, pace of ongoing development, and widely distributed collaborators present many challenges. To address those challenges, a multi-tier software system, Bellerophon, has been implemented to meet the verification, data analysis, and management needs of CHIMERA's code development team. Since its initial release in April 2010, Bellerophon has enabled the CHIMERA team to discover and isolate multiple bugs and other subtle issues. These capabilities have had a direct impact on simulation results and have facilitated several publications.

Bellerophon's multi-tier architecture not only includes the commonly employed presentation, logic and data tiers, but also a "supercomputing tier" which has been ported to several platforms including Titan, Hopper and Kraken. This new tier is responsible for reducing and transmitting simulation data, conducting regression tests and archiving data. Bellerophon's data tier consists of a MySQL database and a hierarchical flat file database, which has grown to consist of over 20,000 Silo files, 300,000 PNGs, 60 regularly updated MP4 movies and the results of 2200 regression tests. The logic tier is comprised of automated data processors and the Bellerophon PHP web service, which is used by the client to access the database and other backend programs. The primary aim of Bellerophon is to provide a "one-stop shop" for code development and scientific workflow management via a central, easy-to-use, web-accessible portal while smoothly integrating with other workflow tools. Deployed as a cross-platform Java application, this portal utilizes both the dashboard and WYSIWYG philosophies of intuitive UI design which enables users to quickly and easily access multiple capabilities with little or no training.

Bellerophon's near real-time data analysis and visualization capabilities provide the CHIMERA team direct insight into the production runs. Once a run is bound to Bellerophon's "data_generator" program, users can view customized renderings of simulation results with a latency as low as five minutes. Introduced in the 1.1 release, the data_generator checks for new output regularly and executes a C program, chimera2silo, which converts CHIMERA output to the Silo format. Once converted, the files are automatically transmitted to the web server. The visualization processor on the server generates animated 2D colormaps of over a dozen quantities using VisIt. These animations are stored as a series of PNGs and are automatically converted into QuickTime-compatible MP4 movies. Animation types can be added, removed and highly customized using the Visualization Set Manager client-side tool. Another client-side tool, the Visualization Set Explorer, allows users to play, fast-forward and rewind animations, download an animation's PNG images and associated MP4 movie and generate custom data views on the fly.
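The data_generator's poll-convert-record cycle might be sketched as follows; `poll_once` and the stub converter are hypothetical stand-ins for the real data_generator and chimera2silo:

```python
import os
import tempfile

def poll_once(run_dir, seen, convert):
    """One cycle of a data_generator-style watcher: find output files not
    yet processed, convert each (e.g. to Silo via a chimera2silo-like
    function), and record it as handled. The real data_generator also
    transmits converted files to the web server for visualization."""
    new = sorted(f for f in os.listdir(run_dir)
                 if f.endswith(".out") and f not in seen)
    for name in new:
        convert(os.path.join(run_dir, name))
        seen.add(name)
    return new

# Usage with a temporary directory and a stub converter.
d = tempfile.mkdtemp()
for name in ("step0001.out", "step0002.out", "notes.txt"):
    open(os.path.join(d, name), "w").close()
converted = []
seen = set()
assert poll_once(d, seen, converted.append) == ["step0001.out", "step0002.out"]
assert poll_once(d, seen, converted.append) == []   # nothing new second time
```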

Significant contributions from Bellerophon collaborators Sharvari Desai (UT), Chastity Holt (ASU) and Eric Lentz (UT) led to the release of Bellerophon 1.2 in September 2013. With this release, a new data analysis program, Chimeralyzer, was connected to the data_generator to calculate time-sequenced analysis of CHIMERA output. In addition, new developments in the logic tier of Bellerophon 1.2 integrated the Grace plotting tool and the ImageMagick library into the visualization processor. This new expansion allows Bellerophon to generate and deliver static and animated 2D line plots in near real-time of important quantities describing the simulation such as explosion energy and shock radius.

Bellerophon also addresses verification and workflow management needs. Verification tasks are performed by an automated regression test system, which spans all components of Bellerophon's architecture beginning with the supercomputing tier, where programs checkout, compile and execute the latest revision of CHIMERA. These results are transmitted to the web server where they are processed and then made available through Bellerophon's Regression Test Explorer tool. The SVN Statistics On-Demand tool allows users to execute the StatSVN code repository statistics generator via Bellerophon's web service over a custom date and/or revision range. The resulting output is a downloadable set of interlinking HTML pages with tables and plots detailing statistical information about a project's code development. The Important Links and Information client-side feature provides direct links to CHIMERA's online Trac repository browser, its public and private wikis and the OLCF, NICS and NERSC home pages. With this tool, a user can also post to the CHIMERA mailing list, browse the mailing list archives, create a new Trac ticket and receive real time status updates of OLCF, NICS and NERSC resources.

Bellerophon has been funded by the DOE's Office of Science under the Advanced Scientific Computing Research (ASCR) program and the National Center for Computational Sciences.

The high performance computing (HPC) community is working to address concerns associated with fault tolerance and resilience in current and future large-scale computing platforms. This is driving enhancements in programming environments, specifically research on enhancing the message passing interface (MPI) to support fault-tolerant computing capabilities. As these enhancements emerge, tools for resilience experimentation are becoming more important. In the workshop paper titled "Using Performance Tools to Support Experiments in HPC Resilience", we consider how HPC performance-focused tools and methods can be extended ("repurposed") to benefit the resilience community.

The paper describes the initial motivation to leverage standard HPC performance analysis techniques to aid in developing diagnostic tools that assist fault tolerance experiments for HPC applications. These diagnosis procedures help to provide context for the state of the system when errors (failures) occurred. We describe the extension of an existing MPI tracing package (DUMPI) to support the User Level Failure Mitigation (ULFM) specification that has been proposed to the MPI Forum by the Fault Tolerance Working Group (MPI-FTWG). The data obtained from these traces can assist application developers and FT implementers in diagnosing problems and help with postmortem analysis. To investigate the usefulness of the trace tool, we extended a simple molecular dynamics application to use the ULFM enhancements to MPI. Our initial experiments used the trace files from these tests to help gain insight into the context of the job during resilience experiments. The traces helped to highlight two problems we encountered during fault injection experiments: i) a fault-injection logic error that resulted in correct results (application output), but with more ranks than anticipated being killed; and ii) an issue in failure detection/propagation in the ULFM prototype that was affected by the method used to simulate the rank failure. The trace files also help to explain changes in overall performance when MPI fault tolerance mechanisms are employed.
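A diagnostic of the first kind, comparing which ranks actually failed against which were meant to fail, might look like the following; the event tuples are a hypothetical simplification of real DUMPI/ULFM trace records, which are binary and far richer:

```python
def ranks_killed(events):
    """Scan a (simplified) trace of fault-injection events and return the
    set of ranks observed to fail."""
    return {rank for rank, ev in events if ev == "rank_failed"}

# A trace where the experiment intended to kill only rank 3, but the
# injection logic also took down rank 7 -- the situation described above.
trace = [
    (0, "mpi_send"), (3, "rank_failed"), (1, "mpi_recv"),
    (7, "rank_failed"), (0, "comm_revoked"), (1, "comm_revoked"),
]
killed = ranks_killed(trace)
assert killed == {3, 7}
assert killed - {3} == {7}    # more ranks than anticipated were killed
```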

With high-performance systems exploiting multicore and accelerator-based architectures on a distributed shared memory system, heterogeneous hybrid programming models are the natural choice for exploiting all the hardware made available on these systems. Previous efforts looking into hybrid models have primarily focused on using OpenMP directives (for shared memory programming) with MPI (for inter-node programming on a cluster), using OpenMP to spawn threads on a node and communication libraries like MPI to communicate across nodes. As accelerators are added into the mix, and as hardware support for PGAS languages/APIs improves, new and unexplored heterogeneous hybrid models will be needed to effectively leverage the new hardware. In this paper we explore the use of OpenACC directives to program GPUs and the use of OpenSHMEM, a PGAS library for one-sided communication between nodes. We use the NAS BT Multi-zone benchmark, converted to use the OpenSHMEM library API for network communication between nodes and OpenACC to exploit accelerators present within a node. We evaluate the performance of the benchmark and discuss our experiences during the development of the OpenSHMEM+OpenACC hybrid program.

This work proposed a design for implementing blocking and non-blocking reduction collective operations for modern multicore systems. An implementation based on the design performed an order of magnitude better than the state of the art on a variety of systems, including Cray and InfiniBand systems. A Conjugate Gradient solver using this implementation completed over 195% faster than with the state-of-the-art implementation. These reduction implementations are integrated into Open MPI, a popular implementation of the MPI standard, and we expect to release them publicly as part of a future Open MPI release. A paper describing the design, implementation, and evaluation of these reductions has been accepted for publication in the IEEE Cluster 2013 conference proceedings.
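One classic reduction algorithm of the kind such designs build on is recursive doubling, simulated below; this is a textbook sketch, not Open MPI's actual implementation:

```python
def recursive_doubling_allreduce(values, op):
    """Simulate recursive-doubling allreduce over a power-of-two number of
    ranks: in round k, each rank r exchanges partial results with partner
    r XOR 2**k, so every rank holds the full reduction after log2(P)
    rounds instead of gathering to a root and broadcasting back."""
    p = len(values)
    assert p & (p - 1) == 0, "power-of-two rank counts only in this sketch"
    vals = list(values)
    k = 1
    while k < p:
        # All ranks exchange and combine simultaneously in each round.
        vals = [op(vals[r], vals[r ^ k]) for r in range(p)]
        k *= 2
    return vals

result = recursive_doubling_allreduce([1, 2, 3, 4, 5, 6, 7, 8],
                                      lambda a, b: a + b)
assert result == [36] * 8     # every rank ends with the global sum
```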

The path to exascale high-performance computing (HPC) poses several challenges related to power, performance, resilience, productivity, programmability, data movement, and data management. Investigating the performance of parallel applications at scale on future architectures, and the performance impact of different architecture choices, is an important component of HPC hardware/software co-design. Simulations using models of future HPC systems and communication traces from applications running on existing HPC systems can offer insight into the performance of future architectures. This work targets technology developed for scalable application tracing of communication events and memory profiles, but it can be extended to other areas, such as I/O, control flow, and data flow. It further focuses on extreme-scale simulation of millions of Message Passing Interface (MPI) ranks using a lightweight parallel discrete event simulation (PDES) toolkit for performance evaluation. Instead of simply replaying a trace within a simulation, the approach is to generate a benchmark from it and to run this benchmark within a simulation using models that reflect the performance characteristics of future-generation HPC systems. This provides a number of benefits, such as eliminating the data-intensive trace replay and enabling simulations at different scales. A recent accomplishment utilizes the ScalaTrace tool from North Carolina State University (NCSU) to generate scalable trace files, the ScalaBenchGen tool from NCSU to generate the benchmark, and ORNL's extreme-scale simulator, xSim, to run the benchmark within a simulation. Early results show that execution times of both the original application and the generated benchmark are similar for the FT benchmark from the NAS Parallel Benchmark suite. The mean error rate is only 5% for FT, while a rate of 20% is observed for the IS benchmark from the same suite.
Ongoing work focuses on handling more benchmarks during the generation process and on improving accuracy.
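xSim's actual engine and hardware models are far more elaborate than anything shown here, but the core PDES idea, processing timestamped events in order and advancing each simulated rank's virtual clock, can be sketched in a few lines of Python (the `simulate` helper and the fixed `LATENCY` constant are hypothetical simplifications of our own):

```python
import heapq

# Minimal discrete-event-simulation sketch: message events are queued as
# (arrival_time, sequence_number, destination_rank) tuples and processed
# in timestamp order. This only illustrates the general PDES idea; it is
# not xSim's engine.

LATENCY = 1.5  # assumed per-message network latency (arbitrary units)

def simulate(num_ranks, sends):
    """sends: list of (send_time, src_rank, dst_rank) message events."""
    clock = [0.0] * num_ranks          # per-rank virtual time
    events, seq = [], 0
    for t, src, dst in sends:
        # A send at time t arrives at the destination after LATENCY.
        heapq.heappush(events, (t + LATENCY, seq, dst))
        seq += 1
    while events:
        t, _, rank = heapq.heappop(events)
        clock[rank] = max(clock[rank], t)  # receipt advances the clock
    return clock

print(simulate(2, [(0.0, 0, 1), (2.0, 1, 0)]))  # [3.5, 1.5]
```

Swapping in a different `LATENCY`, or a full model of a future interconnect, changes the simulated clocks without changing the benchmark driving the events, which is the essence of the co-design workflow described above.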

The Isotope Business Office Business Systems Upgrade (IBOBSU) project has recently been funded as an 18-month project by DOE's Office of Nuclear Physics under the Isotope Development and Production for Research and Applications (IDPRA) program. The project is led by Mitch Ferren from the National Isotope Development Center Isotope Business Office. The other investigators are Eric Lingerfelt from the Computer Science Research Group in the Computer Science and Mathematics Division, Patricia Winter and Donna Ault from the Business Services Directorate, Monty Middlebrook and Michael Smith from the Physics Division, and Russell Langley and Justin Rogers from the Information Technology Services Division.

The National Isotope Development Center (NIDC) is the sole government source of stable and radioactive isotope products for science, medicine, security, and applications. The NIDC manages the sales and distribution of these isotopes through the Isotope Business Office (IBO), which is located at ORNL. There are currently three major components to the IBO's business software architecture. The first component is the NIDC website, isotopes.gov, which is a custom-built, database-driven website using PHP and MySQL. The website is highly integrated with the Online Management Toolkit (OMT). The OMT is a web-deliverable, cross-platform Java application providing NIDC staff the capability to quickly and easily modify the Online Catalog of Isotope Products, harvest website statistics, dynamically generate monthly reports, and administer OMT user accounts from any location. The second component is the Isotope Reference Information System (IRIS). IRIS is an aging standalone application built with Visual FoxPro that allows NIDC staff to store and access information concerning quotes, orders, inventory, sales, shipments, and invoices, as well as to dynamically generate custom reports. The final component is ORNL's SAP Enterprise Business System. The IBOBSU project's primary goal is the reduction of these three independent systems to two systems via two concurrent tasks. Task 1: Business System Conversion (led by Winter) will streamline the IBO's current workflow by retiring IRIS and integrating its capabilities and database into the existing ORNL SAP system through customization of SAP modules according to process requirements.
Task 2: NIDC Website Enhancements (led by Lingerfelt) focuses on the website and OMT, specifically on delivering a secure online quotation and ordering capability of stable isotopes for preferred customers, an online shopping cart feature for quote requests and preferred customers' quotes and orders, and new OMT software tools for the management of quotes, orders, and preferred customer accounts. A new preferred customer portal at isotopes.gov will also be developed allowing preferred customers the ability to securely monitor the status of orders, browse their quote and order history, and manage their profile.

Simulating Element Creation in Supernovae with the Computational Infrastructure for Nuclear Astrophysics at nucastrodata.org

The elements, which make up our bodies and the world around us, are produced in violent stellar explosions. Computational simulations of the element creation processes occurring in these cataclysmic phenomena are complex calculations that track the abundances of thousands of species of subatomic nuclei that are interrelated by ~60,000 thermonuclear reaction rates stored in continually updated databases. Previously, delays of up to a decade were experienced before the latest experimental reaction rates were used in astrophysical simulations. The Computational Infrastructure for Nuclear Astrophysics (CINA), freely available at the website nucastrodata.org, reduces this delay from years to minutes! With over 100 unique software tools developed over the last decade, CINA comprises a "lab-to-star" connection. It is the only cloud computing software system in this field, and it is accessible via an easy-to-use, web-deliverable, cross-platform Java application. Currently, CINA has registered users from over 32 countries and 141 institutions, with new users added every week.

CINA's capabilities cover the broad spectrum required to quickly process an experimental measurement from an accelerator facility, for example, and analyze its affect on a variety of stellar explosive events. Users have the ability to convert experimental and theoretical cross sections and astrophysical S-factors into parameterized thermonuclear reaction rates, merge these rates into standard rate libraries, and use these custom libraries (along with other input data) to execute and analyze post-processed, nucleosynthesis simulations of main sequence stars, novae, and X-ray bursts. A new feature developed and released in 2013, CINA users can now execute simulations of the rapid neutron capture process (r-process) as it occurs in core-collapse supernovae using a network of over 4500 tracked isotopes and 51,000 interconnecting thermonuclear reaction rates. In addition, a new visualization tool was developed for the analysis of r-process results that enable the generation of customizable 1D plots of final abundance vs. atomic number, neutron number, and mass number. This new plotter is the newest within CINA's expansive collection of tailored visualization tools. These tools allow users to robustly visualize, analyze, and compare simulation results with a variety of specialized 1D plots and animated, interactive 2D plots on a nuclide chart. The results can be exported in a variety of image formats or as AVI movies. Other tools within CINA provide the utility to upload, manipulate, analyze, and visualize nuclear data sets, thermonuclear reaction rates, and collections of reaction rates. Users can also share nuclear data sets, custom rate libraries and element synthesis simulation results with colleagues in an online community.
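CINA's exact internal representation of parameterized rates is not spelled out above; the seven-parameter REACLIB form is, however, the de facto standard for parameterized thermonuclear reaction rates in this field, and evaluating it takes only a few lines (a sketch, with the function name our own):

```python
import math

# Sketch of the seven-parameter REACLIB-style rate form widely used for
# parameterized thermonuclear reaction rates (CINA's exact internal
# representation may differ):
#   rate(T9) = exp(a0 + a1/T9 + a2*T9^(-1/3) + a3*T9^(1/3)
#                  + a4*T9 + a5*T9^(5/3) + a6*ln(T9))
# where T9 is the temperature in units of 10^9 K.

def reaclib_rate(a, t9):
    return math.exp(a[0] + a[1] / t9 + a[2] * t9 ** (-1 / 3)
                    + a[3] * t9 ** (1 / 3) + a[4] * t9
                    + a[5] * t9 ** (5 / 3) + a[6] * math.log(t9))

# With all coefficients zero the rate is exp(0) = 1 at any temperature.
print(reaclib_rate([0, 0, 0, 0, 0, 0, 0], 1.0))  # 1.0
```

Fitting measured cross sections to coefficients like these, and merging the result into a rate library, is the kind of "lab-to-star" step the text describes.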

CINA also offers other ancillary software tools. The Data Harvester, for instance, allows the user to select an isotope and dynamically download a collection of related nuclear data and publications from a number of online databases. The File Repository tool facilitates communication between colleagues by allowing users to create folders and share files concerning nuclear astrophysics research. Finally, a suite of "role-centric" tools for reaction rate evaluations is also available. When new reaction rates require evaluation, scientific contributors, evaluators, referees, and editors are required throughout the process. CINA offers a set of tools suitable for each role, as well as tools to interactively monitor the status of, and access files related to, the rate evaluation process.

In addition to the capabilities offered by CINA, the nucastrodata.org website provides the scientific community a comprehensive list of continuously updated, categorized links to all online nuclear astrophysics datasets, including reaction rate collections (experimental, theoretical, combined, weak), S-factors, cross sections (experimental, evaluated, theoretical), plots, nuclear structure, nuclide charts, software, and bibliographic information. It also provides sample investigations using CINA that are appropriate for high school, college, and graduate school students, and offers access to files shared via CINA's File Repository tool.

This work is funded by the DOE's Office of Nuclear Physics under the US Nuclear Data Program.

A first-principles theoretical study of electric field- and strain-controlled intrinsic half-metallic properties of zigzag aluminium nitride (AlN) nanoribbons reveals that the half-metallic property of AlN ribbons can undergo a metallic or semiconducting transition with application of an electric field or uniaxial strain. An external transverse electric field induces full charge screening, rendering the material semiconducting, while a uniaxial strain varying from compressive to tensile causes the spin-resolved selective self-doping to increase the half-metallic character of the ribbons. The relevant strain-induced changes in electronic properties arise from band structure modifications at the Fermi level as a consequence of a spin-polarized charge transfer between p-orbitals of the N and Al edge atoms in a spin-resolved self-doping process. The band structure tunability indicates the possibility of rationally designing magnetic nanoribbons with tunable electronic structure by deriving edge states from elements with sufficiently different localization properties.

Scientific Achievement

Demonstrated how and why the half-metallic property of aluminum nitride (AlN) nanoribbons can undergo a transition to fully-metallic or semiconducting behavior with application of an electric field or uniaxial strain.

Significance and Impact

The band structure tunability of AlN indicates the possibility of rationally designing magnetic nanoribbons with "on-demand" electronic structure. An external transverse electric field induces a full charge screening that renders the material semiconducting, while a uniaxial strain varying from compressive to tensile causes the spin-resolved selective self-doping to increase the half-metallic character of the ribbons.

This research was conducted at the Center for Nanophase Materials Sciences (CNMS), which is sponsored at Oak Ridge National Laboratory by the Scientific User Facilities Division, Office of Basic Energy Sciences, U. S. Department of Energy. The work at NCSU was supported by DOE grant DE-FG02-98ER45685. The computations were performed using the resources of the CNMS and the Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory.

CSMD, CSED, and MSSED researchers have developed a new technology for the technical verification of non-proliferation treaties. With support from the Defense Threat Reduction Agency (DTRA), the CSMD-led team has deployed quantum information concepts to detect when an intruder tampers with a sealed target. This is a particular concern for treaty inspectors who need to confirm that the containment and surveillance of special nuclear material has been uninterrupted. The novelty of the ORNL approach is to use quantum entanglement, a quantum mechanical feature that describes how two spatially separated systems can exhibit surprisingly strong correlations in their behaviors. The ORNL team has now leveraged those effects alongside the no-cloning principle to detect when an intruder is present, thus closing a vulnerability in existing tamper-indication technology.

Unlike classical correlations, which can be both observed and copied, the non-local quantum correlations used by the ORNL team cannot be copied, and any attempt to do so destroys the entanglement. By monitoring for the presence of entanglement, the team was able to demonstrate a high probability of detection at a very low false-alarm rate. In addition, recent developments have laid the groundwork for making these quantum verification measurements in real time using customized FPGA-based data collection systems.

CSMD researchers have developed a new paradigm for leveraging the benefits of quantum communication. While quantum communication protocols like teleportation, entanglement swapping, and quantum key distribution (QKD) are exciting possibilities for today's quantum communication engineers, these protocols are often exotic and unfamiliar to the end user. In addition, protocol implementation is often tightly coupled to the underlying physics, which makes tuning or reconfiguring the communication system difficult and costly.

The CSMD-led team has developed a paradigm for highly reconfigurable
quantum communication systems based on software-defined quantum
communication. Like its classical namesake, software-defined quantum
communication uses software layers to isolate functional concerns and
maintain the reusability and extensibility of prior implementations. The
team has recently demonstrated their methodology in the design of a
super-dense coding communication system, which is a highly efficient
quantum protocol for sending messages between users. The work was funded
by the Defense Threat Reduction Agency, which uses the software-defined
approach to design sensor and communication terminals.

Results from this work are highlighted in a news release from SPIE, the international society for optics and photonics, and were part of an invited presentation at Quantum Communications and Quantum Imaging XI (August 2013):
http://spie.org/x102597.xml

With the availability of first-generation quantum computers, questions of
programming and benchmarking are at the forefront of quantum computer
science. A recent CSMD-led effort is helping address these questions by
developing an integrated development environment for quantum computing.
Funded by the Lockheed Martin Corporation, the ORNL team has developed
JADE, or the Jade Adiabatic Development Environment, for managing the
complexity of quantum programming. JADE allows users to specify input
problems, control instructions, and processor configurations to generate
run-time programs. JADE also interfaces with a computational backend that
enables simulation of the quantum programs. Such numerical simulations
provide detailed diagnostics about program behavior and computational
complexity that are needed for building next generation quantum
information systems.

CSMD and CNMS researchers Qing Li, Jonathan Owens, Chengbo Han, Bobby G. Sumpter, Wenchang Lu, Jerry Bernholc, Vincent Meunier, Petro Maksymovych, Miguel Fuentes-Cabrera, and Minghu Pan have demonstrated a non-thermal, electron-induced approach to the self-assembly of phenylacetylene molecules on gold. The approach allows a previously unachievable attachment of the molecules to the surface through the alkyne group, followed by controllable, surface-coordinated linear polymerization of long-chain poly(phenylacetylenyl)s that self-organize into a "circuit-board" pattern.

Self-assembly (the process in which a disordered system of pre-existing components forms an organized structure or pattern as a consequence of specific, local interactions among the components themselves, without external direction) is the key to bottom-up design of molecular devices, because nearly atomic-level control is very difficult to realize in a top-down, e.g., lithographic, approach. Self-assembly of molecular chains as previously achieved using thermally driven processes, as opposed to the electronic control in this work, leads to defects and largely uncontrolled surface packing. To enable electronic devices, it is essential to control defects at the interface (the interface makes or breaks a device), which is one of the advantages of all-electronic control over self-assembly and polymerization.

Knowledge of the masses of subatomic nuclei forms a crucial foundation for research in basic and applied nuclear science, as well as in astrophysics. New accelerator facilities and new detection systems have enabled researchers around the world to make more, and much more precise, nuclear mass measurements. In late 2012, a new Atomic Mass Evaluation was released, the first in a decade, that included all of this new information. However, the dissemination of these new masses (as an 850-page paper or as one enormous electronic table) has limited utility for researchers. Our Nuclear Masses Toolkit (NMT) provides the only online dissemination of these new masses whereby a ready comparison can be made with older masses and with the predictions of over 13 different theoretical mass models.

Freely available at the website nuclearmasses.org, the NMT enables research scientists and non-experts alike to share, visualize, and analyze nuclear mass information in a robust and intuitive way. Once registered, users can upload mass datasets, store them, and share them with colleagues, as well as compare mass datasets with ones from peer-reviewed research. The NMT's suite of visualization tools allows users to quickly and easily create highly customized data views of several quantities and their differences when compared to other datasets. The data views include not only 1D plots but also interactive 2D plots on the chart of the nuclides that compare values such as mass excess, Q-values, and separation energies. Datasets may also be analyzed by average RMS difference and by RMS difference as a function of atomic number, neutron number, and mass number. In addition to the capabilities offered by the NMT, the nuclearmasses.org website provides the scientific community detailed information and links concerning theoretical and experimental mass datasets. It also creates a mechanism for users to inform the community of their latest mass models, measurements, and software tools.
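As a concrete illustration of the derived quantities mentioned above, a reaction Q-value follows directly from mass excesses (the helper function and the small hard-coded table below are for illustration only; the NMT works from full evaluated datasets):

```python
# Illustration of the kind of derived quantity the NMT computes from
# mass-excess data. Mass excesses below (in MeV, from the Atomic Mass
# Evaluation) are quoted to three decimals for the example.
MASS_EXCESS = {          # (element, A) -> mass excess in MeV
    ("n", 1): 8.071,     # neutron
    ("H", 2): 13.136,    # deuteron
    ("H", 3): 14.950,    # triton
    ("He", 4): 2.425,    # alpha particle
}

def q_value(reactants, products):
    """Q = sum of reactant mass excesses - sum of product mass excesses."""
    return (sum(MASS_EXCESS[r] for r in reactants)
            - sum(MASS_EXCESS[p] for p in products))

# The d + t -> alpha + n fusion reaction releases the well-known ~17.6 MeV:
q = q_value([("H", 2), ("H", 3)], [("He", 4), ("n", 1)])
print(round(q, 2))  # 17.59
```

Separation energies are the same arithmetic with different reactants and products, which is why a toolkit holding the evaluated mass table can generate all of these views from one dataset.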

This work was initially funded through an ORNL Seed Proposal in 2008 and is currently funded by the US Nuclear Data Program of the Office of Nuclear Physics in the U.S. Department of Energy Office of Science.

New Version of C3

May 9, 2013

A new version of the Cluster Command and Control (C3) tools has been released. The C3 tools are used as a core piece of the OSCAR cluster management suite, which has been updated to support the latest Ubuntu Linux distribution. These updated cluster tools are used internally by members of the Computer Science Research group to maintain group machines. They are also used stand-alone by a variety of users from industry, academia, and laboratories.

C3 is a suite of cluster tools developed at ORNL that are useful for both administration and application support. The suite includes tools for cluster-wide command execution, file distribution and gathering, process termination, and (with proper privileges) remote node shutdown. The tools can be installed for general system use or run from a user's home directory. By default, the tools use SSH to connect to the remote machines and support a rich set of options for cluster or multi-cluster setup.

Open Source Cluster Application Resources (OSCAR) is a cluster installation and management suite that can be used to set up a system with common HPC software, e.g., job schedulers, message-passing libraries, etc. The latest version of the suite was extended to support lab-approved Linux distributions and system management policies. This enhanced version of OSCAR was used to deploy two clusters maintained by CSR members: HAL9000 and SAL9000. These platforms are used for research and development of HPC system software and for resilience research.

ORNL is a founding member of the OSCAR project and has maintained a leadership role in the development and maintenance of the suite over the past decade. The C3 tools were first developed at ORNL in 2000, with Stephen Scott and Brian Luethke as the lead developers. In addition to community patches and bug reports, several individuals have contributed to the development and maintenance of C3 over the years, including Mike Brim, John Mugler, Geoffroy Vallee, Wesley Bland, and Thomas Naughton.

Edge-Edge Interactions in Stacked Graphene Nanoplatelets

February 14, 2013

CSMD researcher Bobby Sumpter was part of a team whose work on graphene platelets was published in the American Chemical Society's ACSNano Journal.

Intrinsic stacking interactions of small graphene platelets cause modifications in the local environment of larger graphene plates. Ramifications include limiting the epitaxial growth of a platelet and arresting the reconstruction of an edge during combined Joule heating and electron irradiation experiments.

The team used high-resolution transmission electron microscopy studies to show the dynamics of small graphene platelets on larger graphene layers. The platelets move nearly freely to eventually lock in at well-defined positions close to the edges of the larger underlying graphene sheet. While such movement is driven by a shallow potential energy surface described by an interplane interaction, the lock-in position occurs via edge-edge interactions of the platelet and the graphene surface located underneath. Here, the team quantitatively studied this behavior using van der Waals density functional calculations. Local interactions at the open edges are found to dictate stacking configurations that are different from Bernal (AB) stacking. These stacking configurations are known to be otherwise absent in edge-free two-dimensional graphene. The results explain the experimentally observed platelet dynamics and provide a detailed account of the new electronic properties of these combined systems.

Surface-Induced Orientation Control of CuPc Molecules for the Epitaxial Growth of Highly Ordered Organic Crystals on Graphene

Spring 2013

ORNL researchers were part of a team that showed how graphene is able to direct the assembly of copper phthalocyanine (CuPc) molecules into epitaxially-aligned superstructures relevant to organic electronics. Theoretical modeling of the mechanisms responsible for this alignment revealed that van der Waals interactions and interfacial dipole interactions induced by charge transfer both play important roles.

(left) Theoretical modeling of CuPc molecule interactions with graphene in both face-on and side-on orientations. (right) STM image of CuPc molecules aligned in the face-on orientation on graphene. The bottom left inset is a higher magnification STM image; the top right inset schematically shows the molecular orientation.

This work provides a fundamental understanding of molecular interactions at interfaces important to controlling the nanoscale morphology and orientation of organic semiconductors and to improving optoelectronic processes for high-performance organic electronic devices. Here, graphene is demonstrated to effectively template CuPc molecules to nucleate, orient, and pack in the face-on orientation, the ideal structure for high-performance organic photovoltaics.

Acknowledgement of Support: This research was conducted at the Center for Nanophase Materials Sciences, which is sponsored at Oak Ridge National Laboratory by the Office of Basic Energy Sciences, U.S. Department of Energy. KX, MY, and DBG acknowledge partial support provided by an LDRD (#6521). This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

An Integrated Website and Software System for the National Isotope Development Center

A new database-driven website, isotopes.gov, has been designed and developed for the National Isotope Development Center (NIDC), which is the sole government source of stable and radioactive isotope products for science, medicine, security, and applications. Since going live in May 2011, isotopes.gov has provided customers detailed information concerning NIDC activities, funding opportunities, jobs & training, meetings & workshops, outreach & education, production research, and the Isotope Business Office (IBO). The website also hosts the Online Isotope Product Catalog, which allows customers to interactively explore the catalog and submit requests for quotations and new products. In addition to the website, a suite of intra-office software tools, the NIDC Online Management Toolkit (OMT), has been released. The OMT provides NIDC staff the capability to quickly and easily modify the product catalog, harvest website statistics, dynamically generate monthly reports, and administer OMT user accounts.

Developed by Eric Lingerfelt (CSM) and Michael Smith (Physics), the website has successfully brought the NIDC’s web presence to the forefront and encouraged interaction with new and existing customers. It has also served to educate the public on the important role that isotopes play in society. During FY12, for example, the Online Isotope Product Catalog registered 45,000 unique isotope product hits and generated 710 quotation requests. The website was also used in Fall 2012 to announce and respond to queries on the auction of 4,000 liters of purified He-3 gas worth $10 million. The OMT facilitates the NIDC’s objectives by providing a web-deliverable, cross-platform content management system for the Online Isotope Product Catalog and easy-to-use tools to dynamically generate monthly reports that are used by DOE to guide future isotope production decisions.

This work is funded by the Isotope Development and Production for Research and Applications (IDPRA) subprogram of the Office of Nuclear Physics in the U.S. Department of Energy Office of Science.

Scientific Achievement
Elucidated a growth mechanism for new hybrid "nanooysters" - metal nanoparticles encapsulated in hollow carbon shells - that were formed by transforming carbon nanocones with metal nanoparticles at elevated temperatures. Nanooysters can be readily produced and could offer new functional properties as encapsulated metallic quantum dots.

Significance
This work establishes a "nano-enabled materials design" approach, wherein theory, synthesis, and characterization are used to unravel the atomistic mechanisms driving the formation of a new type of carbon-metal system.

Acknowledgement of Support:
This research was conducted in part at the Center for Nanophase Materials Sciences (CNMS), and at Oak Ridge National Laboratory's Shared Research Equipment (ShaRE) User Facility, both of which are sponsored by the Scientific User Facilities Division, DOE-BES. Experimental synthesis research supported by the U.S. Department of Energy (DOE), Basic Energy Sciences (BES), Materials Sciences and Engineering Division. The theoretical work was in part supported by a CREST (Core Research for Evolutional Science and Technology) grant in the Area of High Performance Computing for Multiscale and Multiphysics Phenomena from the Japanese Science and Technology Agency (JST). SI acknowledges support by the Program for Improvement of Research Environment for Young Researchers from MEXT of Japan. The computations were performed using the resources of the CNMS and the National Center for Computational Sciences at Oak Ridge National Laboratory.

Significance
This work establishes a “materials by design” approach, wherein theory, synthesis, and characterization are used to rationally design new materials for optoelectronic applications. The ability to manipulate end group compositions coupled with the propensity of pyridyl-functionalized P3HTs to ligate SQDs enables tuning the morphology of conjugated polymer/SQD blends to achieve improved hybrid photovoltaic materials.

(Above) Binding energies calculated using density functional theory (DFT)
were used to guide selection of pyridyl end groups for synthetic development.

Research Details
CNMS capability: Novel polymer synthesis and characterizations for the development of new soft materials.
Density functional theory calculations for guiding the selection of optimal end groups.
Pyridyl-functionalized P3HTs for ligands to decorate CdSe SQDs for stabilizing blend morphology.

This research was conducted at the Center for Nanophase Materials Sciences (CNMS), which is sponsored at Oak Ridge National Laboratory by the Scientific User Facilities Division, Office of Basic Energy Sciences, U. S. Department of Energy. The computations were performed using the resources of the CNMS and the National Center for Computational Sciences at Oak Ridge National Laboratory.

Chem. Mater. DOI: 10.1021/cm302915hr.

Signatures of Cooperative Effects and Transport Mechanisms in Conductance Histograms

M.G. Reuter, M.C. Hersam, T. Seideman, M.A. Ratner

Achievement
We develop a tractable model for simulating conductance histograms, which are a common form of reporting experimental data on electron transport processes in nanometer-scale systems (e.g., conductance through molecular wires). With this model, we can investigate the roles of the various physical parameters in the data reported in a conductance histogram. For transport through a single wire, the histogram peak elucidates the mechanism of electron transport. The peak additionally reveals the relative frequencies of these mechanisms if more than one contributes to conduction. A histogram peak from multiple wires indicates the presence of cooperative effects (crosstalk) between the wires and also encodes information on the underlying conduction channels. Before this study, this information, although present in existing experimental data, was ignored.
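The model developed in the paper is not reproduced here, but the basic link between stochastic junction configurations and a histogram peak can be shown with a toy Monte Carlo (all parameters below are arbitrary illustrative choices, not values from the study):

```python
import math
import random

# Toy Monte Carlo illustration only -- NOT the model from the paper.
# Each simulated measurement draws a junction conductance from a
# lognormal spread around a nominal single-wire value G1; histogramming
# many measurements produces a conductance-histogram peak near G1.
random.seed(0)
G1, SPREAD, N = 1.0, 0.15, 10_000   # nominal conductance, spread, samples

samples = [G1 * math.exp(random.gauss(0.0, SPREAD)) for _ in range(N)]

# Crude histogram: 20 bins of width 0.05 spanning [0.5, 1.5).
bins = [0] * 20
for g in samples:
    b = int((g - 0.5) / 0.05)
    if 0 <= b < 20:
        bins[b] += 1

peak_bin = max(range(20), key=bins.__getitem__)
peak_center = 0.5 + 0.05 * (peak_bin + 0.5)
print(peak_center)  # lands close to the nominal conductance G1
```

In the paper's richer model, the shape and position of such peaks carry the mechanistic and cooperative-effect information described above; this sketch only shows why a distribution over configurations yields a peaked histogram at all.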

Significance
This work helps build a bridge between theory/computation and experiment to better understand electron transport processes through nanometer-scale systems. Theory has generally focused on conductance through a molecule in a fixed configuration, whereas, in most cases, experiment cannot determine, let alone control, the molecular configuration. Our general model adds, for the first time, elements of stochasticity, which connects well-established theory with observable experimental data. We find that, in addition to an average molecular conductance, conductance histograms reveal transport mechanisms and cooperative effects. Finally, this work paves the way to infer theoretical parameters from experimental data.

Credit - This work was published in Nano Lett. This research was supported in part (i) by the Department of Energy Computational Science Graduate Fellowship Program, (ii) by the Eugene P. Wigner Fellowship Program, and (iii) at the Center for Nanophase Materials Sciences, which is sponsored at the Oak Ridge National Laboratory by the Scientific User Facilities Division, Office of Basic Energy Sciences, U.S. Department of Energy.

Electronic Control over Attachment and Self-Assembly of Alkyne Groups on Gold

Achievement
Demonstrated a non-thermal, electron-induced approach to the self-assembly of phenylacetylene molecules on gold that allows for a previously unachievable attachment of the molecules to the surface through the alkyne group. While thermal excitation can only desorb the parent molecule due to prohibitively high activation barriers for attachment reactions, localized injection of hot electrons or holes not only overcomes this barrier but also enables an unprecedented control over the size and shape of the self-assembly, defect structures, and the reverse process of molecular disassembly from a single molecule to a mesoscopic length scale.

Significance
Self-assembled monolayers are the basis for molecular nanodevices, flexible surface functionalization, and dip-pen nanolithography. Yet self-assembled monolayers are typically produced by a rather inefficient process that involves thermally driven attachment reactions of precursor molecules to a metal surface, followed by a slow and defect-prone molecular reorganization. The electron-induced excitation method demonstrated in this work may therefore enable new and highly controlled approaches to molecular self-assembly on a surface.

Credit - This work is published in ACS Nano. A portion of this research at Oak Ridge National Laboratory's Center for Nanophase Materials Sciences was sponsored by the Scientific User Facilities Division, Office of Basic Energy Sciences, U.S. Department of Energy.

Researchers using supercomputers to glean insight into scientific problems can spend much of their time just trying to get massive amounts of data in and out of a machine. CSMD's Scientific Data Group works to minimize the time spent on this task, and this week its members took a big step forward by releasing the newest update to the Adaptable Input/Output System (ADIOS), version 1.4, which gives users more time to focus on scientific insight and less on managing data.

"Our goal is to take our research and put it in production," says Scott Klasky, group leader of the Scientific Data Group at ORNL. Klasky's group researches methods to streamline this process, called input/output (I/O), on supercomputers.
The group's project started in 2008, when Klasky and his team created ADIOS to help increase I/O performance ten-fold on the Oak Ridge Leadership Computing Facility's (OLCF's) Jaguar supercomputer. Four updates and 65 journal publications later, Klasky's team is still looking for ways to make I/O even more efficient.

Version 1.4 represents a fundamental shift in the middleware that goes in tandem with shifts in supercomputing architectures. The Cray XT5 Jaguar, capable of 3.3 thousand trillion calculations per second, is being overhauled and transformed into a Cray XK7 dubbed Titan. The machine will be capable of 20 thousand trillion calculations per second by using a combination of traditional central processing units (CPUs) and fast and efficient graphics processing units (GPUs).

Two Open Source Benchmarks from the Extreme Scale System Center (ESSC)

The first Open Source release, SystemBurn, was designed to address the emergent emphasis on power demand in the co-design of trans-petascale and exascale computing systems. SystemBurn assists scientists and engineers in examining the limits of, and trade-offs between, power and throughput, and also serves as a tool for testing systems and their environment. It accomplishes this by providing a library of common computational algorithms and an execution framework for composing them into tests that simulate the behavior of real applications or create maximal synthetic power loads, permitting the researcher to establish an upper limit so that power and cooling may be accurately provisioned. The library includes synthetic loads that simultaneously exercise the various components of a system, including the CPU, memory, accelerators, storage, and network, giving the user the ability to measure the system at its electrical and thermal maximums. In addition to maximizing electrical usage, the variety of loads supplied with SystemBurn makes it possible to gather power and thermal profiles associated with specific tasks that simulate real applications. As part of the Open Source release, SystemBurn has been enhanced to provide performance statistics in addition to the thermal information it already collects, allowing users to correlate throughput performance with power usage. Future enhancements will enable auto-tuning of load selection, giving a user with little or no prior knowledge of the system a good basis for maximizing power draw.
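The composition idea, a library of component-specific loads plus a framework that runs selected loads concurrently and reports throughput, can be sketched in a few lines. This is a toy illustration of the general pattern, not SystemBurn's actual API or code; the function names and the two example loads here are invented for the sketch.

```python
import threading
import time

def cpu_load(stop, counter):
    """Spin on floating-point work until told to stop."""
    x = 1.0001
    while not stop.is_set():
        x = x * x % 10.0
        counter[0] += 1

def memory_load(stop, counter):
    """Walk through a buffer to exercise the memory subsystem."""
    buf = bytearray(1 << 20)
    while not stop.is_set():
        buf[counter[0] % len(buf)] = 0xFF
        counter[0] += 1

def run_loads(loads, seconds=0.2):
    """Run each load in its own thread for a fixed duration and return
    per-load iteration counts, a crude stand-in for throughput statistics."""
    stop = threading.Event()
    counters = [[0] for _ in loads]
    threads = [threading.Thread(target=f, args=(stop, c))
               for f, c in zip(loads, counters)]
    for t in threads:
        t.start()
    time.sleep(seconds)
    stop.set()
    for t in threads:
        t.join()
    return [c[0] for c in counters]

counts = run_loads([cpu_load, memory_load])
```

In the real tool the composed loads target CPU, memory, accelerators, storage, and network simultaneously, and power and thermal measurements are correlated with the throughput each load achieves.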

The second Open Source release, SystemConfidence, addresses the increasing impact that the variability of small scale latencies can have on application scalability on extreme scale systems. SystemConfidence implements an execution framework, measurement tools, and analysis tools for studying the "real-world" latencies in HPC systems (currently, communication and I/O). Utilizing the Oak Ridge Benchmarking (ORB) Timer library, SystemConfidence can expose unexpected characteristics in networks through statistical analysis of operational latencies. SystemConfidence has identified application scalability impacts of system software upgrades, performance defects in commodity and custom interconnect products, and design improvements which were subsequently incorporated into two contemporary market offerings.
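The measurement idea behind such a tool, timing many repetitions of an operation and studying the full latency distribution rather than just the mean, can be sketched as follows. This is an illustrative sketch only, not SystemConfidence or the ORB Timer library; the function names and the summary statistics chosen are assumptions for the example.

```python
import statistics
import time

def measure_latencies(op, n=2000):
    """Time n invocations of op and return per-call latencies in nanoseconds."""
    samples = []
    for _ in range(n):
        t0 = time.perf_counter_ns()
        op()
        samples.append(time.perf_counter_ns() - t0)
    return samples

def summarize(samples):
    """Summarize the distribution: the typical case and, crucially for
    scalability analysis, the slow tail that stragglers come from."""
    s = sorted(samples)
    return {
        "min": s[0],
        "median": statistics.median(s),
        "p99": s[int(0.99 * (len(s) - 1))],
        "max": s[-1],
    }

stats = summarize(measure_latencies(lambda: sum(range(100))))
```

A long gap between the median and the 99th percentile is exactly the kind of "unexpected characteristic" that statistical analysis of operational latencies can expose in a network or I/O path.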

SystemBurn and SystemConfidence were developed as part of the ESSC partnership with the Department of Defense. Principal authors are Jeff Kuehn, Josh Lothian, and Steve Poole. The whole ESSC team contributed ideas; numerous summer students contributed code to the projects. Development can be tracked at https://github.com/jlothian/systemburn and https://github.com/jlothian/sysconfidence, respectively.

Achievement
This work exemplifies collaborative nanoscience research between multiple institutions to pioneer the bulk synthesis of 3D macroscale nanotube elastic solids directly via a boron-doping strategy in chemical vapor deposition, in which boron influences the formation of atomic-scale "elbow" junctions and nanotube covalent interconnections. The enabling role of boron doping was elucidated from detailed theoretical calculations and validated by elemental analysis, revealing that boron promotes the formation of negative-curvature "elbow" junctions that lead to robust interwoven 3D networks, a "sponge-like" monolith. This novel material possesses ultra-light weight, super-hydrophobicity, high porosity, thermal stability, and mechanical flexibility, and is strongly oleophilic. These properties enable it as a reusable sorbent scaffold for efficiently removing oil from contaminated seawater.

Significance
The efficient, inexpensive and facile method developed is capable of producing bulk quantities of 3D carbon materials that have broad implications for practical material applications such as selective sorbent materials, hydrogen storage and flexible conductive scaffolds as porous 3D electrodes. The ultra-lightweight solid material exhibits enabling multifunctional properties including robust elastic mechanical properties with high damping, electrical conductivity, thermal stability, high porosity, super-hydrophobicity, oleophilic behavior and strong ferromagnetism. The environmental oil removal-and-salvage application from seawater was demonstrated where the nanotube "sponge" acts as an efficient scaffold that can be controlled and recollected via a magnetically driven process, and reused multiple times.

Credit - This work was published in Nature Sci. Rep. A portion of this research at Oak Ridge National Laboratory's Center for Nanophase Materials Sciences was sponsored by the Scientific User Facilities Division, Office of Basic Energy Sciences, U.S. Department of Energy.

What good are supercomputers if you have to spend all your time getting data in and out? Researchers at Oak Ridge National Laboratory (ORNL) are working to say goodbye to input/output (I/O) problems with their most recent upgrade of the Adaptable Input/Output System (ADIOS).

ADIOS grew out of a 2008 collaboration between the Oak Ridge Leadership Computing Facility (OLCF) and researchers from academia, industry, and national laboratories. Their goal was a system to get information in and out of a supercomputer efficiently.

"The measurement of success for us has always been 'what percentage of time do you spend in I/O?'" said OLCF computational scientist Scott Klasky. ADIOS was inspired by Klasky's work at the Princeton Plasma Physics Laboratory, where he noticed up to 30 percent of researchers' computational time was spent reading and writing analysis files.

The open-source middleware is designed to help researchers maximize their allocations on leadership-class computing resources from wherever they may be. In essence, it creates more time for research by minimizing the time needed to read and write data to files, even if researchers are sending those files from thousands of miles away.

The previous release, version 1.2, improved usability by allowing users to construct new variables in their simulations while they run, simplifying the interface users work with and improving I/O performance. In fact, researchers using the middleware found the writing process substantially expedited, consuming only 0.06 percent of their computation time on average.

The biggest challenge for the newest release, version 1.3, was to improve reading efficiency, according to Qing Liu, a scientific researcher in the Remote Data Analysis and Visualization branch at the National Institute for Computational Sciences and the lead developer of ADIOS.

"Our users were very happy that the writing was greatly improved [with version 1.2]. They could get 30 gigabytes per second for the writing performance, but when they tried to read, the performance was much lower," he said. "There was a huge gap between writing and reading performance."

Ray Grout, a researcher at the National Renewable Energy Laboratory, was one of the first people to test the latest update, using the S3D combustion code to study turbulent reacting flows. Grout noted a huge increase in reading performance.

Klasky also added that ADIOS 1.4 will continue grass-roots collaboration among computer science researchers to greatly reduce the problem of coping with large data on high-performance machines.

Computational scientists have a new weapon at their disposal. Earlier this year, the Electronic Simulation Monitoring (eSiMon) Dashboard version 1.0 was released to the public, allowing scientists to monitor and analyze their simulations in real time.

Developed by the Scientific Computing and Imaging Institute at the University of Utah, North Carolina State University, and Oak Ridge National Laboratory (ORNL), this "window" into running simulations shows results almost as they occur, displaying data just a minute or two behind the simulations themselves. Ultimately, the Dashboard allows the scientists to worry about the "science" being simulated, rather than learn the intricacies of high-performance computing such as file systems and directories, an increasingly complex area as leadership systems continue to break the petaflop barrier.

"In my experience, Dashboard has been an essential tool for monitoring and controlling the large-scale simulation data from supercomputers," said Seung-Hoe Ku, an assistant research professor at New York University's Courant Institute of Mathematical Sciences who uses the Dashboard to monitor simulations of hot, ionized gas at the edge of nuclear fusion reactors, an area of great uncertainty in a device that could one day furnish the world with a nearly limitless abundance of clean energy. "The FLASH interface provides easy accessibility with Web browsers, and the design provides a simple and useful user experience. I have saved a lot of time for monitoring the simulation and managing the data using the Dashboard together with the EffIS framework."

According to team member Roselyne Tchoua of the Oak Ridge Leadership Computing Facility (OLCF), the package offers three major benefits for computational scientists. First and foremost, it allows monitoring of the simulation via the Web; it is the only tool that provides access and insight into the status of a simulation from any computer on any browser. Second, it hides the low-level technical details from the users, allowing them to ponder variables and analysis instead of computational elements. And finally, it allows collaboration between simulation scientists from different areas and degrees of expertise: researchers separated geographically can see the same data simultaneously and collaborate on the spot.

Furthermore, via easy clicking and dragging, researchers can generate and retrieve publication-quality images and video. Hiding the complexity of the system creates a lighter and more accessible Web portal and a more inclusive and diverse user base.

The interface offers some basic features such as visualizing simulation-based images, videos and textual information. By simply dragging and dropping variable names from a tree view on the monitoring page onto the main canvas, users can view graphics associated with these variables at a particular time stamp. Furthermore, they can use playback features to observe the variables changing over time.

Researchers can also take electronic notes on the simulation as well as annotate movies. Other features include vector graphics with zoom/pan capabilities, data lineage viewing, and downloading processed and raw data onto local machines. Future versions will include hooks into external software and user-customized analysis and visualization tools.

"We are currently working on integrating the eSiMon application programming interface into an ADIOS method so that ADIOS users automatically get the benefit of monitoring their running simulation," said the OLCF's Scott Klasky, a leading developer of ADIOS, an open-source I/O performance library.

The "live" version of the dashboard is physically located at Oak Ridge National Laboratory (ORNL) and can be accessed with an OLCF account at https://esimmon.ccs.ornl.gov. This version of the dashboard gives an overview of ORNL and National Energy Research Scientific Computing Center computers. Users can quickly determine which systems are up or down, which are busy and where they would like to launch a job. Users can also view the status of their running and past jobs as well as those of their collaborators.

However, a portable version of eSiMon is also available for any interested party, and the platform cuts across scientific boundaries so that the Dashboard can be used for any type of scientific simulation. For information on acquiring and/or using the eSiMon dashboard, visit http://www.olcf.ornl.gov/center-projects/esimmon/.