Posters

During the PASC14 conference the following posters will be presented (gallery of the HG F-floor; see the location plan below).

The poster session will take place on June 2, from 18:30 to 20:00.

PASC Co-Design Projects

GeoPC: Development of hybrid multi-level smoothers using next generation hardware

Complex multi-scale interactions characterize the physics of the Earth. Quantification of these interactions is crucial to the integration of coupled geological systems, which are today mostly treated in isolation. Resolving structures, processes and dynamics on a wide range of interacting spatio-temporal scales is a unifying grand challenge common to all branches of Earth science and one which must be addressed in order to achieve a comprehensive understanding of the Earth (and other planets) as a multi-physics system.

Numerical modelling has become an important technique for developing our understanding of the Earth as a coupled dynamical system linking processes in the core, mantle, lithosphere, topography and atmosphere. Computational geodynamics focuses on modelling the evolution of rocks over million-year time spans. The long-term evolution is governed by the equations describing the dynamics of very viscous, creeping flow (i.e. the Stokes equations). The rheology of the bulk composition of a given parcel of rock is described by a visco-elasto-plastic material model whose effective viscosity exhibits enormous variations in magnitude (O(10^10)) over length scales ranging from O(1000) km to O(1) m.
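
For reference, in standard notation (not quoted from the poster itself), the variable-viscosity, incompressible Stokes system governing this creeping flow reads

    -\nabla \cdot ( 2\,\eta(x)\,\dot{\varepsilon}(u) ) + \nabla p = \rho(x)\,g, \qquad \nabla \cdot u = 0, \qquad \dot{\varepsilon}(u) = \tfrac{1}{2}( \nabla u + \nabla u^{T} ),

where the effective viscosity \eta(x) is the coefficient whose O(10^10) variations make the resulting saddle-point system so difficult to precondition.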

The numerical solution of the saddle-point system associated with the Stokes problem contains an elliptic operator with a highly spatially variable, heterogeneous coefficient. The solution of this saddle-point problem consumes more than 80% of the total run time in typical geodynamic simulations. Thus, in developing realistic 3D geodynamic models, the preconditioner used to solve the variable-viscosity Stokes problem is of crucial importance. The strength of the Stokes preconditioner fundamentally controls the achievable scientific throughput and represents the largest bottleneck in the development of our understanding of geodynamic processes.

Within the GeoPC project, we will develop HPC infrastructure to support the construction of robust multi-level preconditioners suitable for the variable-viscosity Stokes problem. Specifically, the developments will focus on multigrid smoothers and coarse-grid solvers which utilise hybrid MPI-OpenMP-GPU/MIC parallelism. The hybrid smoothers and coarse-grid solvers developed will be included directly within the Portable, Extensible Toolkit for Scientific Computation (PETSc). The impact of our project is far reaching due to the generality of the solver components we will develop. Additional application areas within the geoscience community which are intended to benefit from the tools developed here include: reactive transport modelling in porous media, reservoir and groundwater flow modelling, geoelectric forward and inverse models, geothermal models, and potential field methods.

GeoPC: Development of hybrid multi-level smoothers using next generation hardware
(Abstract)

The study of geodynamic processes such as mantle convection, subduction and continental rifting is fundamentally important in developing our understanding of planetary evolution. Computational geodynamics utilizes forward models which describe the long-term (million-year time spans) evolution of rocks to unravel the complex tectonic history of planets.

The long-term evolution of rocks is governed by the equations describing the dynamics of very viscous, creeping flow (i.e. the Stokes equations). The rheology of the bulk composition of a given parcel of rock is described by a visco-elasto-plastic material model whose effective viscosity exhibits enormous variations in magnitude (O(10^10)) over length scales ranging from O(1000) km to O(1) m.

Performing numerical simulations of such processes presents a number of technical challenges. Firstly, most geologic structures on Earth are non-cylindrical, which mandates that fully three-dimensional simulations be used in order to understand their origin. Secondly, the numerical solution of the saddle-point system associated with the Stokes problem contains an elliptic operator with a highly spatially variable, heterogeneous coefficient, which makes effective preconditioning of the Stokes system non-trivial.

In this presentation, we (i) give an overview of the challenges associated with modelling complex 3D geodynamic processes and (ii) highlight the current state of the art in massively parallel software for simulating global-scale and regional-scale geodynamic processes using both staggered-grid finite differences and mixed finite element spatial discretisations. We will also outline the target applications which will be used to test the performance of the hybrid smoothers developed within the GeoPC project.

We present the first iteration of the ‘Comprehensive Earth Model’ (CEM), a solver-independent multi-scale model of the global distribution of density and visco-elastic parameters. The model is based on a 3-D tetrahedral spectral-element mesh, which allows for the meshing of complicated geometries; topography, ocean-bottom bathymetry, and major discontinuities are included in this first iteration. The accurate treatment of complex slab subduction models, as well as of the topography of discontinuities, is planned for future iterations. The CEM currently contains detailed models of Europe, Australia, and Japan, embedded within the global shear velocity model S20RTS. There are immediate plans to integrate a new full waveform model of the South Atlantic, South America, and Africa, and to move towards global multi-scale full waveform tomography.

The multi-scale nature of the CEM is driven by recent developments in homogenization theory. In the context of an elastic Earth and broadband seismic waveforms, homogenization attempts to find an ‘effective medium’ at large scales which accurately predicts the response of long-period waves to fine-scale structure. For example, users will be able to select a specific scale length of interest, extract a homogenized model that is valid for waves larger than this scale length, and use the resulting single-scale model in a forward/inverse routine of their choosing. User updated models may then be added back into the CEM, which will be re-meshed and re-released at periodic intervals. Model quality metrics, constructed from traveltime, normal mode, and full waveform datasets, will be evaluated, and must be satisfied prior to the inclusion of updated models.

We encourage the integration of geomodels obtained from a variety of inversion techniques (e.g. normal mode methods, waveform methods, traveltime methods, gravity methods), and from a variety of research teams. We also focus on the inclusion of multiple data types (e.g. body waves, surface waves, etc.) in an effort to mitigate biases stemming from the inclusion of a single type. The consolidation of many models and data will provide a fertile testing ground for problems in seismology, geodynamics, and inverse theory.

The multiscale sensitivity of seismic waves to Earth’s internal structure motivates this construction of a multiscale Earth model: one which contains detailed, fine-scale structure in regions where reliable short-period data are available, and which presents the broad, low-wavenumber Earth in regions that lack significant short-period sampling. At present, the state of our knowledge of Earth’s internal structure is scattered among an array of different models, and it is difficult to present, as a community, a model that best represents our multi-parametric tomographic achievements across the scales. We hope that the CEM makes a stride towards this goal.

High-performance and scientific computing is currently undergoing a paradigm shift towards highly parallel heterogeneous architectures. While these architectures provide significantly better power efficiency and raw performance, software development is becoming increasingly hard. In the best case, software optimizations simply do not translate from one platform to another; more often, one even has to adopt a new platform-specific programming model (such as CUDA). In our project we focus on the development of compiler technologies which help to tackle the challenges of writing performance-portable code for different architectures. On our poster we highlight the existing approaches, such as domain-specific libraries, domain-specific code generators, and compiler enhancements. Furthermore, we introduce our recent work, centered on domain-specific code generation for stencil computations.

STELLA: A domain-specific language for stencil methods on structured grids

The dynamical cores of many weather and climate models are implemented as stencil methods solving partial differential equations (PDEs) on structured grids. Due to their high memory throughput and low power consumption, graphics processing units (GPUs) are an interesting hardware architecture for stencil codes. Adapting these models to new and emerging architectures such as GPUs is challenging: retaining a single, maintainable source code is a strong requirement of the communities developing weather and climate code, as models are normally deployed on a large variety of systems.

Domain-specific languages (DSLs) are an interesting approach to address this challenge. STELLA is a DSL embedded in C++ and based on template meta-programming, which has been developed for stencil codes on structured grids. The STELLA library abstracts the loop logic and parallelization of the stencil as well as the hardware-dependent optimizations of the solver, making it possible to keep a single source code that compiles and runs efficiently on multiple architectures. Using the STELLA DSL, a stencil can be described in a concise way, close to the mathematical description of the PDEs, which increases its readability for the model developer. Currently the library supports backends for x86 CPUs and NVIDIA GPUs. The dynamical core of the atmospheric model COSMO (composed of a large set of stencils) was ported using the STELLA library; the dynamical core using STELLA is currently 2x faster on CPUs and 6x faster on GPUs.
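
As a rough illustration of what such a DSL abstracts away, the plain C++ below (illustrative only, not STELLA syntax) spells out the hand-written loop nest for a simple 7-point Laplacian stencil; it is exactly this loop logic, together with its parallelization and platform-specific tuning, that the library hides behind a single high-level stencil description.

    // Hand-written 7-point 3D Laplacian stencil in plain C++ (illustration only).
    #include <cstddef>
    #include <vector>

    void laplacian(const std::vector<double>& in, std::vector<double>& out,
                   std::size_t ni, std::size_t nj, std::size_t nk)
    {
        // Linear index for an i-fastest memory layout.
        auto idx = [=](std::size_t i, std::size_t j, std::size_t k) {
            return (k * nj + j) * ni + i;
        };
        for (std::size_t k = 1; k + 1 < nk; ++k)
            for (std::size_t j = 1; j + 1 < nj; ++j)
                for (std::size_t i = 1; i + 1 < ni; ++i)
                    out[idx(i, j, k)] =
                        in[idx(i + 1, j, k)] + in[idx(i - 1, j, k)] +
                        in[idx(i, j + 1, k)] + in[idx(i, j - 1, k)] +
                        in[idx(i, j, k + 1)] + in[idx(i, j, k - 1)] -
                        6.0 * in[idx(i, j, k)];
    }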

One pillar of the PASC GridTools project aims at consolidating and improving the syntax of the STELLA DSL as well as the performance obtained with the different backends. Optimizations of the GPU backend for the Kepler architecture, exploiting additional parallelism in the vertical dimension and improving memory locality, are currently being implemented. Equally importantly, the experience of porting the COSMO dynamical core to STELLA provided valuable feedback on the DSL itself. We foresee simplifying the syntax in order to make the description of stencils more concise. Finally, a new backend for analysis purposes is planned.

Many applications share common high-level algorithmic motifs. A vast number of libraries and languages have been developed to provide application developers with tools to both express the algorithms naturally and obtain the best possible runtime performance from the implementation. This approach is almost unavoidable, since the complexity and diversity of today's computer architectures makes developing portable and efficient monolithic scientific applications a challenging endeavor.

Starting from the experience with STELLA (STEncil Loop LAnguage), in the second pillar of the GridTools project we plan to develop a set of tools dedicated to the stencil motifs of regular and block-structured grid applications. The tools will be provided as C++ template libraries in order to allow: 1) understandability and maintainability of codes; 2) portable efficiency of execution; 3) easy interoperability between the tools and with other libraries; 4) development based on consolidated, industry-level development tools.

STELLA was designed to fit the atmospheric model COSMO, and especially its dynamical core, to which it therefore remains closely tied. The GridTools libraries will instead aim at providing application-agnostic constructs to: 1) extend the COSMO model to different execution strategies, such as different domain decomposition schemes that would be cumbersome in STELLA; 2) enable other applications, such as the ICON climate model and geoscience simulations, to take advantage of the motifs abstracted by the libraries for improved clarity, portability, and efficiency.

GridTools: The next tool for stencil methods on structured grids
(Abstract)

The solution of Mean Field Theory (MFT) problems, occurring e.g. in Kohn-Sham Density Functional Theory (DFT) and Hartree-Fock (HF), leads to algorithms that scale cubically with the system size. From the principle of locality it follows that the involved operators become sparse for large systems. Algorithms can be found that map the problem of solving such Schrödinger equations onto the solution of equations involving sparse matrices (one standard example is sketched below). For efficient computation, high-performance sparse matrix-matrix multiplication is therefore essential.
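
One standard example of such a mapping, given here purely as an illustration and not as a description of the specific algorithms used in CP2K, is the density-matrix sign-function approach: in an orthogonalized basis the density matrix is P = (I - sign(H - \mu I))/2, and the matrix sign function can be evaluated with the Newton-Schulz iteration

    X_{k+1} = \tfrac{1}{2}\, X_k ( 3I - X_k^{2} ), \qquad X_0 = (H - \mu I)/\|H - \mu I\|,

which consists entirely of matrix-matrix products and therefore inherits the sparsity of the operators for large systems.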

Here we present our latest developments on DBCSR [1], a sparse matrix storage, manipulation and multiplication library. DBCSR is used in the CP2K [2] quantum chemistry program, and its multi-layered structure automatically takes care of and optimizes several computational aspects such as parallelism (MPI, OpenMP, accelerators), data (cache) locality and on-the-fly filtering.

We describe the challenges and our solutions in designing the DBCSR library, with a special focus on the backend layer and the backends (CPU, CUDA, OpenCL). The latter, as key components for concurrently accessing all available computational resources on a modern multicore hybrid system, will be explained in more detail. Measured performance for both dense and sparse matrix multiplications will be presented and compared.

The concept of linear scaling density functional theory was put forward several decades ago, but only with the advent of sufficiently powerful supercomputers and highly efficient implementations have applications to condensed phase systems become possible. The current implementation in CP2K allows for routine calculations on systems in the range of 10'000-100'000 atoms, with sufficiently accurate basis sets. The capabilities include structural relaxation and molecular dynamics, which is essential for applications. Here, we present the results of two recent projects, namely the morphology of two-dimensional polymers and the structure and electronic properties of anatase nanoparticles in solution.

The Minima Hopping global optimization method uses physically realizable molecular dynamics moves in combination with an energy feedback that guarantees escape from any potential energy funnel. For the purpose of finding reaction paths, we argue that Minima Hopping is particularly suitable as a guide through the potential energy landscape and as a generator of pairs of minima that can be used as input structures for methods capable of finding transition states between two minima. For Lennard-Jones benchmark systems we compared this Minima Hopping guided path search method to a known approach for the exploration of potential energy landscapes that is based on deterministic mode-following. Although we used a stabilized mode-following technique that reliably allows distinct directions to be followed when escaping from a local minimum, we observed that Minima Hopping guided path search is far superior in finding lowest-barrier reaction paths. We therefore suggest that Minima Hopping guided path search can be used as a simple and efficient way to identify energetically low-lying chemical reaction pathways. Finally, we applied the Minima Hopping guided path search approach to 75-atom and 102-atom Lennard-Jones systems. For the 75-atom system we found paths whose highest energies are significantly lower than the highest energy along the previously published lowest-barrier path. Furthermore, many of these paths contain a smaller number of intermediate transition states than the previously published lowest-barrier path. In the case of the 102-atom system, Minima Hopping guided path search found a previously unknown and energetically low-lying funnel.

Particle-particle interaction and space-dependent effective dielectric constant in colloidal systems

The spatial distribution of particles in colloidal media in contact with a metallic electrode affects the average dielectric properties of the system and can alter the response of the electrode at the interface with the metal. We present a simulation method for the numerical estimate of these distributions as a function of the electrode configuration and loading. In our Monte-Carlo-Poisson simulator, the colloidal system is described with single-particle resolution. This characteristic allows volume forces and particle-particle interactions, usually neglected in the continuum effective-medium approximation, to be taken into account. At the same time, a large number of particles and large systems can be simulated to meet electrochemical device design needs. In an experimentally verifiable case study, we discuss the role of multi-particle interactions in high- and moderate-density regimes. A key point is the behaviour of the electric field at the solid/liquid interface as well as its propagation inside the wet device, where the particles in the colloidal solution are free to move. The simulation code, coupled with well-designed experiments, allows us to validate our models, gaining insight into the microscopic processes occurring at a solid/liquid interface as well as in the electrolyte environment.

One of the general frameworks for modelling the spatio-temporal behavior of a population is the Fisher-Kolmogorov (FK) equation. The model is based on two main assumptions: (i) the underlying process of spatial dispersal is diffusion and (ii) the population growth depends on the current density. The FK equation leads to a travelling wave of the population in space and time. However, since a classical deterministic representation neglects the discrete nature of the population, stochastic formulations of FK are becoming increasingly important.
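
In its classical deterministic form, with diffusion coefficient D, growth rate r and carrying capacity K, the FK equation for the population density u(x, t) reads

    \partial_t u = D\,\nabla^{2} u + r\,u\,( 1 - u/K ),

whose travelling-wave solutions propagate with the minimal speed 2\sqrt{rD} in one dimension.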

Here, we introduce an individual-based formulation of the stochastic FK equation suited for HPC. The model differs from standard (particle-based) representations of stochastic FK models in two aspects. First, the considered model is synchronous, i.e., the update of movement and growth (reaction) is simultaneous and independent for all agents. Second, several births and deaths are possible in one time step, contrary to the one-step process. Simulation results are compared to previous results on stochastic FK and deterministic FK. Factors such as the propagation speed of the travelling wave and the mean number of individuals in a given spatial region are of particular importance.

The proposed model makes it possible to perform HPC simulations on a more structured spatial environment (for example a world map) and thus might be used for the comparison of large-scale patterns from an agent-based model of human dispersal and population dynamics.

QHG: an agent-based simulation code for population and ecosystem dynamics

Agent-based modeling is a flexible and powerful computational approach to the simulation of complex systems via a combination of small-scale stochastic and deterministic rules. As such, it presents some peculiar challenges for HPC. In the HPC-ABGEM project within the PASC network, we are developing QHG, a C++ code and model library for HPC agent-based simulations of population and ecosystem dynamics. QHG will use a hybrid MPI-OpenMP parallelization scheme, and is built around a modular and versatile agent design that allows the extension and development of models for specific applications. Here we present our computational strategy, preliminary scalings with shared-memory multi-threading, and visualizations of test simulations.

QHG: an agent-based simulation code for population and ecosystem dynamics
(Abstract)

In this poster we present the Isotropic Diffusion Source Approximation (IDSA) of the Boltzmann equation for neutrino radiative transfer in Core-Collapse Supernovae (CCSNe) and interpret it as an implicit Fuzzy Domain Decomposition Method (iFDDM). This interpretation gives us an abstract view on how the method is constructed and sheds light on good properties and potential problems of the IDSA.

We also introduce a simplified model to test the IDSA and show the associated partition of unity that defines the Fuzzy Domain Decomposition (FDD). This simplified model shows a long-time instability and a suboptimal approximation error of the IDSA. Some improvements are suggested on the basis of numerical simulations.

Genome sequencing has become affordable due to its high-throughput nature and innovation in sequencing technology. However, there remain challenges, at both the analysis and the computational level, that need to be addressed in order for this technology to find its way into everyday practice in, e.g., healthcare and medicine.
The first challenge is to cope with the flood of sequencing data, which grows faster than Moore's law. The second challenge is the ability to process such an enormous deluge of data in order to 1) increase the scientific knowledge of genome sequence information and 2) search genome databases for diagnostic and potentially therapeutic purposes.
Significant compression of genomic data is required to reduce the storage size, to increase the transmission speed and to reduce the cost of I/O bandwidth connecting the database and the processing facilities.
In order to process the data in a timely and effective manner, algorithms need to exploit the significant data parallelism, and they need to be able to “scale” and “map” the data-processing stages onto the massively parallel processing infrastructure according to the available resources and the application constraints.

Currently, no agreed standard format for genomic data compression exists. Such data is stored in text files with sizes of up to 300 GB per genome. Compression is therefore needed in order to let the infrastructures handle the huge amount of sequencing data produced every day. In this context, the aim of the PoSeNoGap project is to propose a new standard for genomic data compression, as well as the hardware/software tools to handle it in a very efficient way. The format will have to reduce the compressed data size as much as possible while allowing random access throughout the genome.
This new standard will then make it possible to develop computational models for compression/decompression, and specifically for genome comparison and sequence finding. A dataflow programming language (CAL) will be used to describe such processes, and to generate software and hardware architectures from a single description. The advantage of a dataflow description is the possibility to analyse the complexity of the compression algorithms and to exploit the intrinsic parallelism in a platform-independent way. The results will then be optimized, notably in terms of I/O infrastructure, by developing custom I/O platforms to reduce the processing time of genomic data algorithms as much as possible.

DIAPHANE: a common platform for application-independent radiative transfer in astrophysical simulations

The inclusion of radiative physics in astrophysical simulations, from star and planet formation to galaxy formation, from supernova modeling to black holes, represents their current most severe bottleneck. Individual codes include only very limited approximations to the radiative transfer equation and to the Boltzmann equation for neutrino transport, which work only in very specific regimes, have implementations strictly tailored to individual codes, and are often not the most efficient choice for the regime of interest.

In order to make significant progress in all these fundamental areas of research, we need to be able to adopt different approximations to radiative and neutrino transfer in the same simulation, and to be able to switch smoothly between them. The first step in this direction is the establishment of a library of radiation and neutrino transport modules that can be interfaced to any of our community codes, grid-based as well as SPH. This is the main goal of the DIAPHANE project presented here. The second goal is to be able to couple the different underlying approximations robustly and efficiently, laying down the path to adaptive radiation hydrodynamics.

Following the pathway already taken for neutrino transport in the modeling of supernova explosions, we will adopt the Fuzzy Domain Decomposition Method to couple different approximations (see companion poster).

DIAPHANE: a common platform for application-independent radiative transfer in astrophysical simulations
(Abstract)

Presenting author

Lucio Mayer (University of Zurich)

Co-authors

Alireza Rahmati (Max-Planck Institute for Astrophysics, Garching and University of Zurich), Thomas Peters (University of Zurich)

Poster #

PASC-16

Climate

Extensions of the numerical weather prediction model COSMO for the efficient study of air quality, aerosol-climate feedbacks and greenhouse gas sources and sinks

Numerical meteorological models, originally developed for daily weather prediction, are increasingly being used to address a wide range of environmental problems from climate change to air quality, pollen forecasting, environmental impact assessment, energy, geoengineering, or forecasting of hazardous plumes. These applications usually require extensions of the code with additional modules (“internal coupling”) or coupling with existing models (“external coupling”), but this coupling is often not trivial due to different requirements of the model components such as different integration time steps or spatial representation.
Here, we present recent extensions of the numerical weather prediction model COSMO for the efficient simulation of air quality, chemistry-climate interactions, and greenhouse gas sources and sinks. COSMO is the regional weather forecast model jointly developed by a consortium of European weather centers including the German weather service DWD and MeteoSwiss, and is used in the climate version COSMO-CLM by a wide research community. Recently, a generic tracer transport mechanism has been implemented and incorporated into the new version COSMO 5.0. It not only handles the advective, turbulent and convective transport of the prognostic variables of the model, but also allows flexible definition of new tracers such as trace gases and aerosols.
The extensions presented here are all centered on this generic mechanism:
- COSMO-ART includes the ART (Aerosols and Reactive Trace gases) extension with a detailed description of air pollution chemistry and aerosol processes, and is mainly designed to study air quality and aerosol-meteorology feedbacks on short, episodic to annual time scales.
- COSMO-ART-M7 extends ART with an option to use an alternative aerosol module (“M7”) in combination with a simplified but efficient chemistry and thus allows simulating climatic (decadal) time scales. It is mainly designed to study regional aerosol-climate feedbacks.
- COSMO-GHG defines a set of greenhouse gases to be passively transported through the atmosphere and accounts for emissions and surface-atmosphere exchange fluxes. COSMO-GHG will be applied in combination with inverse modeling approaches to quantify greenhouse gas sources and sinks.
These extended versions are computationally much more demanding than the COSMO core since a large number of additional tracers and processes have to be considered. Thus these models are currently severely limited in terms of applicability and expensive in terms of energy consumption. COSMO has recently been ported to GPUs within the framework of the HP2C initiative to optimize it for computational and energy efficiency. Although these developments will facilitate the application of COSMO for numerical weather prediction and climate simulations, they do little to address the coupling with all model extensions, in particular with ART and M7, for which significant investments are still required to take them to a similar level. The efficiency of ART is being addressed in the EU Exa2Green project.
We briefly introduce the three model systems and their integration into COSMO and present a range of examples of their application. Finally, we present a benchmark analysis of COSMO-ART from the Exa2Green project to highlight the critical components and to outline the next steps for model optimization.

Extensions of the numerical weather prediction model COSMO for the efficient study of air quality, aerosol-climate feedbacks and greenhouse gas sources and sinks
(Abstract)

A novel atmospheric large eddy simulation infrastructure for climate applications

Addressing scientific questions regarding climate generally necessitates the investigation of dynamical systems over long timescales. Advancements in the sophistication and availability of High Performance Computing (HPC) infrastructure have made Large Eddy Simulation (LES) of boundary layer and deep convective clouds over climatologically relevant timescales realizable, despite the large computational expense of such simulations. However, most atmospheric LES codes are not designed to take advantage of recent HPC hardware improvements, making them unsuitable for studying climate-relevant scientific problems despite advances in HPC. Moreover, many commonly used LES codes are based on relatively primitive software design, limiting their extensibility.

In order to overcome the limited performance and extensibility of existing LES codes, we have developed a novel atmospheric LES code, PyCLES, that makes use of modern coding paradigms and numerical algorithms to take full advantage of modern HPC architectures. However, when seeking software performance optimization, it is necessary to keep in mind that the code must be usable and modifiable by subject-matter experts rather than computer-science experts. This has motivated development using a combination of Python and Cython that affords both the simplicity of Python and the performance of C.

PyCLES solves a thermodynamically consistent form of the anelastic equations of motion (Pauluis, J. Atmos. Sci., 2008) on an Arakawa-C grid. In order to take advantage of modern HPC infrastructure, which features many computational cores per node, the discrete equations are solved using a hybrid parallelization approach that uses MPI (Message Passing Interface) for distributed-memory parallelism and communication between nodes, while within each node OpenMP is used for shared-memory parallel computations utilizing multiple threads. The hybrid parallelization reduces MPI communication overhead and reduces total memory usage by eliminating a large fraction of MPI border points relative to non-hybrid parallelization. Moreover, node-level shared-memory parallelism allows a degree of load balancing for iterative thermodynamic and microphysics routines through dynamic thread scheduling. Further parallel optimization is achieved by grouping three-dimensional prognostic variables into four-dimensional arrays so that MPI border points can be updated with fewer communications, thereby reducing network latency overhead. The software also allows MPI domain decomposition in one, two, or three dimensions, selected at runtime depending on the particular domain configuration.
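
The hybrid pattern described above can be sketched as follows (illustrative C++ rather than the Python/Cython used in PyCLES; all names are invented): MPI ranks own disjoint parts of the grid and communicate between nodes, while OpenMP threads share the work within each rank.

    // Minimal hybrid MPI + OpenMP sketch (not PyCLES code).
    #include <mpi.h>
    #include <omp.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv)
    {
        int provided = 0;
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        int rank = 0;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int n_local = 1 << 20;               // grid points owned by this rank
        std::vector<double> u(n_local, 1.0);

        double local = 0.0;
        #pragma omp parallel for reduction(+ : local)  // shared-memory threads within a node
        for (int i = 0; i < n_local; ++i)
            local += u[i] * u[i];

        double global = 0.0;                           // distributed-memory reduction across nodes
        MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        if (rank == 0)
            std::printf("global squared L2 norm = %f\n", global);

        MPI_Finalize();
        return 0;
    }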

In this poster we will present the algorithmic and software design of PyCLES and discuss the results of initial benchmark simulations in order to assess its performance.

Solid Earth Dynamics

Tsunamis are devastating natural disasters. In order to predict such events and their behavior, fast and detailed simulations of tsunamis are required. We evaluate the ability of the SWE shallow water numerical model to perform efficient and realistic tsunami simulations, using it to reproduce the 2004 Great Sumatra-Andaman and the 2011 Tohoku tsunamis.
Nowadays, many of the Top500 supercomputers use GPU accelerators to enhance their peak performance. The GPU-enabled clusters Tesla-CMC and “Todi” (Cray XK7) were benchmarked using different numbers of processes and GPUs to obtain the most efficient and the fastest SWE configurations. Using all 3 available GPUs of Tesla-CMC, we observed a maximum speedup of 158x compared to the single-core CPU version. The most efficient configuration was the one with a single GPU, achieving a speedup of 58x. Performance results for “Todi” are also presented for larger numbers of compute nodes, with a maximum speedup of 260x using 32 GPU-accelerated processes.

By alternating full waveform inversion and traveltime tomography we attempt to merge their complementary merits and produce models of the European continent explaining both waveform and traveltime data sets.

Technological developments and advances in theoretical and numerical seismology allow us to assimilate complete waveforms in the solution of full waveform tomographic problems. However, due to computational limitations, full waveform inversion is, and will for the foreseeable future remain, limited to an intermediate period band. Valuable information contained in high-frequency P and S wave traveltimes therefore cannot be exploited. Consequently, full waveform inversion only yields excellent results in the upper 300 km, where surface wave sensitivity is large. P and S velocity heterogeneities at greater depth are less well resolved. In contrast, classical traveltime tomography intrinsically incorporates information from high-frequency body waves.

As an initial step to combine both methods and extend the spectrum of exploited information, we test an inversion scheme, where full waveform inversion and ray tomography alternate, thereby avoiding an explicit coupling of both methods and data sets. We put special emphasis on the convergence properties of this scheme and its ability to produce 3D models that explain both waveform and traveltime data sets. We test our approach starting with a full waveform tomographic model of Europe and Western Asia in conjunction with the traveltime tomography package FMTOMO and the teleseismic data set used for the construction of the global model S40RTS.

The development of adjoint gradient-based optimization techniques for general compositional flow problems is much more challenging than for oil-water problems due to the increased complexity of the code and the underlying physics. An additional challenge is the treatment of non-smooth constraints, an example of which is a maximum gas rate specification in injection or production wells when the control variables are well bottom-hole pressures. Constraint handling through lumping is a popular and efficient approach. It introduces a smooth function that approximates the maximum of the specified constraints over the entire model or on a well-by-well basis. However, it inevitably restricts the possible solution paths the optimizer may follow, preventing it from converging to feasible solutions exhibiting higher optimal values. A simpler way to force feasibility, when the constraints are upper and lower bounds on output quantities, is to satisfy these constraints in the forward model. This heuristic treatment has been demonstrated to be more efficient than lumping while at the same time obtaining better feasible optimal solutions for several models of increasing complexity. In this work a new, formal constraint handling approach is presented. Necessary modifications of the nonlinear solver used at every timestep of the forward simulation are also discussed. All these constraint handling approaches are applied in a gradient-based optimization framework for exploring optimal CO2 injection strategies that enhance oil recovery for a realistic offshore field, the Norne field. The new approach recovers 3.7% more oil than the best of its competitors.
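
The poster does not specify which smooth aggregation function is used; a standard example is the Kreisselmeier-Steinhauser (log-sum-exp) form

    \max_i g_i \;\approx\; \frac{1}{\rho} \ln \sum_{i=1}^{m} e^{\rho g_i},

which overestimates the true maximum by at most \ln(m)/\rho and becomes sharper, but also stiffer, as the aggregation parameter \rho grows.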

Exact boundary conditions (EBCs) (van Manen et al. 2007, Vasmel et al. 2013) were originally proposed to model wave propagation on a spatially limited region (subdomain) of a larger 2D/3D medium (full domain). EBCs allow one to restrict the simulation to the subdomain, while maintaining all the long-range interactions with heterogeneities in the full domain. The methodology dynamically links the separated domains through the use of EBCs at the interface between them. In a typical geometry for the application of EBCs, a subdomain is bounded by an injection surface, and heterogeneous scatterers are present both inside and outside the subdomain.

In imaging and inversion applications (Fichtner, 2010), modeling on a large three-dimensional domain can be computationally very expensive. In certain scenarios, only part of the subsurface model changes significantly from one iteration to the next. This feature can be exploited when the wavefield is only recomputed in those regions where the model has been updated. In these cases, EBCs become attractive because they allow recomputations to be done on the truncated domain only (thereby significantly reducing the size of the simulations) while completely accounting for interactions with the unperturbed domain outside. This approach changes the nature of the computations that have to be carried out. The simulations are carried out on a smaller subdomain, however, the implementation of EBCs requires additional computations: (1) a set of Green’s functions modeled in the full domain (these are computed only once) and (2) an extrapolation process at every time step of the simulation.

The extrapolation process takes care of the interaction of the wavefield propagating out of the subdomain with the inhomogeneities in the full domain. It uses the wavefield recorded on a recording surface, which is extrapolated to the injection surface. This step is implemented through a discretized integral of Green's second identity. The required Green's functions are calculated in the full domain beforehand. At each time step, we compute and store the values of the wavefields needed in future time steps of the simulation. For this reason, and because the sources and receivers must be well sampled along the two surfaces, the extrapolation process must be carefully implemented.

The implementation of EBCs relies on a large look-up table of precomputed Green's functions that must be available at all time steps. The total number of entries in this table is: (# receivers) x (# injection locations) x (# simulation time steps). This requires an efficient implementation using MPI I/O.
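
To illustrate the scale of this table with purely hypothetical numbers (not taken from the poster): 10^4 receiver positions, 10^4 injection locations and 10^4 time steps already give 10^12 entries, i.e. roughly 4 TB in single precision, which is far too large to hold in memory and therefore has to be streamed efficiently from disk.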

Additionally, EBCs could be used as absorbing boundary conditions at the edge of any modeling domain. If the Green's functions needed by the extrapolation are those of a medium that continues outwards, all outgoing events will be canceled on the injection surface and no energy will be excited at later times, resulting in full absorption of all events at the edge of the domain. The extrapolation from the recording to the injection surface is the major factor in the overall computational cost of such a scheme. For this method to be competitive with conventional methods, it is essential that the surface integral can be calculated efficiently and with Green's functions that are relatively cheap to compute.

EBCs can be applied in different scenarios where one wants to recompute the response of a subdomain embedded in a larger model. It allows for the computational domain to be spatially truncated and changes the nature of the computations to be carried out, requiring the calculation of a surface integral at every time step. We foresee applications for imaging and inversion scenarios, and as an alternative type of absorbing boundary conditions.

We present a seismic tomography model for the Japan archipelago obtained using full waveform inversion and adjoint methods. A high-resolution seismic velocity model is essential for Japan as a means to comprehend and characterize the complexity of the tectonic setting, and to further our understanding of earthquake sources and rupture propagation.

The study area covers the Japanese islands – an area between 20°-50°N and 130°-160°E – and extends to a maximum depth of 500 km. By virtue of the complicated tectonics and the resulting high seismicity, dense seismic networks are present in Japan and surrounding countries. We make use of broadband data from three networks – F-net in Japan, BATS in Taiwan, and, notably, the National Earthquake Network in South Korea. Due to access difficulties, data from this network had not been used in the preceding tomographic study of the same area. We use >50 carefully selected earthquakes, located within the model area and occurring between 1999 and the present. Magnitudes of the events are restricted to 5≤Mw≤6.9 for a point source approximation to be valid. A spectral-element method is used for the forward waveform calculation, which combines the geometric flexibility of the finite-element method with the accuracy of spectral methods. To quantify differences between the observed and synthetic waveforms, we use time-frequency misfits, which exploit the evolution of the frequency content of the data in time. The sensitivities (Fréchet kernels) are then calculated using adjoint methods. The employed methodology allows us to explain data with dominant periods as low as 10 s. To prevent possible over-fitting of the data, we ensure that final misfits are not lower than those obtained if additional (not yet used) data are incorporated.

The final results of this study will contribute to the ‘Comprehensive Earth Model’ being developed by the Computational Seismology group at ETH, with the aim of representing a snapshot of the current knowledge of the Earth's internal visco-elastic structure.

The physical properties of the magma in shallow reservoirs are deeply connected to the mechanical and chemical interactions between its different constituents: crystals, volatiles and silicate melt. Volatiles, in particular, play a fundamental role in the dynamics of volcanic eruptions (e.g. the presence of an abundant exsolved volatile phase in a shallow magma chamber can dramatically affect its eruptibility).
Our limited understanding of exsolved magmatic volatile phase (MVP) transport in magma reservoirs with variable crystallinities does not allow us to constrain important questions such as (1) how fast volatiles move in reservoirs, and (2) when and where they can accumulate. Consequently, we need more sophisticated physical models to shed light on the non-linear interactions between the exsolved MVP, silicate melts, and the crystal matrix. This study reports the use of small-scale (crystal/bubble-size) multiphase numerical modelling based on the lattice Boltzmann technique to calculate the mass transport of the MVP in a host medium that can display sudden transitions in crystallinity (regions of transition between high crystal content magmas and melt-rich layers, thought to be representative of a crystal-poor silicic cap overlying a mush zone).

In our numerical experiments we find that capillary channels of MVP phase can form in the high-crystallinity medium and break as they penetrate into the crystal-poor layer (capillary pinch-off). Moreover, the viscosity contrast between the MVP and the melt causes a sudden increase in viscous dissipation as the transport regime of the MVP shifts from capillary channels to disconnected bubbles/slugs in the melt-rich layer. As a result, the MVP tends to accumulate in the melt-rich layer. The accumulation of MVP in the crystal-poor layer can affect the physical properties of the latter, and greatly impact the eruptibility of the system.

In multi-scale complex media, finite element meshes often require areas of local refinement, creating small elements that can dramatically reduce the global time step for wave-propagation problems due to the CFL condition. Local time stepping (LTS) algorithms allow an explicit time-stepping scheme to adapt the time step to the element size, allowing near-optimal time steps everywhere in the mesh. We develop and implement an efficient multi-level LTS-Newmark scheme for general use with spectral element methods (SEM), with applications in seismic (viscoelastic) wave propagation. SPECFEM3D, a well-known SEM implementation, is a widely used community code which simulates seismic and acoustic wave propagation in earth-science applications. Our new LTS scheme extends the 2nd-order accurate Newmark time-stepping scheme and leads to an efficient implementation in SPECFEM3D, producing real-world speedups for multi-resolution seismic applications. Extending previous LTS work, we generalize the method to many refinement levels, with the implementation details necessary for a high-performance code that can utilize many CPUs and GPUs via MPI.
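
The constraint that motivates LTS is the element-wise CFL limit, written here in its standard form with element size h_e, local wave speed c_e and a scheme-dependent constant C:

    \Delta t_e \;\le\; C\, h_e / c_e .

With a single global step, the smallest element dictates the cost for the whole mesh; in a multi-level scheme the elements are instead grouped into levels that advance with local steps close to their own limit (e.g. \Delta t, \Delta t/2, \Delta t/4, \ldots) and are synchronized at the coarsest step.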

Using 3D numerical simulations of seismic wave propagation in heterogeneous media, we systematically compare the imprints of heterogeneities of different types (and particularly density heterogeneities) on synthetic seismograms.

Lateral density variations are the source of mass transport in the Earth at all scales, acting as drivers of convective motion in the mantle. However, the density structure of the Earth remains largely unknown, since classic seismic observables and gravity provide only weak constraints with strong trade-offs. Current density models are therefore often based on velocity scaling, making strong assumptions on the origin of structural heterogeneities which may not necessarily be true.

We propose to develop a seismic tomography technique that directly inverts for density, using complete seismograms rather than arrival times of certain waves only. The first task in this challenge is to systematically study the imprints of density on synthetic seismograms. In this context, our study aims to compare the significance of density heterogeneities relative to velocity heterogeneities, and to design a numerical experiment with a source-receiver configuration particularly sensitive to density.

To compute the full seismic wavefield in a 3D heterogeneous medium without making significant approximations, we use numerical wave propagation based on a spectral-element discretization of the seismic wave equation. We consider a 2000 by 1000 km wide and 500 km deep spherical section, with the 1D Earth model ak135 as a background. Onto this we superimpose 3D Gaussian-shaped perturbations of different types (P, SV and SH velocities and density) at depths in the range from 10 km to 70 km. The choice of the depth range in which the 3D heterogeneities were placed (10 km – 70 km) was dictated by the surface wave sensitivity to density. For each depth we perform 4 wave propagation simulations corresponding to the 4 different types of heterogeneities, and calculate surface wave sensitivity kernels. We compare the synthetic seismograms for the different types of heterogeneities with seismograms for the 1D reference model, using various misfit criteria, including weighted envelope and phase differences based on continuous wavelet transforms.

Our preliminary analyses indicate that density variations do leave a noticeable mark on seismograms, of the same order of magnitude as that from velocity variations. This suggests that the solution of the seismic inverse problem for density may become feasible.

The impact of density heterogeneities on seismic wave propagation
(Abstract)

Presenting author

Agnieszka Plonka (Utrecht University, The Netherlands)

Co-authors

Andreas Fichtner (ETH Zurich)

Poster #

EAR-08

What's that noise? Building a cross-correlation data base to study the causes of Earth's background seismic waves

The ambient seismic wave field is composed of many different signals caused, for example, by the interaction of ocean waves with the seafloor. In seismology, correlations of this seismic ‘noise’ are now widely used to derive mechanical properties of the subsurface on various spatial scales. Techniques largely rest on the finding that a stack of cross-correlations between two receivers converges approximately to the impulse response of the medium between those receivers. Travel times between receivers can thus be extracted from correlations to image the subsurface.
We propose to take a different, waveform-based approach to ambient seismic noise. We expect that exploiting noise correlation waveforms, instead of travel times of their peak arrival only, will yield new insights into the natural causes of ambient noise, and into Earth structure. Instead of making assumptions about the sources or including specific preprocessing steps to eliminate their signature, we therefore propose to numerically model the cross-correlations, compare the modeled correlation waveforms to observed ones, and optimize the model in terms of source locations.
In a first step, we are building up a database of cross-correlations of ambient noise recorded at globally distributed seismic stations. This requires a certain computational effort, since the correlation between every possible station pair has to be considered, and since many realisations of the correlation have to be stacked to obtain a stable result due to the stochastic nature of the noise. We are in the process of testing a Python-based toolbox that allows parallel processing and correlation of seismic data. The resulting correlations are to be made publicly available at the Orfeus seismic data center.
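
For reference, the basic quantity behind this approach is the cross-correlation of the recordings u_A and u_B at two stations (standard notation),

    C_{AB}(\tau) = \int u_A(t)\, u_B(t + \tau)\, dt,

computed window by window and stacked over long time spans; it is this stack that, under suitable assumptions on the noise sources, approximates the inter-station impulse response mentioned above.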

What's that noise? Building a cross-correlation data base to study the causes of Earth's background seismic waves
(Abstract)

Precession is a change in the orientation of the rotation axis of a rotating body. The Earth goes through one full precessional period over approximately 26,000 years due to the luni-solar tidal torque acting on the Earth's equatorial bulge. From an energetic point of view, precession is a possible driving mechanism for the generation of Earth's magnetic field, the so-called geodynamo. Recent numerical simulations have shown that precession-driven flows can power dynamos in spherical and spheroidal containers. However, it is still an open question whether precession is responsible for the geodynamo, because numerical models are far away from the Earth's parameter regime. The present study aims at shedding further light on the role of precession in the generation of the geomagnetic field. The fully nonlinear Navier-Stokes equation and the magnetic field diffusion equation are solved using a highly parallelized spectral code. Firstly, we investigate the hydrodynamical instability of precession-driven flow in a sphere, considering only the Navier-Stokes equation. When the precession rate is above a threshold, the flow becomes unstable through a parametric resonance of two inertial waves with the distortion of streamlines. The dynamo simulations show that both the hydrodynamically stable (laminar) and unstable (disorganized) flows are capable of sustaining dynamos. The laminar flow operates a dynamo with an equatorial dipolar magnetic field, whereas the disorganized flow generates a magnetic field dominated by small-scale structures. The effect of a thin conducting outer layer is also considered.
Acknowledgements: The simulations were run at the Swiss National Supercomputing Centre (CSCS) under projects s369 and p502. We acknowledge funding from ERC grant 247303 (MFECE).

This presentation aims to demonstrate the benefits of using modern numerical methods for understanding the physics of earthquakes and for physics-based ground motion modeling.
To this end, SeisSol, a software package based on the arbitrary high-order derivative Discontinuous Galerkin (ADER-DG) scheme, is used to solve the spontaneous earthquake rupture problem simultaneously with high-order accurate seismic wave propagation in space and time. A distinct advantage of the method is the use of three-dimensional unstructured tetrahedral meshes, qualifying it for modeling complex geometries such as those occurring in natural fault zones.

We recently verified the method in various advanced test cases of the ‘SCEC/USGS Dynamic Earthquake Rupture Code Verification Exercise’ benchmark suite, including branching and dipping fault systems, heterogeneous background stresses, bi-material faults and rate-and-state friction constitutive formulations.
Now, we study the dynamic rupture process using 3D meshes of fault systems constructed from geological and geophysical constraints, such as high-resolution topography, 3D velocity models and fault geometries. I will present large-scale earthquake dynamic rupture scenarios based on the 1994 Northridge blind thrust event and the 1992 Landers strike-slip event in Southern California. Modeling these well-documented events, we intend to understand earthquake dynamics and the ground motion generated on complex fault systems, including the relevant high-frequency content, as well as their variation arising from various physical constraints.

Such scenario simulations place high demands on the solver and the software package in an HPC environment. In turn, a physics-based understanding of the observed strong ground motions is enabled. For example, our results imply that the Northridge fault geometry favors a pulse-like rupture behavior, and that the observed simultaneous failure of several discrete large Landers fault segments depends heavily on the local background stress state.

Recent megathrust earthquakes, e.g., the 2011 M9.0 Tohoku and the 2004 M9.2 Sumatra events, illustrated both their disastrous human and economic impact and our limited physical understanding of their spatial occurrence. To improve long-term seismic hazard assessment by overcoming the restricted direct observations in time and space, we developed a new numerical seismo-thermo-mechanical (STM) model. This 2D, continuum-mechanics-based, viscoelastoplastic model uses an Eulerian-Lagrangian finite difference framework with similar on- and off-fault physics. This typically long-term geodynamic approach is now extended to study short-term seismogenesis through a local invariant implementation of strongly slip-rate-dependent friction. This approach may help to shed light on the interaction between long-term subduction dynamics and deformation and the associated short-term seismicity. Additional advantages of this STM approach include the physically consistent emergence of rupture paths, both on- and off-megathrust, and the inclusion of three key ingredients for seismic cycling: rate-dependent friction, slow tectonic loading, and visco-elastic relaxation.

The validation of this approach is accomplished through a comparison with a laboratory seismic cycle model (van Dinther et al., JGR, 2013a). A more realistic geometry and physical setup of the Southern Chilean margin showed that results also agree with a range of seismological, geodetic, and geological observations, albeit at lower coseismic speeds (van Dinther et al., JGR, 2013b). This setup also suggests that a) ~5% of cyclic deformation is stored over the long term, b) a self-consistent downdip transition zone between 350°C and 450°C arises from temperature-dependent viscosity, and c) megathrusts are weak (i.e., pore fluid pressures of ∼75% to 99% of the solid pressure).

Finally, we exploit the main advantage of this innovative approach: the spontaneous unstable rupturing of off-megathrust events (van Dinther et al., GRL, 2014). The characteristics of simulated normal events within the outer rise, and of splay and antithetic normal events within the wedge, resemble seismic and seismological observations in terms of location, geometry, and timing. Their occurrence agrees reasonably well with both long-term analytical predictions based on dynamic Coulomb wedge theory and short-term quasi-static stress changes resulting from the typically triggering megathrust event. The impact of off-megathrust faulting on the megathrust cycle is distinct, as more, shallower and slower megathrust events arise due to occasional off-megathrust triggering and increased updip locking. This also enhances tsunami hazards, which are amplified by the steeply dipping fault planes of outer-rise events in particular.

The innovative character of this seismo-thermo-mechanical approach opens a world of interdisciplinary research between geodynamics and seismology. This can relate to the generation and characteristics of megathrust earthquakes and beyond.

Life Sciences

β2–α2 loop landscape in prion protein

Transmissible spongiform encephalopathies (TSEs) are lethal neurodegenerative diseases that affect humans and a large variety of animals. According to the “protein only hypothesis”, the infectious agent responsible for TSEs is an abnormally folded and aggregated protein that propagates itself by imposing its conformation on the cellular prion protein of the host.
The conversion is the result of a post-translational process whereby most α-helical motifs are replaced by β-sheet secondary structures. However, no atomistic-resolution structural information is available for the β-sheet form.
In the prion protein (PrP) conserved scaffold, a surface area including the loop that connects the strand β2 with the helix α2 has attracted special interest due to high variability in its sequence and three-dimensional structure among mammalian PrPs. Several studies have indicated that a surface epitope that includes the β2–α2 loop plays an important role in prion pathology.
NMR experiments on the mouse PrP WT suggest that this loop is likely to be in dynamic equilibrium and to populate multiple conformations. Mutations in the loop can significantly affect this equilibrium: for the WT the major state is the 310-helical conformation, whereas the type I β-turn state seems to be the only conformer populated in the mutant Y169A. Chemical-shift analysis suggests that the type I β-turn state corresponds to the low-populated state of the WT.
We characterize in atomistic detail the conformational landscape of the mouse PrP β2-α2 loop for the WT and the mutant Y169A using an advanced computational scheme combining the Well-Tempered Ensemble with Parallel Tempering (WTE+PT) simulations. WTE, an extension of metadynamics, applies a bias to the total potential energy of the solvated protein and enhances the fluctuations of the system. These enhanced fluctuations allow higher exchange probabilities with fewer replicas in the PT simulation. In this way, we extend the NMR results by providing a free energy landscape and an ensemble of conformers, including the low-populated states that cannot be characterized by the NMR experiment.
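As a schematic reminder of the two ingredients of such a scheme (the notation below is ours, not taken from the poster), the well-tempered bias V(U) acts on the total potential energy U and is grown by depositing Gaussians whose height decreases where the bias is already large, while replica swaps are accepted with the usual parallel-tempering criterion:

    V_{t+\Delta t}(U) = V_t(U) + w\, e^{-V_t(U_t)/[(\gamma - 1) k_B T]}\, e^{-(U - U_t)^2 / 2\sigma^2},
    p_{\mathrm{swap}}(i \leftrightarrow j) = \min\{ 1,\; e^{(\beta_i - \beta_j)(U_i - U_j)} \}  (when the bias is present, the corresponding bias differences enter the exponent as well).

Because the bias broadens the distribution of U (its fluctuations are amplified, roughly by \sqrt{\gamma} in standard deviation), the energy histograms of neighbouring replicas overlap more, which is why fewer replicas suffice in the PT ladder.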
We confirm the hypothesis of the experimentalists that the low-populated state of the mPrP WT corresponds to the most populated state of the mPrP Y169A. Moreover, we notice that the mutant explores a wider configurational space; this result may shed new light on the interpretation of the experimental data. Indeed, mouse PrP Y169A could be involved in a fast exchange among different conformers instead of populating just one state.

Many problems in science and engineering are driven by time-periodic forces. In fluid dynamics, this occurs for example in turbines, or in human blood flow. If the fluid is excited long enough, it establishes a time-periodic steady state, or can at least be approximated by one. To compute this steady state, one classically starts with some initial guess and lets it evolve through time-stepping until the periodic steady state is reached. We take another approach and solve directly for the steady-state solution, introducing the periodicity as a boundary condition in time. We use a multiharmonic ansatz for the time discretization. This allows us to exploit further parallelism, not only in space but also in time. We compare the parallelization capabilities of the classical approach with those of the time-periodic approach, and we compare the performance of both approaches with regard to accuracy and time-to-solution.
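As a sketch of what such an ansatz looks like (the truncation order K and the notation are ours, not taken from the abstract), the solution is expanded in a finite number of temporal harmonics of the forcing period T:

    u(x, t) \approx \sum_{k=-K}^{K} \hat{u}_k(x)\, e^{i k \omega t}, \qquad \omega = 2\pi / T.

Substituting this into the governing equations replaces the time derivative by factors i k \omega and yields one coupled system for the harmonic amplitudes \hat{u}_k(x); since the unknowns are now distributed over harmonics as well as over the spatial grid, the work can be parallelized in time in addition to the usual domain decomposition in space.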

Glioma is the most common type of primary brain tumor. It is characterised by infiltrative growth into the surrounding tissue rather than the formation of a solid tumor with a well-defined boundary. As a result, only the part of the glioma above a certain concentration threshold is visible in MR images. In contrast, brain tissue infiltrated by tumor cells at lower concentrations appears normal on images obtained with current imaging modalities.

We present a novel personalized approach for inferring glioma evolution and spatial distribution. The presented framework includes simulation of glioma growth and proliferation in the patient's individual brain anatomy obtained from medical images. Model parameters for each patient are inferred from multiple MRI scans of the patient's tumor using a Bayesian uncertainty quantification and parameter estimation framework.
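A minimal sketch of such a Bayesian calibration loop is given below. It assumes, purely for illustration, a growth model with two scalar parameters (a diffusivity D and a proliferation rate rho), a Gaussian likelihood and a random-walk Metropolis sampler; none of these choices, nor the function names, are taken from the actual framework.

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_tumor(D, rho):
        """Placeholder for the patient-specific growth simulation; in the real
        framework this would evolve the tumor cell density in the segmented
        brain anatomy and return image-derived observables."""
        return np.array([D + rho, D * rho])          # purely illustrative

    def log_posterior(theta, observed, sigma=0.1):
        D, rho = theta
        if D <= 0.0 or rho <= 0.0:                   # flat prior on positive values (assumed)
            return -np.inf
        model = simulate_tumor(D, rho)
        return -0.5 * np.sum((model - observed) ** 2) / sigma ** 2

    def metropolis(observed, n_steps=5000, step=0.05):
        """Random-walk Metropolis sampling of p(D, rho | MRI-derived observables)."""
        theta = np.array([1.0, 1.0])
        logp = log_posterior(theta, observed)
        samples = []
        for _ in range(n_steps):
            proposal = theta + step * rng.normal(size=2)
            logp_prop = log_posterior(proposal, observed)
            if np.log(rng.uniform()) < logp_prop - logp:
                theta, logp = proposal, logp_prop
            samples.append(theta.copy())
        return np.array(samples)

    # 'observed' would be features extracted from the patient's MRI scans
    posterior_samples = metropolis(observed=np.array([1.3, 0.4]))

The posterior samples can then be propagated through the growth model again to obtain a probabilistic estimate of the likely tumor cell distribution beyond the imaging threshold.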

Decades after the discovery of allosteric phenomena, the classical models built on static images of proteins are being reconsidered with the knowledge that dynamics plays an important role in their function [1]. Here we present an investigation of the origins of the allosteric dynamical changes caused by the interaction between the MLL protein and the KIX domain.
In previous NMR studies [2], the dynamic ensemble of accessible states of this binary complex (MLL:KIX) was studied using relaxation dispersion techniques. The dispersion profiles indicated the presence of a conformational transition (on the millisecond timescale) between two configurations: a so-called "ground" state, which is highly populated, and an "excited" state, which is scarcely populated. However, it was not possible to resolve the structure of the latter and, moreover, the study did not provide any description of the allosteric mechanism.
Molecular dynamics (MD) is a natural choice for understanding the molecular origins of dynamic allostery in this binary complex. This simulation technique also allows an accurate characterization of the structural behavior of the two conformational states. However, this slow conformational transition is inaccessible on typical MD simulation timescales. To overcome this limitation, we have employed enhanced MD methodologies such as metadynamics. In particular, to sample the conformational ensemble of the MLL:KIX complex, Well-Tempered Ensemble [3] combined with Parallel Tempering (WTE-PT) simulations were performed. Using this advanced technique we have characterized the atomistic configuration of the excited state of the KIX domain in the binary complex [4]. Further important outcomes concern the structural nature of the communication pathways and the energetics associated with them [4]. Indeed, we shed light on the signal transmission along the allosteric pathway and on how the residues pass information from one site to the other.
From these results we also derive a new simulation protocol that may give a better description of many important biological processes, such as allosteric phenomena.

Peripheral membrane proteins temporarily bind to the cell membrane, which can induce conformational and functional changes. For instance, focal adhesion kinase (FAK) is known to bind to phosphatidylinositol 4,5-bisphosphate (PIP2), a negatively charged phospholipid, through a basic patch in its N-terminal domain. This interaction is thought to release the auto-inhibited conformation of FAK, which leads to an activation of the protein. Here, we probe how FAK binds different model membranes using both atomistic and coarse-grained simulations. With the coarse-grained simulations, we observed membrane binding of FAK and identified several interaction sites with PIP2, including the basic patch. However, when the simulations were continued using an atomistic model, the membrane-associated pose of FAK changed slightly. The protein-membrane interactions may be overestimated in the coarse-grained compared to the atomistic simulations. This multiscale approach has the potential to provide valuable insights into the regulation of peripheral membrane proteins, but also shows some limitations of current coarse-grained simulations.

Material Science

Ab-initio simulation of two-dimensional networks on the surface of water

Molecules adsorbed on surfaces play an important role in catalysis, surface science, and nanotechnology. Traditionally, research has focused on various adsorbates atop metals and metal oxides using computational and surface-science techniques. More recently, however, it was demonstrated that ordered monolayer networks can also be formed on the surface of liquid water by using metal ions to bind together multidentate precursor molecules [1, 2]. As these two-dimensional polymers are challenging to analyze, computational methods can provide valuable insight into their formation and structure.

In this contribution we present large-scale ab initio molecular dynamics (MD) simulations of the formation of a network of tris-terpyridine-derived molecules (TTPB) on a water surface. We use the Piz Daint supercomputer at CSCS and the cp2k code to study the dynamics of the molecule on the surface, the mechanism of Zn ion insertion from the solution and the subsequent linking of molecules into aggregates. We employ advanced MD methods to quantify the free energy surface of the involved processes. Our results provide detailed insight into on-surface and subsurface diffusion in this system and chemical reactions of TTPB on the surface of water.

We study the Hubbard model on different lattices - coupled 1D chains, coupled 2D layers of square lattices, and the layered honeycomb lattice - and investigate the thermodynamic properties with the dynamical cluster approximation.

We find that the short-range spin correlations are significantly enhanced for the anisotropic models in the direction with stronger tunneling amplitudes when compared to the isotropic 3D cubic system. Our results provide a thermometer for quantum simulation experiments with ultracold fermions in an optical lattice and allow a quantitative estimate of the excess entropy during lattice loading.

We furthermore investigate the dependence of the critical temperature (entropy) at the Neel transition on anisotropy and lattice geometry.

Vibrational spectra provide extremely valuable information in the
investigation of molecules and solids. In particular, Infrared (IR) and
Raman spectroscopy are nowadays standard techniques in analytical
chemistry and routinely applied for the exploration of molecular
structures and reaction mechanisms. There exist certain rules of thumb
for the assignment of bands in the spectra but, still, the interpretation
of experimental spectra is often not straightforward.
Calculations can be of great help allowing the targeted study of specific
structures. In addition, the influence of the environment can be
investigated. Solvent effects, for instance, can be included in the
calculations via explicit solvent molecules or computationally cheaper
continuum solvation models. In order to get a more detailed
interpretation of the vibrational spectra, it is desirable to determine
the impact of certain molecules/atoms on the bands in the calculated
spectra. This can be achieved by decomposition of the measured
properties. In this way, it is possible to quantify the contributions of,
e.g., solute and solvent molecules and adsorbates on solids.
Calculations of vibrational spectra usually rely on the double-harmonic
approximation [1], which is based on an entirely static picture. A more
sophisticated approach is the employment of ab initio molecular dynamics
that considers conformational dynamics and solvent effects at finite
temperatures. Vibrational spectra are then calculated via time
correlation functions of certain properties [2,3]. In the case of IR and Raman spectroscopy, these properties are the electric dipole and the electric-dipole-electric-dipole polarizability, respectively. Computation of molecular contributions thus requires the evaluation of local electric dipoles and local electric-dipole-electric-dipole polarizabilities, respectively.
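As a generic illustration of this correlation-function route for IR (not the authors' implementation; the quantum-correction and unit prefactors that a production calculation would apply are deliberately omitted), a line shape can be estimated from a dipole time series as follows:

    import numpy as np

    def ir_spectrum(dipole, dt):
        """Rough IR line shape from a total dipole trajectory mu(t).
        dipole : array of shape (n_steps, 3), dt : time step (e.g. in fs).
        Returns angular frequencies and an unnormalized intensity."""
        mu = dipole - dipole.mean(axis=0)                 # remove the static dipole
        n = len(mu)
        # dipole autocorrelation, summed over the Cartesian components
        acf = sum(np.correlate(mu[:, i], mu[:, i], mode="full")[n - 1:]
                  for i in range(3))
        acf /= np.arange(n, 0, -1)                        # unbiased normalization
        acf *= np.hanning(2 * n)[n:]                      # damp the noisy tail
        spectrum = np.abs(np.fft.rfft(acf))
        omega = 2.0 * np.pi * np.fft.rfftfreq(n, d=dt)
        # the omega**2 factor corresponds to using the dipole-derivative
        # correlation function; other prefactor conventions exist
        return omega, omega**2 * spectrum

For Raman, the same machinery is applied to the polarizability instead of the dipole, and restricting mu to the local dipole of a chosen molecule yields the kind of molecular decomposition discussed above.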
We present a comparison of calculated and experimental spectra and
discuss the impact of different approximations made in the calculation
and the evaluation of local properties [4].
References:
[1] E. B. Wilson Jr., J. C. Decius, P. C. Cross, Molecular Vibrations,
McGraw-Hill, New York (1955).
[2] R. G. Gordon, J. Chem. Phys., 42 3658 (1965).
[3] D. A. McQuarrie, Statistical Mechanics, University Science Books,
Sausalito, CA (2000).
[4] S. Luber, J. Phys. Chem. A, 117 2760 (2013).

The Aurivillius phases form a family of naturally layered perovskite materials with good ferroelectric properties [1]. Bi5FeTi3O15 (BFTO) is perhaps the simplest known member of this family that also incorporates magnetic degrees of freedom. However, due to the low concentration of magnetic cations in such systems, it is unclear how long-range multiferroic behaviour can be achieved. For example, room temperature ferromagnetism has been reported for Bi5Co0.5Fe0.5Ti3O15 [2], in contrast with no magnetic order found in Bi5CrTi3O15 [3]. To address this question, we establish the ferroelectric and magnetic properties of BFTO using ab initio electronic structure calculations, comparing two commonly used exchange-correlation functionals: PBE and PBEsol. We then discuss a potential site preference for Fe3+ and its impact on the polarisation and magnetic couplings. In addition, a brief comparison with Bi5MnTi3O15 will be made.

The high-throughput infrastructure AiiDA and the study of local polarization in perovskites

"Materials by design" is a new and extremely powerful approach in Materials Science, where rather than choosing one material and calculating its properties, one identifies instead a desired property and looks for the best material that optimizes it. This approach requires though to build large databases of computed properties of materials.
A key challenge becomes therefore the need of a "materials' informatics" infrastructure to automatically prepare, execute and monitor workflows of calculations for large classes of materials, and then retrieve and store the results in a format that is easily browseable and queryable. To this aim, we are developing an open-source platform for high–throughput (AiiDA: "Automated Interactive Infrastructure and Database for Atomistic calculations"), that uses an advanced storing scheme to allow for highly flexible queries, combined with a REST API that exposes in a standard format the data stored in the database for further programmatic access.
After describing the infrastructure, we will show some examples of application, focusing in particular on the study of the local polarization in perovskites. Many of these systems display a high-temperature paraelectric cubic phase (with zero net polarization), whose microscopic nature is still debated. Indeed, by performing a systematic study of a selected class of these systems, we are able to identify different behaviors, and in some materials like BaTiO3 and KNbO3 we find the emergence of local ferroelectric dipoles even in the paraelectric phase.

The high-throughput infrastructure AiiDA and the study of local polarization in perovskites
(Abstract)

First principles study of the electrocaloric effect in strained BaTiO3

The electrocaloric (EC) effect - a reversible change in temperature of a material by applying an external electric field - has been known for a very long time [1]. Recently however, the discovery of a "giant electrocaloric effect" [2] has stimulated extensive work on the EC effect, due to its huge potential to increase the efficiency of cooling devices. We have studied how misfit strain affects the EC temperature change in bulk BaTiO3.

We have performed molecular dynamics simulations for an effective Hamiltonian based on first-principles density functional theory [3]. The calculated EC temperature change ΔT is reduced when BaTiO3 is clamped but not strained, and increases again with increasing misfit strain. Furthermore, with increasing misfit strain the ΔT peak shifts towards higher temperatures. Therefore, the misfit strain can be utilized in two ways: (i) to enhance the EC temperature change and (ii) to achieve a maximal effect in the temperature range of interest for a given application.

Further, we have compared the results from direct simulations of ΔT with its indirect estimation using a Maxwell relation. This allows us to examine how the order of the phase transition and the rate of change of the applied field affect the EC temperature change.
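For reference, the indirect estimate mentioned above is based on the Maxwell relation (∂P/∂T)_E = (∂S/∂E)_T, which under the usual assumptions gives (our notation)

    \Delta T = -\int_{E_1}^{E_2} \frac{T}{C_E(T,E)} \left( \frac{\partial P}{\partial T} \right)_E dE,

with C_E the heat capacity at constant field; comparing this estimate with the directly simulated ΔT is what allows the role of the transition order and of the field ramp rate to be assessed.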

First principles study of the electrocaloric effect in strained BaTiO3
(Abstract)

Presenting author

Madhura Marathe (ETH Zurich)

Co-authors

Claude Ederer (ETH Zurich)

Poster #

MAT-06

First-principles Fermi surface characterization of hole doped PbTe

Doped PbTe has attracted increasing interest because of its peculiar properties. In particular, it shows enhanced thermoelectricity, topological insulator behaviour and a charge Kondo effect, depending on the dopant atom. Here we investigate the nature of the Fermi surface in hole-doped PbTe using first-principles calculations. We begin by comparing recent experimental characterizations of the Fermi surface by means of effective masses, band offsets and de Haas-van Alphen frequencies with results from density functional theory (DFT). We find that the values of these properties depend strongly on the choice of exchange-correlation functional and identify functionals that give good agreement with experiment. Our results indicate appropriate methodologies for first-principles studies of doped PbTe, and give insights into the origin of the charge Kondo effect.

First-principles simulation of electron transport in realistically large nanoelectronic devices

In light of technological challenges involved in manufacturing nanoscale electronic devices, the development of fast, accurate, and reliable computer-aided design tools with atomistic simulation capabilities is becoming a necessity to accelerate the design of new prototypes and reduce the development cost. Density functional theory (DFT)-based quantum transport approaches can rigorously model electron transport phenomena in nanometer-sized devices while taking into account the material properties of the simulated structure from first principles. In this context, we aim at developing an efficient massively parallel simulator based on DFT and Non-equilibrium Green's Function (NEGF) methods that can simulate realistically large nanostructures with active regions composed of tens of thousands of atoms. Our approach couples the DFT simulation package CP2K and the quantum transport simulator OMEN, leveraging their respective strengths, such as accurate and efficient algorithms, high scalability, and a wide range of applications.

With the available computational power growing according to Moore's law, ever larger systems can be investigated with increasingly advanced methods and new algorithms. For electronic structure calculations on systems containing a few thousand atoms, linear scaling algorithms are essential. For ground state DFT calculations, linear scaling has already been demonstrated for millions of atoms in the condensed phase [J. VandeVondele, U. Borštnik, J. Hutter, 2012]. Here, we extend this work to electronically excited states, for example, to make UV/VIS spectroscopy or investigations of the electron injection process in dye-sensitized solar cells possible. We base our approach on non-adiabatic molecular dynamics, in particular on Ehrenfest molecular dynamics (EMD). The density-matrix-based formalism allows for linear scaling by exploiting the sparsity of the density matrix and naturally incorporates density embedding methods such as the Kim-Gordon approach. First benchmark results will be presented.

A local density fitting technique is introduced for Kohn-Sham (KS) density functional theory calculations using a mixed Gaussian and plane waves (GPW) approach. The computationally most expensive step in the construction of the KS matrix is the evaluation of the Coulomb matrix. The latter requires the calculation of two-electron integrals, with the characteristic O(N^4) scaling. Baerends et al. [1] introduced a local resolution-of-identity (LRI) approach, in which the atomic pair densities are approximated by an expansion in one-center fit functions, reducing the scaling order to O(N^3).
The LRI approach is used in the Amsterdam density functional code and proved to be accurate and efficient [2].
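Schematically, the local fit replaces each atom-pair density by a one-center expansion (our notation):

    \rho_{AB}(\mathbf r) = \sum_{\mu \in A,\, \nu \in B} P_{\mu\nu}\, \varphi_\mu(\mathbf r)\, \varphi_\nu(\mathbf r) \;\approx\; \sum_{k \in A \cup B} c_k^{AB} f_k(\mathbf r),

with the coefficients c_k^{AB} determined for each pair by minimizing the fitting error, so that the number of terms entering the total density grows only with the number of overlapping atom pairs rather than with the fourth power of the basis size.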

In this work, the LRI technique was adapted for use within the GPW framework (LRIGPW) and implemented in the CP2K program package [3]. The fitted density is employed for the evaluation of the Coulomb as well as the exchange-correlation potential. GPW already scales linearly with system size, since the plane-wave expansion of the density is exploited to solve the Poisson equation in Fourier space, leading to an O(N) evaluation of the Coulomb matrix. Thus, no improvement in terms of scaling can be expected from LRIGPW. However, the prefactor for building the KS matrix is reduced, resulting in a system-dependent speed-up of the calculation. Furthermore, the parallel scalability of the grid-based calculation and integration of the potential with respect to the number of CPUs can be simplified and improved.

We use ab-initio calculations to investigate the magnetic properties of multiferroic TbMnO_3.
At low temperatures TbMnO_3 exhibits an incommensurate spiral ordering of the Mn spins, which is accompanied by the appearance of a spontaneous electric polarization that can be driven by an applied magnetic field. The establishment of such spin ordering is usually described within the framework of a Heisenberg model with competing nearest-neighbor and next-nearest-neighbor exchange interactions. However, our theoretical estimates of these interactions from ab-initio calculations demonstrate a clear deviation from the Heisenberg model.
We first consider the coupling between magnetic and orbital orderings as the main source of non-Heisenberg behavior in TbMnO_3, but conclude that it does not explain the observed deviation. We find that higher-order exchange couplings should be taken into account for a proper treatment of the magnetism in TbMnO_3.
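For context, the bilinear Heisenberg form referred to above reads (our notation)

    H_{Heis} = J_1 \sum_{\langle ij \rangle} \mathbf S_i \cdot \mathbf S_j + J_2 \sum_{\langle\langle ij \rangle\rangle} \mathbf S_i \cdot \mathbf S_j,

with competing nearest-neighbor (J_1) and next-nearest-neighbor (J_2) couplings; the reported non-Heisenberg behavior indicates that such bilinear couplings alone are not sufficient, and higher-order terms (for example biquadratic or four-spin exchange) have to be added.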

The formalism of the macroscopic magnetoelectric monopolization is developed and its relation to the magnetoelectric response is given [1]. Using first-principles calculations, we follow two different strategies to calculate the monopolization: (i) a multipole expansion of the magnetization density in atomic spheres around the magnetic sites, and (ii) a formalism inspired by the modern theory of electric polarization. As an example, results for a series of lithium transition metal compounds LiMPO4 (M = Co, Fe, Mn, Ni) are shown, which can show ferromonopolar and antiferromonopolar ordering.
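Schematically (our notation; the precise definitions are those of Ref. [1] cited above), the monopole is the trace of the first magnetoelectric multipole tensor of the magnetization density \mu(\mathbf r),

    \mathcal M_{ij} = \int r_i\, \mu_j(\mathbf r)\, d^3r, \qquad a = \tfrac{1}{3}\, \mathrm{Tr}\, \mathcal M = \tfrac{1}{3} \int \mathbf r \cdot \boldsymbol\mu(\mathbf r)\, d^3r,

and the monopolization is this quantity per unit volume; strategy (i) evaluates the integral inside atomic spheres, while strategy (ii) handles the ambiguity of \mathbf r in a periodic crystal in the spirit of the modern (Berry-phase) theory of electric polarization.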

Anatase TiO2 is employed in a variety of fields, such as (photo)catalysis, sensors and solar cells. Many of the proposed applications rely on charge transport phenomena, and a deep understanding of the anatase electronic properties is therefore crucial. In particular, the behavior of excess electrons in the system is still a matter of debate. While some studies describe highly localized states (small polarons), experiments report high charge mobility, more compatible with a less localized nature of the excess electrons, sometimes called large polarons. Electrons in anatase and rutile display different properties. In this work, the properties of excess electrons in anatase are obtained from hybrid DFT and RPA calculations, to shed light on the geometry and the stability difference between localized and delocalized electronic states with state-of-the-art electronic structure methods.
In anatase, we find that the polaronic state, which localizes on a Ti site, induces a long-range lattice relaxation in the [100] and [010] directions, extending for almost 10 Å. To fully accommodate this relaxation and to yield realistic results, calculations must therefore employ supercells of at least 4x4x1 unit cells. Hybrid density functionals predict energy differences between localized and delocalized electrons (ΔEloc-deloc) that strongly depend on the amount of Hartree-Fock exchange (%HFX) employed. When the %HFX is tuned such that the fundamental band gap is well described, the delocalized electronic state is more stable by 0.3 eV. This picture qualitatively changes if the hybrid DFT orbitals and eigenvalues are used as input for RPA calculations. RPA results for anatase show that the stabilities of the localized and delocalized states become very similar, slightly favoring the localized state, in agreement with experiment. Moreover, we show that ΔEloc-deloc as obtained from the RPA calculations is distinctly less sensitive to the %HFX used in the initial step of the RPA calculation.

This work focuses on the assessment of the behavior of ADS fuel matrix materials (i.e. MgO and Mo) in chloride (LiCl-KCl) and fluoride (LiF-AlF3) melts. It summarizes the available data on Mo speciation in molten chloride systems; these data will be used for theoretical modeling of the chemical behavior of Mo and Mg.
For the investigation of molybdenum behavior in molten salts, different modelling approaches are being used: thermodynamic modeling (TD), molecular dynamics (MD) and DFT. At this stage, molecular dynamics is used to acquire the missing data on the structural, kinetic and thermodynamic properties of the KCl-LiCl melt and on the behavior of Mo in it. Preliminary MD simulations show a tendency for the number density of Cl atoms to increase near the Mo atoms, while the number density of K atoms tends to decrease in the neighborhood of the Mo atoms. Molecular dynamics is currently also used for the direct simulation of Mo dissolution in the LiCl-KCl melt. First results of the MD calculations are presented and discussed.

Molecule-substrate registry on h-BN supported by Rh(111) and other metallic surfaces

The investigation of properties and processes at complex interfaces between metallic substrates and adsorbed molecular systems requires the design of reliable models and efficient computational tools.
Working in close collaboration with experimentalists, we are constantly challenged to reproduce and/or interpret the observed behaviours. The final goal is to acquire in-depth knowledge of the studied systems, eventually leading to the development of new materials with tailored properties.
Modern nano-templates based on hexagonal boron nitride or graphene grown on transition metals show potential for future applications, due to their outstanding mechanical, thermal and electronic properties. The mismatch between the lattice constant of the sp2 overlayer and the substrate produces modulated structures, which act as nano-templates for self-assembly, electron confinement, or intercalation.
We apply scanning tunnelling microscopy (STM) and density functional theory to investigate the adsorption of molecules and the formation and dynamics of defects.
In particular, the site-selectivity of h-BN/Rh(111) (nanomesh) for the adsorption of hexaiodo-cyclo-hexaphenylene (I6-CHP) and H2-phthalocyanine is discussed. In both cases, we observe preferential adsorption within the pore of the nanomesh and a preferential orientation with respect to the substrate. Advanced sampling techniques and tuned analysis tools lead to a better understanding of the interaction between adsorbate and substrate, which could be exploited in the development of new structures and processes, such as the production of graphene derivatives on metal-supported insulators.

Recently, we introduced a novel class of functionals that impose a generalized Koopmans' condition on DFT [1]. These functionals aim at restoring the piece-wise linear behavior of the total energy as a function of the fractional particle number and provide accurate predictions for the ionization potentials and electron affinities of molecules, in close agreement with both experiments and results from many-body perturbation theory (GW). In particular, through a convenient approach for simulating ultraviolet photoemission spectra (UPS), we find that UPS computed with these functionals are in remarkable agreement with experimental results. In addition, this approach allows us to interpret the outcome of orbital tomography obtained within angular-resolved photoemission spectroscopy (ARPES) techniques.

Domain walls in ferroelectric oxides are quasi-two-dimensional objects separating areas of the crystal with two different directions of polarization. Recently, there has been increasing interest in these interfaces because they display unique structural and electronic properties which differ significantly from bulk crystalline behavior. We report the presence of a strong Bloch polarization component at 180 degree ferroelectric domain walls in PbTiO3 and tetragonal lead zirconate titanate. This newly discovered two-dimensional ferroelectric phase is bistable and can be switched from a left-handed to a right-handed chiral state. The strength of the polarization and the energy barrier for switching can be tuned by the composition of the PZT. This type of domain wall shows an extremely high piezoelectric coefficient (e33) of 20 C/m2 in the case of PbTiO3.

Polarization rotation in ferroelectric walls in lead titanate and PZT
(Abstract)

Presenting author

Anand Chandrasekaran (EPFL)

Co-authors

Dragan Damjanovic (EPFL), Nava Setter (EPFL), Nicola Marzari (EPFL)

Poster #

MAT-17

Pushing the limits of full-potential all-electron quantum simulations

The full-potential linearized augmented plane-wave (FP-LAPW) method is considered to be the gold standard among all-electron methods for electronic structure calculations. Like most 'band methods', LAPW uses a finite, energy-independent basis to convert the second-order differential equation problem into a generalized eigenvalue problem. The LAPW method is implemented in freely available codes (Exciting, Elk, FLEUR) as well as in a commercial code (WIEN2k) and several in-house codes. In this work we present an implementation of the LAPW method on distributed hybrid CPU-GPU systems that allows us to turn around highly accurate 1000+ atom all-electron quantum materials simulations on clusters with a few hundred nodes. Key to our implementation is a novel algorithm for solving the generalized eigenvalue problem for complex symmetric dense matrices on distributed multi-threaded systems with a hybrid node architecture. This new implementation makes possible the use of extreme-scale quantum simulations in materials search and design problems as they appear in the materials genome project.

Pushing the limits of full-potential all-electron quantum simulations
(Abstract)

Theoretical and experimental investigation of PdGa surfaces and their catalytic properties

PdGa is an intermetallic compound (IMC) that has shown remarkable catalytic properties for an important reaction in polyethylene production, namely the partial hydrogenation of acetylene to ethylene [1]. IMCs are amongst the most ingenious and innovative catalyst materials as they enable spatial separation of the catalytically active sites.
In order to explore the catalytic properties of PdGa, namely high stability, activity and selectivity, we adopt a combined theoretical and experimental approach. As experimental methods we use low-energy electron diffraction (LEED) and high-resolution scanning tunneling microscopy (STM). For the theoretical investigation we use DFT-based large-scale ab initio calculations with the cp2k code [2].
Since the hydrogenation reaction happens at the surface of the catalyst, our first challenge is determining which particular surface terminations of PdGa are most likely involved in the catalytic process. To have well defined experimental and simulation conditions we consider single crystals.
In a preliminary work [3], the most stable terminations of the PdGa(111) and PdGa(-1-1-1) surfaces were determined in our laboratory by combining LEED, high-resolution scanning tunneling microscopy and ab initio thermodynamics calculations.
To have a more detailed picture of the possible scenario for acetylene hydrogenation we are currently applying the same joint experimental and theoretical strategy to other PdGa terminations (e.g. PdGa(210)).
In parallel, by means of the Nudged Elastic Band method, we unravel the atomistic details of the aforementioned catalytic reaction on such different surfaces.
A next challenge will be to investigate the ability of PdGa surfaces for chiral recognition via the adsorption (again through experiment and in silico) of prochiral molecules on the PdGa(111) and PdGa(-1-1-1) surfaces.

Hydrogen production is gaining importance due to the world's growing energy demand. It can be accomplished by photo-catalysis, which converts the energy of sunlight into H2 by reducing H2O [1]. Among the photo-catalysts tested for water reduction, TiO2 appears to be a promising one because of its ease of preparation and its stability [2]. However, TiO2 has to be modified by adding photo-sensitizers (to harvest UV photons) and/or metal centers (for the reduction of water) to decrease its large band gap (~3.1 eV) [3].
In this study, we aim to theoretically design an efficient TiO2-based photo-catalyst for hydrogen production by water reduction. To modify the band gap of TiO2 and increase its photo-catalytic activity, pyridine-based molecules [4] are used as photo-sensitizers, and cobalt and nickel atoms as metal centers. All calculations are carried out by employing density functional theory (DFT) as implemented in the CP2K/QUICKSTEP package [5]. The factors playing an important role in designing water-reduction photo-catalysts, such as the preferential adsorption sites of the photo-sensitizer and metal centers on the TiO2 surface, the mechanism of water reduction, possible intermediate products and the energy barrier for hydrogen production, will be discussed.

Physics

A study of the cuprates with DCA+

The accurate prediction of the superconducting transition temperature in the high-Tc copper-based superconductors has been an outstanding problem in condensed matter physics for over 30 years. Recently we developed an extension to the Dynamical Cluster Approximation algorithm, i.e. the DCA+, allowing us to compute the superconducting transition temperature of the single-band Hubbard model with unprecedented accuracy and speed. Within the well-known context of LDA+DMFT, it is thought that the physics of the high-Tc copper-based superconductors can be captured well with a 3-band Hubbard model and we will present how the new DCA+ algorithm can be applied to the high-Tc copper-based superconductors.

The interface between the atmospheric pressure plasma ion source and the high vacuum mass spectrometer is one of the most important parts of an inductively coupled plasma mass spectrometer (ICP-MS). It affects the efficiency of the mass transfer into the mass spectrometer and contributes to the formation of interfering ions and to mass discrimination.
Mass discrimination is of particular interest, because the analytical performance of such instruments, especially in the context of isotopic analysis, is strongly hampered by it. Even though earlier studies provide experimental proof of the impact of the cone shape on mass discrimination, the underlying phenomena are largely unknown.
Processes occurring in the ICP-MS interface were simulated using the Direct Simulation Monte-Carlo (DSMC) method with respect to the formation of shock waves, mass transport and mass discrimination. In an attempt to develop a new cone design, the available geometries were compared to establish a baseline and to benchmark the modeling results against experiments. The results of this benchmark are presented here.

We present direct numerical simulations (DNS) of the laminar-turbulent transition which occurs in the three-dimensional swept Hiemenz boundary layer. It is a model for the flow along the leading-edge of swept airplane wings, which is prone to early transition to turbulence at Reynolds numbers below the linear critical value. This instability leads to increased wing drag and fuel consumption.

The DNS are carried out using our in-house Navier-Stokes solver IMPACT which employs high-order finite differences in space, a semi-implicit or explicit multi-step time integration scheme and convective outflow boundary conditions.

For our DNS the boundary layer flow is disturbed by a pair of steady counter-rotating vortices with an unsteady secondary disturbance at subcritical Reynolds numbers. The vortices lead to the formation of a high-speed streak inside the boundary layer by non-modal energy transfer. The interaction of the secondary disturbance with this streak leads to a bypass transition. The dependence of this instability on the frequency of the secondary disturbance is investigated and a frequency band is described for which the secondary instability occurs. Homogeneous suction is applied at the wall to stabilize the flow and to delay transition. The influence of suction on the non-modal energy transfer and on the secondary instability is investigated separately. It is demonstrated that suction reduces the non-modal energy transfer but increases the secondary instability for streaks of a given amplitude.

Electrostatic field solver based on an integral form of the quasi-neutrality equation in B-Spline representation.

NEMORB is a gyrokinetic code simulating turbulence in tokamak plasmas. It accounts for multi-species dynamics, collisions and realistic magnetic equilibrium profiles thanks to an interface with experimental data. Nonetheless, when studying microinstabilities and associated turbulence at the ion time scale, as in the case of the Ion Temperature Gradient (ITG) instability and the Trapped Electron Mode (TEM), the present version of NEMORB assumes that the dominant modes in the turbulent spectra have long wavelengths in the direction perpendicular to the magnetic field lines compared to the ion Larmor scale (k⊥ri ≪ 1). When solving the quasi-neutrality equation, the polarization density contribution from the ions is estimated within this long-wavelength approximation. This approximation is however not valid throughout the plasma, in particular in the vicinity of Mode Rational Surfaces (MRSs), i.e. magnetic surfaces of rational safety factor qs = m/n, where fine radial structures develop as a consequence of the non-adiabatic kinetic electron dynamics, as shown in a previous work [J. Dominski, S. Brunner et al. 2012 J. Phys.: Conf. Ser. 401 012006]. We are thus currently implementing a new field solver valid for all wavelengths of the turbulent spectra, following an approach similar to Ref. [A. Mishchenko and A. Könies 2005 Phys. Plasmas 12 062305].
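For a Maxwellian ion background, the difference between the two treatments can be written compactly in Fourier space (a textbook form in our notation, not necessarily the one used in NEMORB): the ion polarization term in the quasi-neutrality equation carries the factor

    1 - \Gamma_0(b_i), \qquad \Gamma_0(b_i) = I_0(b_i)\, e^{-b_i}, \qquad b_i = k_\perp^2 r_i^2,

which the long-wavelength solver approximates by its leading order b_i (valid only for k⊥ri ≪ 1), whereas an all-order solver keeps the full expression, which in real space becomes the integral operator acting on the electrostatic potential described below.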
We shall present preliminary results of a new electrostatic field solver valid to all orders in k⊥ri, in a simple slab geometry with a uniform magnetic field B. The quasi-neutrality equation takes the form of an integral operator applied to the electrostatic field. The system is discretized using a B-spline representation in one direction perpendicular to B and a Fourier representation in the two other directions. The verification of internal functionalities is carried out using pFUnit, a unit testing framework for Fortran with MPI and OpenMP extensions developed at NASA's Goddard Space Flight Center [http://sourceforge.net/projects/pfunit/ ]. We developed two versions of the solver: 1) an "all-order" version and 2) a "long-wavelength" version, in which the long-wavelength approximation is made. This polymorphism of the solver is implemented using inheritance in object-oriented Fortran 2003. A comparison between results obtained with these two versions of the solver is then conducted.

Electrostatic field solver based on an integral form of the quasi-neutrality equation in B-Spline representation.
(Abstract)

Motivated by today's stringent requirements to reduce noise emissions from aircraft jet engines, this numerical study investigates the flow field and the sound radiation of coaxial jet flows with a heated primary stream by using the Large Eddy Simulation methodology. High-order finite-difference schemes are employed for the spatial discretization of the filtered Navier-Stokes equations in combination with an optimized Runge-Kutta method to achieve highly accurate simulation results for the jet flow development as well as for the acoustic noise radiation. Strong mixing between the cold and hot fluid along the velocity shear layers, and especially in the vicinity of the end of the potential core, leads to pronounced noise radiation at an aft angle of approximately 35°. The turbulent flow field and the associated acoustic radiation are analyzed for various bypass and temperature ratios and significant differences are observed. Instability waves within the flow field are analyzed and radiating contributions are extracted using a Fourier-filtering technique. The instability waves in the inner shear layer are found to produce significant noise sources due to their high convection velocities.

The field of modern cosmology increasingly faces computational challenges: some come from the growing size of datasets due to advancing imaging technology, others from numerical problems due to the increasing complexity and precision of observations. In order to address these challenges we are developing optimised distributed and high-performance computing software. In our interdisciplinary team, combining astrophysics and computer science, we have developed applications for scaling out the estimation of cosmological parameters on distributed computing systems, for the ultra-fast generation of simulated telescope images and for the fast implementation of Monte Carlo Control Loops for the forward modeling of cosmological surveys.

Optimization of the cone design in the ICP mass spectrometer interface region by DSMC

In ICP mass spectrometry the interface region is the crucial part for the mass transfer within the instrument. The ambient pressure argon plasma containing the analytes is guided through two pinholes into different vacuum stages of the mass spectrometer. These pinholes, called the sampler cone and the skimmer cone, play a major role in this process.
Optimization of the cones will lead to more efficient mass transport, resulting in even higher sensitivity and a smaller impact of undesired phenomena such as mass discrimination.
Based on recent experiments and simulations done in our group, the Direct Simulation Monte-Carlo method was used to evaluate the cone design and modify it to achieve a more beneficial argon jet profile in terms of jet expansion, velocity and particle distribution.
The results of the simulations are presented here.

Optimization of the cone design in the ICP mass spectrometer interface region by DSMC
(Abstract)

In the tokamak scrape-off layer (SOL) the magnetic field lines are open, channeling particles and heat onto plasma facing components, and constraining their lifetime. Therefore, safe operation of future fusion reactors requires an understanding of SOL plasma dynamics. Recently, we have gained deep insights into the tokamak SOL dynamics by means of massively-parallel fluid electromagnetic turbulence simulations carried out with the Global Braginskii Solver (GBS) code. GBS is currently capable of performing full-size SOL simulations of medium-size tokamaks such as TCV or Alcator C-Mod.

In the present paper, we emphasize recent numerical developments aimed at (a) a more realistic description of the plasma geometry and (b) the simulation of larger, reactor-class machines. First, we address the numerical implementation of parallel advection and diffusion operators in a finite difference representation. This is a crucial aspect of our computation, as the cost of the simulation can be drastically reduced if the parallel dynamics are discretized adequately, taking advantage of the strong anisotropy of the turbulent modes that are aligned to the magnetic field lines. Second, we address the development and implementation of a matrix-free, parallel multigrid solver for the Poisson operator in the vorticity equation. Using this new solver, it is now possible to further parallelize GBS without breaking parallel scalability, an important step towards the realm of 10^4 CPUs.
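As a generic illustration of what "matrix-free" means in this context (a serial, one-dimensional Poisson V-cycle with damped Jacobi smoothing; the actual GBS solver is parallel and multi-dimensional and is not reproduced here), the operator is only ever applied through a stencil routine and never assembled as a matrix:

    import numpy as np

    def apply_A(u, h):
        """Matrix-free application of the 1D Laplacian (homogeneous Dirichlet BCs)."""
        Au = np.zeros_like(u)
        Au[1:-1] = (-u[:-2] + 2.0 * u[1:-1] - u[2:]) / h**2
        return Au

    def jacobi(u, f, h, sweeps=3, omega=2.0 / 3.0):
        diag = 2.0 / h**2
        for _ in range(sweeps):
            u = u + omega * (f - apply_A(u, h)) / diag
            u[0] = u[-1] = 0.0
        return u

    def v_cycle(u, f, h):
        n = len(u) - 1                            # number of intervals (a power of two)
        if n <= 2:
            return jacobi(u, f, h, sweeps=50)     # "solve" the coarsest level by heavy smoothing
        u = jacobi(u, f, h)                       # pre-smoothing
        r = f - apply_A(u, h)
        r_c = np.zeros(n // 2 + 1)                # full-weighting restriction of the residual
        r_c[1:-1] = 0.25 * r[1:-2:2] + 0.5 * r[2:-1:2] + 0.25 * r[3::2]
        e_c = v_cycle(np.zeros_like(r_c), r_c, 2.0 * h)
        e = np.zeros_like(u)                      # prolongation by linear interpolation
        e[::2] = e_c
        e[1::2] = 0.5 * (e_c[:-1] + e_c[1:])
        return jacobi(u + e, f, h)                # post-smoothing

    # solve -u'' = f on (0, 1) with u(0) = u(1) = 0
    n = 256
    x = np.linspace(0.0, 1.0, n + 1)
    f = np.pi**2 * np.sin(np.pi * x)
    u = np.zeros_like(x)
    for _ in range(10):
        u = v_cycle(u, f, 1.0 / n)

The same structure carries over to the vorticity-equation Poisson solve: only the stencil application, smoother and grid-transfer routines need to be provided on each level, which keeps memory traffic low and maps naturally onto an existing domain decomposition.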

Finally, we summarize the understanding of SOL turbulence obtained through GBS simulations - for instance, the mechanisms regulating the turbulence level and therefore the SOL width, the turbulent regimes, and the mechanisms driving plasma rotation in this region. The simulated non-linear dynamics have been compared with analytical estimates, which have highlighted the key physics mechanisms at play in the SOL, and with experimental measurements taken in a number of tokamaks worldwide, showing good agreement.

Probing the stability of the spin liquid phases in the Heisenberg-Kitaev model using tensor network algorithms

In the present study we address questions regarding the survival of the Kitaev spin liquid phases in the Heisenberg-Kitaev model in the thermodynamic limit. Using iPEPS tensor network Ansatz wavefunctions we obtain the thermodynamic-limit ground-state phase diagram of the Heisenberg-Kitaev model recently proposed by Chaloupka et al. To assess the accuracy of our Ansatz wavefunctions we perform benchmarks against exact results for the Kitaev model and find very good agreement for various observables. We confirm the results of the original (24-site) study in finding the existence of five different phases: Néel, stripy, ferromagnetic, zig-zag and spin liquid. We find finite survival regions for both Kitaev spin liquid phases in the thermodynamic limit and identify the type of the spin-liquid-to-ordered-phase transitions. Finally, we provide estimates for all the relevant transition points in the phase diagram.
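For reference, the model combines an isotropic Heisenberg exchange with the bond-dependent Kitaev term on the honeycomb lattice; written schematically in our notation (the specific sign and parametrization conventions of Chaloupka et al. are not reproduced here),

    H = \sum_{\langle ij \rangle_\gamma} \left[ J\, \mathbf S_i \cdot \mathbf S_j + K\, S_i^{\gamma} S_j^{\gamma} \right], \qquad \gamma \in \{x, y, z\},

where \gamma labels the three bond directions of the honeycomb lattice; the pure Kitaev limits (J = 0, with K of either sign) host the two spin liquid phases whose extent at finite J is probed with iPEPS.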

Probing the stability of the spin liquid phases in the Heisenberg-Kitaev model using tensor network algorithms
(Abstract)

Presenting author

Juan Osorio (ETH Zurich)

Co-authors

Philippe Corboz (ETH Zurich), Matthias Troyer (ETH Zurich)

Poster #

PHY-09

Realistic calculations for the Fractional Quantum Hall Effect.

We study the fractional quantum Hall effect at filling fraction 5/2 by means of exact diagonalization. To bring the calculation closer to the experimental conditions, we account for the design of the particular experimental sample through the finite-thickness effect. We also consider the Landau level mixing corrections, which were ignored in previous studies yet are significant in practice. The phase diagram resulting from these two degrees of freedom is mapped out. We also investigate the possibilities for improving the experimental sample design. For the calculations we use a highly scalable iterative eigensolver that communicates via MPI+OMP and uses all the available memory to speed up the computation. The calculations are mainly performed at the CSCS facilities in Lugano.

SCENIC: a self-consistent tool for the study of ion-cyclotron resonance heating in 3D fusion plasma devices run on a hybrid HPC architecture

Radio frequency (RF) waves generated by ion cyclotron resonance heating (ICRH) have proven to be an efficient heating
source for fusion plasmas. In addition to heating, ICRH is also used as an actuator for controlling undesirable
magnetohydrodynamic (MHD) events such as sawteeth [1-2]. It is therefore important to precisely know the properties of
the ICRH population, both in velocity space and real space. Measurement of the phase space properties of the ICRH
distribution is very limited, thus requiring detailed modelling and simulations. We undertake the modelling task using the
self-consistent numerical model SCENIC. This model includes fixed boundary 3D geometry with full shaping and
anisotropic pressure effects, warm contributions to the dielectric tensor and exact guiding centre orbit effects. It evolves the
equilibrium, wave field and hot particle distribution function iteratively until a self-consistent solution is found [3]. The
SCENIC package includes four numerical codes with appropriate interfacing modules. The anisotropic version of VMEC
(Variational Moments Equilibrium Code) [4] or ANIMEC [5] solves the MHD plasma equilibrium at each iteration. The
TERPSICHORE code transforms this equilibrium into Boozer coordinates for the wave computation. The full-wave code
LEMan (Low-frequency ElectroMagnetic wave propagation) [6-7] computes the RF wave field based on the updated
equilibrium. The VENUS-LEVIS [8] code uses the equilibrium and the RF wave field to advance the hot particle
distribution function in time by solving the guiding center orbit equations and this generates new inputs for VMEC and
LEMan. Interactions with the RF wave and Coulomb collisions with the background plasma are modeled by Monte-Carlo
operators.
For 3D geometry computations, VMEC and TERPSICHORE (parallelized with OpenMP) and LEMan (parallelized with OpenMP and MPI) require a substantial amount of shared memory, but only a few processors. On the other hand, VENUS-LEVIS, parallelized with MPI, has to be run on a high number of processors, but requires a relatively small amount of memory. In order to optimize the resources used by the SCENIC package we demonstrate a hybrid architecture solution. Two eight-core nodes with 320 GB of shared memory each are employed for the equilibrium and wave codes, while 1024 sixteen-core nodes on an IBM Blue Gene/Q supercomputer are employed for the particle orbit solver. A shared file system for the two machines ensures the success of the iterative scheme.

Particle-laden multiphase flows appear in many natural phenomena and industrial applications. Common examples include sedimentation on the seafloor, dust storms, dust explosions, sprays etc. Hence, it is of immense interest to further the understanding of such flows. We conduct highly accurate direct numerical simulations of particle-laden flows and study the detailed physics. Simulations are performed using the in-house developed code "IMPACT" which solves the incompressible Navier-Stokes equations on staggered grids using high-order finite-difference schemes. It is parallelized using 3D domain decomposition in space and is highly optimized for running on massively parallel machines.

In the present work we study the sedimentation of particles on the seafloor by particle-driven gravity currents. A lock-exchange configuration representing such flows is used. In this setup a fixed volume of particle-laden fluid is separated from the clear fluid by a barrier at the initial time t=0. At the start of the simulation the barrier is removed, giving rise to a turbidity current propagating over the bottom boundary. Using different boundary conditions (periodic and no-slip wall) in the spanwise direction, we found substantial differences in the vortical structures generated. In our poster presentation we will demonstrate the effect of spanwise boundary conditions on the dynamics of gravity currents.

The Large Hadron Collider (LHC) at CERN in Geneva is the largest and most powerful
particle collider ever built. It provides humanity with the latest knowledge about the universe's
fundamental building blocks, its earliest conditions and dark matter.

The current distributed computing resources used for simulating and processing collision data
collected by the LHC experiments are largely based on dedicated Linux clusters. Job control
and software provisioning mechanisms are quite different from the common concept of
self-contained HPC applications run by particular users on specific HPC systems.

Also, most of the current high-energy physics software is designed to use one CPU core per
computing job, assuming that there will be enough RAM available and multiple independent
jobs will run in parallel. Given that modern hybrid HPC systems allocate whole nodes, and
they feature high-performance GPU accelerators, the software must be adapted to accommodate
this. Recent developments have allowed limited parallel processing in some cases in order to
efficiently use all the CPU cores of a single node with a single job.

This poster reports on the progress of the preparatory/development project to enable
event generation and detector simulation jobs from the LHC ATLAS experiment to run on
HPC systems provided by CSCS, in particular the Swiss flagship supercomputer Piz
Daint. Results and observations from test runs on the Todi system and the integration into
the worldwide ATLAS computing grid are presented. Plans for introducing parallelism and
GPU code into parts of the ATLAS software stack are discussed.

Computer Sciences / Math

A scheduling framework for online data processing

ALICE (A Large Ion Collider Experiment) is a heavy-ion detector studying the physics of strongly interacting matter and the quark-gluon plasma at the CERN LHC (Large Hadron Collider). The ALICE detector will be upgraded in 2018 to cope with a 50 kHz interaction rate in Pb-Pb collisions, producing in the online computing system a sustained throughput of 1 TB/s. This data will be processed and reduced on the fly (event building, calibration and reconstruction) so that the stream to permanent storage does not exceed 80 GB/s peak and 20 GB/s average, the raw data being discarded. This paper proposes a scheduling framework for the data processing, as part of the ALICE O2 project.

Detector data are received from optical links on a first set of 250 First Level Processors (FLPs). Each FLP receives data fragments (i.e. those associated with a given sub-detector) and locally performs the reduction tasks which do not require the full event (e.g. cluster finding). The processed sub-events are grouped into time frames (0.1 s each) and then transferred to a second cluster, consisting of 1250 Event Processing Nodes (EPNs). All fragments of a given time interval (i.e. up to 250 chunks, one per FLP, 25 GB in total) have to reach the same EPN so that a full event can be built and reconstructed.

Given the high data rate, it is a challenging task to quickly select computing nodes inside the EPN cluster and to delegate a large number of jobs from the FLPs. The delay due to job scheduling may cause severe processing bottlenecks and provoke side effects such as an increase in the buffer space needed on the FLPs.
In order to solve this scheduling problem, we propose a framework for a dynamic scheduler that can work with both homogeneous and heterogeneous clusters. In addition, the scheduler has to consider multiple parameters, including link quality, distance, network bandwidth and data locality, and must be able to handle unexpected incidents such as a link failure. The objectives of the scheduler are to distribute all tasks with the minimum makespan and to minimize the cost of execution (in terms of energy usage).

The scheduling framework will be built on two classes of algorithms: Particle Swarm Optimization (PSO) and Local Minimum Completion Time Swap (LMCTS). The quality of the generated schedules as well as the execution speed will be compared. We will also compare the results with a standard approach called HEFT (Heterogeneous Earliest Finish Time) and with the simplest approach currently considered for prototyping, namely Round-Robin (sequential distribution over a semi-static list of nodes). In addition, the complexities of the particle-hit grouping, data reduction, and reconstruction algorithms will be evaluated. The execution time of tasks on a specific type of computing unit can then be accurately estimated by regressing the complexity models. The multiple regression models, which describe the relationship among execution time, algorithm complexities, and the type of computing unit, will be used by the scheduler to predict the execution time of each task. These estimated run times will be used to create appropriate schedules using PSO and LMCTS. The results of this research work will be integrated into the ALICE O2 computing framework with the goal of efficiently scheduling tasks to process the experiment data in real time.
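As a toy illustration of the kind of baseline comparison outlined above (round-robin versus a greedy minimum-completion-time assignment on a heterogeneous cluster; the task costs, node speeds and function names are invented for the example and are unrelated to the real O2 system):

    import numpy as np

    rng = np.random.default_rng(1)
    n_tasks, n_nodes = 200, 16
    work = rng.uniform(0.5, 2.0, n_tasks)       # per-task cost, arbitrary units
    speed = rng.uniform(0.8, 1.2, n_nodes)      # heterogeneous node speeds

    def round_robin(work, speed):
        finish = np.zeros(len(speed))
        for i, w in enumerate(work):
            node = i % len(speed)
            finish[node] += w / speed[node]
        return finish

    def min_completion_time(work, speed):
        """Greedy MCT: place each task on the node that would finish it earliest."""
        finish = np.zeros(len(speed))
        for w in work:
            node = np.argmin(finish + w / speed)
            finish[node] += w / speed[node]
        return finish

    print("round-robin makespan:", round_robin(work, speed).max())
    print("greedy MCT makespan :", min_completion_time(work, speed).max())

A swap-based local search in the spirit of LMCTS would then start from such a schedule and exchange tasks between the most and least loaded nodes as long as the makespan decreases, while a population-based method such as PSO explores many candidate assignments at once.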

Numerical simulation is a crucial part of science and industry in Europe. The advancement of simulation as a discipline relies on increasingly compute intensive models that require more computational resources to run. This is the driver for the evolution to exascale. Due to limits in the increase in single processor performance, exascale machines will rely on massive parallelism on and off chip, with a complex hierarchy of resources. The large number of components and the machine complexity introduce severe problems for reliability and programmability.

The EU EXA2CT project brings together experts at the cutting edge of the development of solvers, related algorithmic techniques, and HPC software architects for programming models and communication. It will take a revolutionary approach to exascale solvers and programming models, rather than an incremental approach.

We will produce modular open source proto-applications that demonstrate the algorithms and programming techniques developed in the project, to help bootstrap the creation of genuine exascale codes.

We present the coupling of two different discretization methods for fluid-structure interaction using a monolithic approach. Usually, a partitioned coupling scheme is used to couple two different discretization methods.
For the simulation of structural mechanics, the finite element method is one of the most widely used, if not the most popular, discretization methods. In fluid dynamics, the finite volume method is a widely used and versatile method.
The coupling of finite element and finite volume methods for fluid-structure interaction is therefore of substantial interest. The crucial challenge in coupling fluid and structure is the transfer of forces and displacements at the fluid-structure interface. This coupling has to be carried out in a stable and efficient way. We combine the two different discretizations in a monolithic formulation, i.e. the transfer of forces and displacements is fulfilled within a common set of equations for fluid and structure, which is solved simultaneously.
We model the fluid by means of the incompressible Navier-Stokes equations in an Arbitrary Lagrangian-Eulerian formulation, while we use a pure Lagrangian formulation for the structure. This leads to an additional quantity to be coupled, the domain velocity of the fluid domain, which we can handle implicitly or explicitly in our simulation framework.
For the discretization in time we use different implicit first- and second-order time stepping schemes.
For complex real-world simulation domains, we end up with algebraic systems with a large number of degrees of freedom, which makes the use of parallel solvers mandatory. Here, a good choice of an efficient preconditioning strategy is important. Our solver is based on Newton's method with iterative solvers embedded in different multi-level methods. The multi-level methods are constructed on a hierarchy of nested unstructured meshes.
We discuss both the coupling of the two different discretizations and the efficient solution of the arising large nonlinear systems.
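As a structural illustration only (a toy coupled system in Python, assuming nothing about the actual code), the Newton/iterative-solver nesting mentioned above can be sketched as a matrix-free inexact Newton method in which each linearized system is solved with a Krylov method; in the real solver, the multi-level methods would act as preconditioners for that inner solve.

import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

def F(x):
    # toy coupled residual: x[0] mimics a "fluid" unknown, x[1] a "structure" unknown
    u, d = x
    return np.array([u - np.cos(d), d - 0.5 * np.sin(u)])

def jac_vec(x, v, eps=1e-7):
    # Jacobian-vector product by finite differences (matrix-free)
    return (F(x + eps * v) - F(x)) / eps

x = np.zeros(2)
for k in range(20):
    r = F(x)
    if np.linalg.norm(r) < 1e-12:
        break
    J = LinearOperator((2, 2), matvec=lambda v, x=x: jac_vec(x, v), dtype=float)
    dx, info = gmres(J, -r)   # inner Krylov solve of the linearized system
    x += dx
print("Newton iterations:", k, " solution:", x)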

Multigrid preconditioner for saddle point problems with highly variable coefficients

Geodynamic problems involve the solution of systems characterized by strong variations in the coefficients. Considerable effort has been devoted to the development of good preconditioning techniques for such problems. In this work we study the preconditioning technique proposed by Arnold, Falk and Winther along with a novel formulation of the Stokes problem.
The results show the effectiveness and robustness of the advocated preconditioning techniques.
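For orientation only, the following is a minimal Python sketch (with assumed toy data, and not the method of Arnold, Falk and Winther) of the standard block-diagonal preconditioning idea for a saddle point system [[A, B^T], [B, 0]], where the (1,1) block carries the strongly varying coefficient:

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

rng = np.random.default_rng(1)
n, m = 80, 20                                  # toy velocity / pressure sizes

coeff = 10.0 ** rng.uniform(0, 6, n)           # strongly varying "viscosity"
A = sp.diags(coeff) + sp.eye(n)                # SPD elliptic-like block
B = sp.random(m, n, density=0.2, random_state=1)

K = sp.bmat([[A, B.T], [B, None]], format="csr")   # saddle point operator
rhs = rng.standard_normal(n + m)

# block-diagonal preconditioner diag(A, S), with S approximating the Schur complement
S = (B @ sp.diags(1.0 / A.diagonal()) @ B.T + 1e-8 * sp.eye(m)).tocsc()
A_solve = spla.factorized(A.tocsc())
S_solve = spla.factorized(S)

def apply_prec(r):
    z = np.empty_like(r)
    z[:n] = A_solve(r[:n])
    z[n:] = S_solve(r[n:])
    return z

M = spla.LinearOperator(K.shape, matvec=apply_prec)
x, info = spla.minres(K, rhs, M=M)
print("MINRES info:", info, " residual norm:", np.linalg.norm(K @ x - rhs))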

It is expected that within the next decade, supercomputers will be capable of performing on the order of 10^18 floating point operations per second. In comparison to today's supercomputers this corresponds to an increase in performance by a factor on the order of 100. Parallelism in time is a welcome addition to parallelism in space for exploiting concurrency. Using hundreds of thousands of cores on the fastest machines at the Juelich Supercomputing Centre in Germany, it has been shown that such an inclusion of parallelism in time can lead to additional speedup well beyond the saturation of a purely space-parallel approach. In this poster we report on benchmark simulations for time-parallel methods. We also describe progress made in the development of a C++ library for the parallel-in-time algorithm Parareal. Finally, we outline an application of Parareal to simulations of black hole formation.
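For readers unfamiliar with the algorithm, here is a minimal Parareal sketch in Python on an assumed toy problem (a scalar ODE, with backward Euler as both coarse and fine propagator); it only illustrates the predictor-corrector structure and is unrelated to the C++ library mentioned above.

import numpy as np

lam, T, u0 = -1.0, 5.0, 1.0
N = 20                      # number of time slices (distributed in a parallel run)
m = 10                      # fine sub-steps per slice
dT = T / N

def G(u, dt):               # coarse propagator: one backward Euler step
    return u / (1.0 - lam * dt)

def F(u, dt, m):            # fine propagator: m backward Euler sub-steps
    for _ in range(m):
        u = u / (1.0 - lam * dt / m)
    return u

# initial guess from a purely sequential coarse sweep
U = np.empty(N + 1); U[0] = u0
for n in range(N):
    U[n + 1] = G(U[n], dT)

# Parareal iterations: the fine solves are independent per slice (the parallel part),
# only the coarse correction sweep is sequential.
for k in range(5):
    F_vals = np.array([F(U[n], dT, m) for n in range(N)])
    G_old = np.array([G(U[n], dT) for n in range(N)])
    U_new = np.empty_like(U); U_new[0] = u0
    for n in range(N):
        U_new[n + 1] = G(U_new[n], dT) + F_vals[n] - G_old[n]
    U = U_new

print("Parareal end value:", U[-1], " exact:", u0 * np.exp(lam * T))

The fine solves in each iteration are independent across time slices; in an actual implementation they run in parallel, which is where the speedup beyond the saturation of a purely space-parallel approach comes from.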

We present a mixed explicit-implicit time stepping scheme for solving the advection equation on Cartesian embedded boundary meshes. Cartesian embedded boundary meshes, also referred to as cut cell methods, cut an object out of the Cartesian background grid, resulting in potentially arbitrarily small cut cells around the object. Our goal is to extend a well-established projection method (based on the Bell-Colella-Glaz algorithm) for solving the incompressible Euler equations from Cartesian grids to cut cell grids, in order to handle more complicated geometry. The projection algorithm consists of two steps: the update of the velocity field without enforcing the incompressibility condition, and the projection step. In the first step, which uses a MUSCL scheme, an advection-type equation needs to be solved. Therefore, an important intermediate step in our project is to develop a new algorithm for solving the advection equation on the cut cell grid. This is the focus of this contribution.

Here, the main problem is to overcome the small cell problem: we would like to take a time step based on the size of the regular Cartesian cells. For standard explicit finite volume schemes, however, this time step is not stable on the potentially arbitrarily small cut cells. We therefore use an implicit time stepping scheme on the cut cells. On Cartesian cells away from the boundary, we use an explicit scheme to keep the cost low. We compare various ways of coupling the explicit and implicit schemes and find “flux bounding”, for which we can prove a TVD result, to be the most suitable. We present numerical results in one and two dimensions. These results show second-order accuracy in the $L^1$ norm and between first- and second-order accuracy in the $L^{\infty}$ norm.
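To make the stability restriction concrete (using assumed, generic notation): for one-dimensional linear advection with speed $a$ on a cut cell whose volume fraction is $\alpha \in (0,1]$, an explicit upwind update is stable only if

\[ \Delta t \le \alpha \, \frac{\Delta x}{|a|}, \]

so the admissible time step collapses together with the cut cell size, which is precisely what the implicit treatment of the cut cells avoids.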

Benchmarks for an implementation of Parareal in the C++ domain specific embedded language STELLA

On the path towards exascale computing it becomes increasingly relevant to efficiently exploit the massive parallelism offered by systems with tens or hundreds of millions of cores. This requires efforts to design algorithms and applications that apply parallelization at many levels, even for small problems. For time-dependent partial differential equations, time-parallel methods that offer concurrency along the temporal axis have been shown to be promising. A popular method of this type is Parareal, introduced by Lions, Maday and Turinici in 2001.
While mathematical properties of Parareal are well studied, much less is known about how different implementation strategies affect Parareal's performance in parallel computation.

In this study we solve a three-dimensional advection-diffusion problem with an implementation of Parareal that uses the parallel library STELLA for the evaluation of finite difference stencils. STELLA provides backends for both CPUs and GPUs. We provide performance results of the code on Europe's most powerful supercomputer, Piz Daint, at the Swiss National Supercomputing Centre CSCS, and compare speedup and energy consumption for both the CPU and GPU versions.

Benchmarks for an implementation of Parareal in the C++ domain specific embedded language STELLA
(Abstract)

Many problems in computational science and engineering involve elliptic partial differential equations and require the numerical solution of large, sparse (non)linear systems of equations. Multigrid is known to be one of the most efficient methods for this purpose.
However, the concrete multigrid algorithm and its implementation depend strongly on the underlying problem and hardware. Therefore, changes in the code or many different variants are necessary to cover all relevant cases. In our talk we discuss a prototype implementation in Scala of a framework that allows abstract descriptions of PDEs, their discretization, and their numerical solution via multigrid algorithms. From these descriptions, the data structures and multigrid components required to solve elliptic PDEs on structured grids can be generated. Furthermore, a hybrid parallelization using OpenMP and MPI is available.
We present our approach for partitioning the computational domain, details of the employed parallelization concepts, the benefits and drawbacks of generating data structures and code for data exchanges, as well as planned extensions. As showcases, the automatic generation of multigrid solvers for two different test problems stemming from image processing and quantum chemistry is discussed. Our generator is capable of producing code that runs on different platforms and shows excellent weak scaling performance on the JuQueen HPC cluster.
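To give a flavour of the components such a generator has to emit (smoother, restriction, prolongation, coarse-grid solve), here is a minimal hand-written Python sketch of a multigrid V-cycle for the 1D Poisson problem -u'' = f with homogeneous Dirichlet boundary conditions; it is an assumed illustration and not output of the Scala framework.

import numpy as np

def smooth(u, f, h, sweeps=3, omega=2.0/3.0):
    # weighted Jacobi smoother for -u'' = f
    for _ in range(sweeps):
        u[1:-1] += omega * 0.5 * (u[:-2] + u[2:] - 2.0*u[1:-1] + h*h*f[1:-1])
    return u

def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2.0*u[1:-1] - u[:-2] - u[2:]) / (h*h)
    return r

def restrict(r):
    # full weighting onto the coarse grid
    rc = np.zeros((len(r) + 1) // 2)
    rc[1:-1] = 0.25*r[1:-2:2] + 0.5*r[2:-1:2] + 0.25*r[3::2]
    return rc

def prolong(ec, n):
    # linear interpolation back to the fine grid
    e = np.zeros(n)
    e[::2] = ec
    e[1::2] = 0.5*(ec[:-1] + ec[1:])
    return e

def v_cycle(u, f, h):
    if len(u) <= 3:                      # coarsest grid: single interior unknown
        u[1] = 0.5 * h*h * f[1]
        return u
    u = smooth(u, f, h)
    ec = v_cycle(np.zeros((len(u) + 1) // 2), restrict(residual(u, f, h)), 2.0*h)
    u += prolong(ec, len(u))
    return smooth(u, f, h)

n = 129                                  # 2^7 + 1 grid points
h = 1.0 / (n - 1)
x = np.linspace(0.0, 1.0, n)
f = np.pi**2 * np.sin(np.pi * x)         # exact solution is sin(pi*x)
u = np.zeros(n)
for _ in range(10):
    u = v_cycle(u, f, h)
print("max error:", np.abs(u - np.sin(np.pi * x)).max())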

FEM is one of the most powerful analysis tools in CAE, and proportional improvements to its computational performance are always demanded as new computers are continuously developed. Assuming that large-scale calculations are performed, the computational performance of FEM is determined by the performance of the linear equation solver, in particular the Krylov iterative method. One of the features of FEM is the use of unstructured data, and within the solver, preconditioning and SpMV (sparse matrix-vector multiplication) are the hot spots. Unlike stencil computations on structured data, the wall-clock time is bounded by the memory bandwidth. For example, in a typical structural analysis problem the BPF (Bytes per Flop) that the algorithm requires is about 5.7, while the effective BPF that the hardware of the K computer provides is 0.36. Thus, leveraging GPUs, with their high memory bandwidth, is key for such memory-bound algorithms.

As for hardware in recent years, both the functional units and the memory have a hierarchical configuration, and GPUs as accelerators have also gathered attention. These tendencies are common from supercomputers to COTS environments and will continue. Although BLAS, which has been maintained since the 1980s, is optimized for various CPUs, a layer that likewise abstracts such heterogeneous configurations is now necessary.

The authors have so far been engaged in the development of the parallel FE library ppOpen-APPL/FEM and the structural analysis code FrontISTR++, which builds on it. FrontISTR++ employs MPI parallelization based on overlapping domain decomposition for inter-node communication and OpenMP parallelization for loop subdivision among the cores within a node. For example, for flat-MPI computation on a Fujitsu PRIMERGY BX900, the strong-scaling parallel efficiency on 1,152 cores was 84%, while almost ideal weak-scaling parallel efficiency was obtained on 25,000 cores. However, in order to effectively use heterogeneous hardware environments containing accelerators such as GPUs, the examination and implementation of algorithms that take the properties of the above-mentioned hardware into account is still needed.

In this study, the performance and implementation of SpMV for unstructured grids in heterogeneous hardware environments are presented, focusing more specifically on the following two points:
(1) Iterative refinement with mixed precision in the FE Krylov solver, and an automatic technique for setting the convergence criteria of the inner iteration (a minimal sketch of this idea follows below).
(2) Block size optimization and communication hiding of SpMV in the FE Krylov solver on a CPU-GPU system.
Here, the highest performance of a computer can only be demonstrated by exploiting the accelerator, and the two points above are positioned as crucial technologies for obtaining that performance.
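As a sketch of point (1) only (a generic Python/NumPy illustration with assumed toy data, not the FrontISTR++ implementation): the inner solve runs in single precision while the residual and the correction of the solution are accumulated in double precision; in the real solver a single-precision Krylov iteration with its own convergence criterion would take the place of the direct solve used here.

import numpy as np

rng = np.random.default_rng(2)
n = 200
A = rng.standard_normal((n, n)) + n * np.eye(n)      # well-conditioned toy matrix
b = rng.standard_normal(n)

A32 = A.astype(np.float32)                           # low-precision operator for the inner solve
x = np.zeros(n)                                      # double-precision solution

for k in range(10):
    r = b - A @ x                                    # residual in double precision
    if np.linalg.norm(r) <= 1e-12 * np.linalg.norm(b):
        break
    d = np.linalg.solve(A32, r.astype(np.float32))   # inner solve in single precision
    x += d.astype(np.float64)                        # correction accumulated in double precision
    print(f"outer iteration {k}: residual norm {np.linalg.norm(r):.3e}")

print("final residual norm:", np.linalg.norm(b - A @ x))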

Olav Aanes Fagerlund (The University of Tokyo, Japan), Takeshi Kitayama (The University of Tokyo, JST/CREST, Japan), Gaku Hashimoto (The University of Tokyo), Serban Georgescu (Fujitsu Laboratory Europe Ltd., UK)

Poster # CSMATH-09

Parallel solver for the space inhomogeneous and time dependent Boltzmann equation

We present a high-performance implementation for the solution of the space inhomogeneous and time dependent Boltzmann equation. The phase space is discretized using finite elements for the physical domain and a polar spectral discretization based on Laguerre polynomials in velocity. Computations are done in 2+2+1 dimensions with an implicit/explicit split time stepping scheme on unstructured meshes. The polar spectral scheme can be made fully conservative and requires no truncation of the collision operator.
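The implicit/explicit splitting can be illustrated with a minimal Python sketch on an assumed toy model (1D advection with BGK-like relaxation towards a fixed equilibrium, not the actual Boltzmann collision operator): the non-stiff transport step is taken explicitly, the stiff relaxation step implicitly.

import numpy as np

nx, a, tau = 200, 1.0, 1e-3        # cells, advection speed, relaxation time
dx = 1.0 / nx
dt = 0.5 * dx / a                  # CFL limit comes from the explicit transport step only
x = (np.arange(nx) + 0.5) * dx
f = np.exp(-200.0 * (x - 0.3)**2)  # initial data
f_eq = np.full(nx, 0.2)            # fixed "equilibrium" (illustrative assumption)

for step in range(200):
    # explicit upwind transport step (periodic boundary conditions)
    f = f - a * dt / dx * (f - np.roll(f, 1))
    # implicit (backward Euler) relaxation step: f_new = f + dt*(f_eq - f_new)/tau
    f = (f + dt / tau * f_eq) / (1.0 + dt / tau)

print("cell average after 200 steps:", f.mean())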

Parallel solver for the space inhomogeneous and time dependent Boltzmann equation
(Abstract)