3 Context: In-situ visualization is one action item in our HP2C efforts. The HP2C platform aims to develop applications that run at scale and make efficient use of the next generation of supercomputers. At present, this means the generation of computing technologies available in the 2013 timeframe.

4 Motivations: Parallel simulations are now ubiquitous. Mesh sizes and numbers of time steps are unprecedented. The traditional post-processing model (compute, store, analyze) does not scale because I/O to disks is the slowest component. Consequences: datasets are often under-sampled on disks; many time steps are never archived; it takes a supercomputer to re-load and visualize supercomputer data.

5 Visualization at Supercomputing Centers, IEEE CG&A Jan/Feb 2011: As we move from petascale to exascale, favor the use of the supercomputer instead of a vis. cluster. Memory must be proportional to size and is expensive ($$$$$). Aggregate flops instead of a parasitic expense. Close coupling with I/O. But: little memory per core.

6 Solving a PDE and visualizing the execution: The full source code of the solution is given in ManualExamples/Simulations/contrib/pjacobi/

12 Strategies for load balancing (static): A checkerboard pattern seems like it would give a good compromise, but its communication pattern is more complex to program. Grid splitting strategies also affect boundary conditions, I/O patterns, and in-situ ghost regions.
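The trade-off between splitting strategies can be made concrete with a small sketch. It is written in Python for brevity rather than in the simulation's own language, and every function name in it is hypothetical. It compares row strips with a checkerboard of blocks by counting the one-cell-wide halo each block would need from its neighbors (corner cells ignored).

```python
def split_1d(n, parts):
    """Split n cells into `parts` contiguous ranges whose sizes differ by at most one."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def strip_decomposition(nx, ny, nprocs):
    """Row strips: each rank owns every column and one band of rows."""
    return [((0, nx), rows) for rows in split_1d(ny, nprocs)]


def checkerboard_decomposition(nx, ny, px, py):
    """Checkerboard: a px-by-py arrangement of rectangular blocks."""
    return [(cols, rows) for rows in split_1d(ny, py) for cols in split_1d(nx, px)]


def halo_cells(block, nx, ny):
    """Ghost cells a block needs from its neighbors (one-cell halo, no corners)."""
    (x0, x1), (y0, y1) = block
    width, height = x1 - x0, y1 - y0
    count = 0
    if x0 > 0:
        count += height      # left neighbor exists
    if x1 < nx:
        count += height      # right neighbor exists
    if y0 > 0:
        count += width       # lower neighbor exists
    if y1 < ny:
        count += width       # upper neighbor exists
    return count


def total_halo(blocks, nx, ny):
    """Sum of halo cells over all blocks of a decomposition."""
    return sum(halo_cells(b, nx, ny) for b in blocks)
```

For a 64 x 64 grid on 16 ranks, this count gives 1920 halo cells for strips and 768 for a 4 x 4 checkerboard. The checkerboard moves less data, but each interior block talks to four neighbors instead of two, which is the extra programming complexity the slide refers to.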

15 Example of the use of ghost-cells: Ghost- or halo-cells are owned by processor P, but reside in the region covered by processor P+k (k is an offset in the MPI grid). Their purpose is to guarantee data continuity.

16 Example of the use of ghost-cells: Ghost- or halo-cells are usually not saved in solution files because the overhead can be quite high. A restart usually involves reading the normal grid and communicating (initializing) the ghost-cells before proceeding.
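The restart sequence described here (read the normal grid, then communicate to initialize the ghost-cells) can be sketched without MPI by simulating the ranks as Python lists. This is only an illustration under simplifying assumptions: a 1D decomposition, one ghost cell per side, at least one owned cell per rank, and hypothetical function names. A real code would exchange the values with message passing.

```python
def scatter_with_ghosts(global_values, nranks):
    """Give each simulated rank its owned slice plus one empty ghost slot per side."""
    n = len(global_values)
    base, extra = divmod(n, nranks)
    local_arrays, start = [], 0
    for r in range(nranks):
        size = base + (1 if r < extra else 0)
        owned = list(global_values[start:start + size])
        local_arrays.append([None] + owned + [None])  # [ghost, owned..., ghost]
        start += size
    return local_arrays


def exchange_ghosts(local_arrays):
    """Fill each interior ghost slot with the neighbor's adjacent owned value."""
    last = len(local_arrays) - 1
    for r, mine in enumerate(local_arrays):
        if r > 0:
            mine[0] = local_arrays[r - 1][-2]   # last owned cell of the left neighbor
        if r < last:
            mine[-1] = local_arrays[r + 1][1]   # first owned cell of the right neighbor
    return local_arrays
```

Ghost slots on the outer boundary stay empty in this sketch; a simulation would fill them from its boundary conditions instead.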

17 In-situ (parallel) visualization: Could I instrument parallel simulations to communicate with a subsidiary visualization application/driver? Eliminate I/O to and from disks. Use all my grid data, with ghost-cells. Have access to all time steps and all variables. Use my parallel compute nodes. Don't invest in building a GUI, a new visualization package, or a parallel I/O module.

18 Two complementary approaches: (1) Parallel data transfer to Distributed Shared Memory: computation and post-processing are physically separated; developed at CSCS and publicly available on HPCforge; or use ADIOS (not described here, google "ADIOS NCCS"). (2) Co-processing or in-situ: computation and visualization run on the same nodes; there is a ParaView project and a VisIt project. See the short article at EPFL, the tutorial at PRACE Winter School, and the tutorial proposed at ISC'11.

19 First approach: Parallel data transfer to Distributed Shared Memory. Live post-processing and steering: get analysis results whilst the simulation is still running; re-use generated data for further analysis/compute operations. Main objectives: minimize modification of existing simulation codes.

23 [Figure: write test with a 20 GB DSM distributed among 8 post-processing nodes of the XT. Vertical axis: data write rate (GB/s). Horizontal axis: number of PEs used for writing to the DSM. Series: Parallel File System (Lustre); H5FDdsm (socket over SeaStar2+ interconnect). Annotation: saturation of the network.]

25 Advantages: Parallel real-time post-processing of the simulation. Easy to use if the application already uses HDF5. Compute nodes do not need a large amount of memory, since everything is sent to a remote DSM. Not all post-processing applications scale as well as the simulation: run PV (ParaView) on fat nodes and the simulation on a very large number of nodes.

27 Use VisIt: The Simulation's window shows meta-data about the running code. Control commands exposed by the code are available here. All of VisIt's existing functionality is accessible. Users select simulations to open as if they were files.

33 Some details on the APIs: The C and Fortran interfaces for using SimV2 are identical, apart from the function names. The VisIt Simulation API has just a few functions: set up the environment; open a socket and start listening; process a VisIt command; set the control callback routines. The VisIt Data API has just a few callbacks: GetMetaData(), GetMesh(), GetScalar(), etc.

42 How much impact in the source code? The best-suited simulations are those that allocate large (contiguous) memory arrays to store mesh connectivity and variables. Memory pointers are used, and either the simulation or the visualization can be assigned the responsibility of deallocating the memory when done. F90 example:
    allocate ( v(0:m+1,0:mp+1) )
    visitvardatasetd(h, VISIT_OWNER_SIM, 1, (m+2)*(mp+2), v)

44 How much impact in the source code? When data points are spread across many objects, a new array must be allocated and the data gathered into it before it is passed to the Vis Engine:
    REAL, DIMENSION(:), ALLOCATABLE :: cx
    ALLOCATE( cx(numnodes), stat=ierr)
    DO ielem = 1, numelems+numhalos
      DO i = 1, 3
        cx(elementlist(ielem)%lclnodeids(i)) = ElementList(iElem)%x(i)
      END DO
    END DO
    err = visitvardatasetf(x, VISIT_OWNER_COPY, 1, numnodes, cx)
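The gathering step can be isolated from the visualization call. The sketch below mirrors the Fortran loop in Python, with 0-based node indices. The `Element` class and the `gather_node_x` function are hypothetical stand-ins for the simulation's own types, and handing the array to the visualization library is deliberately left out.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Element:
    local_node_ids: tuple  # 0-based indices of the element's three nodes
    x: tuple               # x coordinate stored with each of those nodes


def gather_node_x(elements, num_nodes):
    """Copy per-element x coordinates into one contiguous array of doubles."""
    cx = array("d", [0.0] * num_nodes)
    for elem in elements:
        for node_id, x in zip(elem.local_node_ids, elem.x):
            cx[node_id] = x   # shared nodes are simply written more than once
    return cx
```

Because the gathered array is a temporary copy, it makes sense to pass ownership to the visualization side, which is what the `VISIT_OWNER_COPY` flag in the Fortran example suggests.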

45 VisIt can control the running simulation: Connect and disconnect at any time while the simulation is running. We program some buttons to react to the user's input: Halt, Step, Run, Update, and others.
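One way to picture how such buttons act on the simulation is a small command handler that the simulation registers as its control callback. The Python sketch below is only an assumption about the shape of that handler: the class, its attributes, and the lowercase command strings are hypothetical and are not part of the VisIt API.

```python
class SimulationControl:
    """Minimal state machine reacting to Halt, Step, Run, and Update commands."""

    def __init__(self):
        self.running = False    # True while the solver free-runs
        self.steps_done = 0     # number of solver steps taken so far
        self.updates_sent = 0   # number of plot refreshes requested

    def advance(self):
        """Stand-in for one solver time step."""
        self.steps_done += 1

    def handle_command(self, command):
        """Apply one command received from the viewer."""
        if command == "halt":
            self.running = False
        elif command == "run":
            self.running = True
        elif command == "step":
            self.running = False   # single-stepping implies a paused solver
            self.advance()
        elif command == "update":
            self.updates_sent += 1  # stand-in for pushing fresh data to the plots
        else:
            raise ValueError(f"unknown command: {command!r}")
```

Keeping the handler free of solver details means the same few lines can serve both debugging (Step) and production runs (Run, Halt).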

47 Domain distributed over multiple MPI tasks: If the graphics load is very heavy, rendering is done remotely (by the simulation) and the result is sent to the client as a pixmap. If it is light, the geometry is sent to the client for local rendering.

49 Advantages compared to saving files: The greatest bottleneck (disk I/O) is eliminated. Not restricted by the limitations of any file format. No need to reconstruct ghost-cells from archived data. All time steps are potentially accessible. All problem variables can be visualized. Internal data arrays can be exposed or used. Step-by-step execution will help you debug your code and your communication patterns. The simulation can watch for a particular event and trigger the update of the VisIt plots.

51 We are now adding a new interaction paradigm: mega-, giga-, peta-, and exa-scale simulations can now be coupled with visualization. Simulation loop (flowchart): Serve a visualization request? Check for convergence or end-of-loop. Solve the next step.
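The three boxes of this loop can be sketched around a toy solver. The Python example below is a minimal illustration, not the tutorial's pjacobi code: it uses a 1D Jacobi sweep, and it models incoming visualization requests as a set of step numbers (`request_steps`, a hypothetical stand-in for polling a socket).

```python
def jacobi_step(u):
    """One Jacobi sweep for the 1D Laplace equation with fixed end values.

    Returns the new field and the largest pointwise change.
    """
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1])
    change = max(abs(a - b) for a, b in zip(new, u))
    return new, change


def run(u, request_steps, tol=1e-6, max_steps=10_000):
    """Simulation loop: serve requests, test for termination, solve the next step."""
    served = []                                  # (step, snapshot) pairs given to the viewer
    step, change = 0, float("inf")
    while True:
        if step in request_steps:                # serve a visualization request?
            served.append((step, u[:]))
        if change < tol or step >= max_steps:    # convergence or end-of-loop
            break
        u, change = jacobi_step(u)               # solve the next step
        step += 1
    return u, step, served
```

The request check sits at the top of the loop so that the viewer always sees a complete, consistent time step rather than a half-updated field.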

53 We need a new data analysis infrastructure: Domain decomposition optimized for simulation is often unsuitable for parallel visualization. To optimize memory usage, we must share the same data structures between simulation code and visualization code to avoid data replication. Create a new vis infrastructure; develop in-situ data encoding algorithms, indexing methods, and incremental 4D feature extraction and tracking. Petascale visualization tools may soon need to exploit new parallel paradigms in hardware, such as multiple cores, multiple GPUs, and cell processors.

54 Conclusion: Parallel visualization is a mature technology, but it was optimized as a stand-alone process. It can run like a supercomputer simulation, but it is also limited by I/O. In-situ visualization is an attractive strategy to mitigate this problem, but it will require an even stronger collaboration between application scientists and visualization scientists, and the development of a new family of visualization algorithms. Demonstrations.
