Facilities, Operations & User Support

The Facilities, Operations and User Support (FOUS) program is responsible for operating
and maintaining the computing systems procured by the Advanced Simulation and Computing (ASC) program, and for
delivering additional computing-related services to Defense Programs customers
located across the Nuclear Weapons Complex. Sandia has developed a robust User
Support capability that provides a range of services to analysts, tool
developers, major code groups, and computer science researchers alike.

Because major computing resources are procured periodically and are not duplicated at
each NNSA laboratory, a highly reliable, dedicated Wide Area Network connects the
computing environments at Sandia (both New Mexico and California), Los Alamos, and Lawrence
Livermore national laboratories. The FOUS program maintains local high-performance
networks that connect computing and storage resources in the
various security environments our customers need, and it supports
remote access and job submission to platforms located at other laboratories. These
interconnects require constant monitoring and analysis, because minor changes or
error conditions can drastically degrade data-transfer performance between
the sites.

Facilities, Network, and Power

All of the resources that make up the high-performance computing environment require buildings
with supporting cooling and power infrastructure. At Sandia, we are taking new approaches to
energy conservation that inform the design of new facilities, as well as
the innovative use of existing cooling equipment and power distribution systems.
Sandia has been recognized for several groundbreaking initiatives in cooling
and power conservation. Our newest collaboration, a partnership with our Alternative
Energy research programs, will provide access to more than 2 MW of solar photovoltaic
energy. Prior efforts have saved millions of gallons of water
and reduced our dependence on refrigerated cooling, lowering energy costs. These
innovations will be applied to new facilities in the coming years.

System and Environment Administration and Operations

The Operations area is where the daily work of running and managing computing resources
delivers critical support to customers both near and far. Although we do not
generate the results of computational simulations, we are stewards of that
information, and we work to protect it and control access to it through various access
control mechanisms and workflow processes. In addition, monitoring of the
environment and of the individual computing platforms raises early warnings that
provide time for manual intervention. The
Systems Operations Center calls in administrators or technicians to address any
of a number of problems, from power and cooling to computer or disk failures, or its staff
may determine that networking support is required.

User Support

While system administrators are often engaged to help isolate and correct user-detected
errors, the first responders to problem situations are the User Support staff.
Their services range from system documentation to
training on new platforms and new tools such as debuggers and performance-optimization tools. Every code and simulation is a slightly
different instance of a general environment, and errors may arise in any of
several layers of that environment: the hardware, the
communications interconnect, the logic of the code, or the compiler's
interpretation of that code. Most large simulations run for days to weeks and
create thousands of files. Managing this complexity takes a thorough
understanding of the codes, the file systems, and the limitations of each
individual computing platform.

Common Computing Environment

As mentioned above, major computing resources are procured periodically and are not
duplicated at each NNSA laboratory. As a result, computing tools and
services that are common across the laboratories are required to meet user needs. Sandia works with Los Alamos and Lawrence Livermore national laboratories to
provide these services.

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525.