Science DMZ Network Architecture

The term Science DMZ refers to a computer subnetwork that is structured to be secure, but without the performance limits that would otherwise result from passing data through a stateful firewall.[1][2] The Science DMZ is designed to handle high volume data transfers, typical with scientific and high-performance computing, by creating a special DMZ to accommodate those transfers.[3] It is typically deployed at or near the local network perimeter, and is optimized for a moderate number of high-speed flows, rather than for general-purpose business systems or enterprise computing.[4]

The term Science DMZ was coined by collaborators at the US Department of Energy's ESnet in 2010.[5]
A number of universities and laboratories have deployed or are deploying a Science DMZ. In 2012 the National Science Foundation funded the creation or improvement of Science DMZs on several university campuses in the United
States.[6][7][8]

The Science DMZ[9]
is a network architecture to support Big Data. The so-called information explosion has been discussed since the mid 1960s, and more recently the term data deluge[10] has been used to describe the exponential growth in many types of data sets. These huge data sets, often need to be copied from one location to another using the Internet. The movement of data sets of this magnitude in a reasonable amount of time should be possible on modern networks. For example, it should only take less than 4 hours to transfer 10 TeraBytes of data on a 10 Gigabit Ethernet network path, assuming disk performance is adequate[11] The problem is that this requires networks that are free from packet loss and middleboxes such as traffic shapers or firewalls that slow network performance.

Contents

Most businesses and other institutions use a firewall to protect their internal network from malicious attacks originating from outside. All traffic between the internal network and the external Internet must pass through a firewall, which discards traffic likely to be harmful.

A stateful firewall tracks the state of each logical connection passing through it, and rejects data packets inappropriate for the state of the connection. For example, a website would not be allowed to send a page to a computer on the internal network, unless the computer had requested it. This requires a firewall to keep track of the pages recently requested, and match requests with responses.

A firewall must also analyze network traffic in much more detail, compared to other networking components, such as routers and switches. Routers only have to deal with the network layer, but firewalls must also process the transport and application layers as well. All this additional processing takes time, and limits network throughput. While routers and most other networking components can handle speeds of 100 billion bits per second (Gbps), firewalls limit traffic to about 1 Gbit/s, which is unacceptable for passing large amounts of scientific data.

While stateful firewall may be necessary for critical business data, such as financial records, credit cards, employment data, student grades, trade secrets, etc., science data requires less protection, because copies usually exists in multiple locations and there is less economic incentive to tamper.[4]

Diagram of a conventional DMZ that would be used by a business. All traffic to the DMZ must pass through a firewall, which limits throughput.

A firewall must restrict access to the internal network but allow external access to services offered to the public, such as web servers on the internal network. This is usually accomplished by creating a separate internal network called a DMZ, a play on the term “demilitarized zone." External devices are allowed to access devices in the DMZ. Devices in the DMZ are usually maintained more carefully to reduce their vulnerability to malware. Hardened devices are sometimes called bastion hosts.

The Science DMZ takes the DMZ idea one step farther, by moving high performance computing into its own DMZ.[12]
Specially configured routers pass science data directly to or from designated devices on an internal network, thereby creating a virtual DMZ. Security is maintained by setting access control lists (ACLs) in the routers to only allow traffic to/from particular sources and destinations. Security is further enhanced by using an intrusion detection system (IDS) to monitor traffic, and look for indications of attack. When an attack is detected, the IDS can automatically update router tables, resulting in what some call a Remotely Triggered BlackHole (RTBH).[1]

The Science DMZ provides a well-configured location for the networking, systems, and security infrastructure that supports high-performance data movement. In data-intensive science environments, data sets have outgrown portable media, and the default configurations used by many equipment and software vendors are inadequate for high performance applications. The components of the Science DMZ are specifically configured to support high performance applications, and to facilitate the rapid diagnosis of performance problems.
Without the deployment of dedicated infrastructure, it is often impossible to achieve acceptable performance.
Simply increasing network bandwidth is usually not good enough, as performance problems are caused by many factors, ranging from underpowered firewalls to dirty fiber optics to untuned operating systems.

The Science DMZ is the codification of a set of shared best practices—concepts that have been developed over the years—from the scientific networking and systems community. The Science DMZ model describes the essential components of high-performance data transfer infrastructure in a way that is accessible to non-experts and scalable across any size of institution or experiment.

^Dart, Eli; Metzger, Joe (June 13, 2011). "The Science DMZ". CERN LHCOPN/LHCONE workshop. Retrieved 2013-05-26. This is the earliest cite-able reference to the Science DMZ. Work on the concept had been going on for several years prior to this.