ResiliNets: Resilient and Survivable Networks – Overview

Society increasingly relies on computer networks in general, and the Internet in particular. Consumers rely on networks for access to information and services, personal finance, and communication with others. The Internet has become indispensable to the routine operation of businesses and to the global economy. The military depends on network-centric operations and warfare. Governments depend on networks for their daily operation, service delivery, and response to natural disasters and terrorist attacks.

Therefore, the consequences of network disruption are increasingly severe, and threaten the lives of individuals, the financial health of businesses, and the economic stability and security of nations and the world. As the importance of the Internet increases, so does its attractiveness as a target for bad guys: recreational and professional crackers, terrorists, and those engaged in information warfare.

We therefore regard resilience and survivability as critical to the future of our network infrastructure. The ResiliNets initiative aims to understand and advance the state of resilience and survivability in computer networks, including the Global Internet, PSTN, SCADA networks, mobile ad-hoc networks, and sensor networks.

09:00–10:00: journal club (reading group) in which a group member leads discussion of a paper

discussion and brainstorming of new research and proposal ideas

weekly status and discussion of individual research projects and papers in progress

ResiliNets meetings are open to all interested people; the best way to find out about and get involved in our research is to come to our meetings. We are happy to introduce our work to new participants.

*except when the Americans are out of sync with the rest of the world's summer time, in which case an hour adjustment needs to be made to European times

Disciplines and Related Work

Disciplines Related to Faults and Challenges

Fault Tolerance is the ability of a system to tolerate faults such that service failures do not result. Fault tolerance generally covers random single or at most a few faults, and is thus a subset of survivability, as well as of resilience.

Survivability is the capability of a system to fulfil its mission, in a timely manner, in the presence of threats such as targeted attacks or large-scale natural disasters resulting in many failures, in addition to the few random failures covered by fault tolerance. Survivability is thus a superset of fault tolerance but a subset of resilience.

Disruption Tolerance is the ability of a system to tolerate disruptions in connectivity among its components. Disruption tolerance is a superset of tolerance of the environmental challenges: weak and episodic channel connectivity, mobility, delay tolerance, as well as tolerance of power and energy constraints.

Traffic Tolerance is the ability of a system to tolerate unpredictable offered load without a significant drop in carried load (including congestion collapse), as well as to isolate the effects of cross traffic, other flows, and other nodes. The traffic can either be unexpected but legitimate, such as from a flash crowd, or malicious, such as a DDoS attack.

Trustworthiness Disciplines Related to Quantifiable Properties

Dependability is the property of a system such that reliance can justifiably be placed on the service it delivers. It generally includes the measures of availability (ability to use a system or service) and reliability (continuous operation of a system or service), as well as integrity, maintainability, and safety.

Security is the property of a system and measures taken such that it protects itself from unauthorised access or change, subject to policy. Security properties include AAA (auditability, authorisability, authenticity), confidentiality, and nonrepudiability. Security shares with dependability the properties of availability and integrity.

Performability is the property of a system such that it delivers performance required by the service specification, as described by QoS (quality of service) measures.

Trustworthiness with respect to Challenges

Robustness is a property that relates the operation of a control system to perturbations of its inputs. In the context of resilience, robustness describes the trustworthiness (quantifiable behaviour) of a system in the face of challenges.

A rigorous framework quantifies network resilience on the basis of two orthogonal dimensions of communication networks: the physical network characteristics (operational space) and the service requirements (service space).

Operational space N: represents the physical state of the network
Resilient networks remain in normal operation in the face of challenges

normal operation according to network design and engineering

partially degraded but still operable

severely degraded providing little or no operational capability

Service space P: represents the quality of service for an application over a given network
Resilient services remain acceptable even when network operation degrades

acceptable service with respect to service specification

impaired but usable service

unacceptable service that provides little or no utility

Resilience R: a function of state transition probability in the two-dimensional state space:

each dimension consists of a multivariate metric descriptor

network state S is a discrete set of operational metrics and service parameters
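
The state-space idea above can be illustrated with a minimal sketch. It assumes one metric per dimension (the framework itself uses multivariate descriptors), and the metric names and region bounds are hypothetical values chosen only for illustration. It maps a measurement to one of the three operational regions and one of the three service regions, then estimates empirical transition frequencies from a trace of states. It does not define R itself.

```python
from collections import Counter

# Region labels taken from the lists above.
OPERATIONAL = ("normal", "partially degraded", "severely degraded")
SERVICE = ("acceptable", "impaired", "unacceptable")


def region(value, first_bound, second_bound):
    """Return 0, 1, or 2 for a metric where larger values are worse."""
    if value <= first_bound:
        return 0
    if value <= second_bound:
        return 1
    return 2


def network_state(link_failure_fraction, delay_ms):
    """Map one operational metric and one service metric to a state (N, P).

    The bounds below are hypothetical, not taken from the source.
    """
    n = region(link_failure_fraction, 0.1, 0.5)
    p = region(delay_ms, 100.0, 500.0)
    return OPERATIONAL[n], SERVICE[p]


def transition_frequencies(states):
    """Empirical frequency of each observed state-to-state transition."""
    pairs = list(zip(states, states[1:]))
    counts = Counter(pairs)
    total = len(pairs)
    return {pair: count / total for pair, count in counts.items()}


trace = [
    network_state(0.0, 20.0),
    network_state(0.3, 150.0),
    network_state(0.3, 150.0),
]
frequencies = transition_frequencies(trace)
```

A single scalar per dimension keeps the sketch short; a fuller version would combine several metrics per dimension before assigning a region.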

Realistic topology generators are essential to the understanding of network design and survivability analysis. Two important issues that are not sufficiently addressed by current topology generators are node positioning and cost considerations. The utility of the existing models could be vastly improved by incorporating these two features. This project aims at developing a new network topology generator, which enables node positioning and cost constraints on the topologies generated with several well-known graph generation models. Our approach incorporates network design practices in topology generation, thereby enabling a tool that can be used to generate viable alternative topologies during the network design and engineering phase. Further, we consider the representativeness of the generated topologies using several graph properties such as degree distribution, shortest path distribution, link length distribution, and spectrum of the graph, amongst several others.
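
As a rough illustration of combining node positioning with a cost constraint, the sketch below places links between fixed node positions using a Waxman-style distance-dependent probability and stops adding links once a budget is spent. The choice of the Waxman model, the linear cost function, and all parameter names and defaults are assumptions made for this example; the source does not say which models or cost functions the generator uses.

```python
import math
import random
from itertools import combinations


def generate_topology(positions, budget, cost_per_unit=1.0,
                      alpha=0.4, beta=0.6, seed=0):
    """Generate links between positioned nodes under a total cost budget.

    positions: dict mapping node name to (x, y) coordinates (fixed in advance).
    budget: maximum total link cost (hypothetical linear cost model).
    alpha, beta: Waxman-style parameters; naming conventions vary in the
    literature.
    """
    rng = random.Random(seed)
    pairs = list(combinations(sorted(positions), 2))
    dist = {(u, v): math.dist(positions[u], positions[v]) for u, v in pairs}
    max_dist = max(dist.values())

    links, spent = [], 0.0
    rng.shuffle(pairs)
    for u, v in pairs:
        # Shorter links are more likely to be selected.
        probability = beta * math.exp(-dist[(u, v)] / (alpha * max_dist))
        cost = cost_per_unit * dist[(u, v)]
        if rng.random() < probability and spent + cost <= budget:
            links.append((u, v))
            spent += cost
    return links, spent


positions = {"a": (0, 0), "b": (3, 4), "c": (6, 8), "d": (0, 10)}
links, spent = generate_topology(positions, budget=12.0)
```

The sketch makes no attempt to guarantee connectivity; a practical generator would check that and the graph properties listed above.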

An essential aspect of resilient network design is understanding how networks behave under various challenges. To analyse network resilience, we model the challenges that disrupt the normal operation of the network. Analysing the full set of scenarios for n networks and c challenges conventionally requires c×n simulation input files. Our model decouples challenges from topologies, reducing the c×n input files required for complex simulation scripts to c+n, so that any challenge model can be applied to any network topology. This decoupling makes challenge scenario analysis more efficient.
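
The decoupling can be sketched as follows, using small in-memory dictionaries in place of input files. The topology names, the challenge names, and the challenge format (sets of failed nodes and links) are hypothetical; the source does not describe the actual file formats.

```python
from itertools import product

# n topology descriptions, each a set of undirected links.
topologies = {
    "ring4": {("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")},
    "line3": {("a", "b"), ("b", "c")},
}

# c challenge descriptions, written independently of any topology.
challenges = {
    "fail_node_b": {"nodes": {"b"}, "links": set()},
    "cut_link_ab": {"nodes": set(), "links": {("a", "b")}},
}


def apply_challenge(links, challenge):
    """Return the links that survive the challenge."""
    down_nodes = challenge["nodes"]
    down_links = challenge["links"]
    return {
        (u, v)
        for (u, v) in links
        if u not in down_nodes
        and v not in down_nodes
        and (u, v) not in down_links
        and (v, u) not in down_links
    }


# c + n inputs yield all c x n scenarios.
scenarios = {
    (topology_name, challenge_name): apply_challenge(links, challenge)
    for (topology_name, links), (challenge_name, challenge)
    in product(topologies.items(), challenges.items())
}
```

Adding one more challenge here means writing one new entry rather than one per topology.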

Highly-dynamic mobile-wireless networks present unique challenges to end-to-end communication, particularly caused by the time-varying connectivity of high-velocity nodes combined with the unreliability of the wireless communication channel. We are developing a new domain-specific protocol suite for telemetry networks (TmNS) in the aeronautical test environment consisting of: AeroTP TCP-friendly transport protocol, AeroNP IP-compatible network protocol, and AeroRP location-assisted routing protocol. Our research explores the tradeoffs in the location of functionality such as error control and location management for high-velocity multihop airborne sensor networks and presents cross-layer optimizations between the MAC, link, network, and transport layers to enable a domain-specific network architecture, which provides high reliability for telemetry applications. Sensor data is returned multihop from airborne test articles (TA) moving at speeds up to Mach 3.5 to the ground stations (GS) that track them with high-power directional antennae. This means that the contact time between TAs with closing velocities of Mach 7 may be as low as 10 seconds. Relay nodes (RN) improve multihop performance and location predictability. The telemetry network is connected to the Internet via gateways (GW).
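
To make the contact-time figure concrete, the sketch below estimates how long two nodes on a head-on pass stay within radio range. It assumes a fixed transmission range and a speed of sound of 343 m/s (a sea-level value; the true value depends on altitude and temperature). The 12 km range is a hypothetical value chosen to show a contact time on the order of 10 seconds and is not taken from the source.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed sea-level value; varies with altitude


def contact_time_s(range_m, closing_mach, speed_of_sound=SPEED_OF_SOUND_M_S):
    """Contact time for a head-on pass at a constant closing velocity.

    The nodes are in range while their separation shrinks from range_m to
    zero and grows back to range_m, that is, over 2 * range_m of relative
    travel.
    """
    closing_speed_m_s = closing_mach * speed_of_sound
    return 2.0 * range_m / closing_speed_m_s


# Two test articles, each at Mach 3.5, closing at Mach 7,
# with a hypothetical 12 km transmission range.
seconds_in_contact = contact_time_s(12_000.0, 7.0)
```

A pass that is not head-on spends less time in range, so this is an upper bound for the assumed range.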

There has been increased interest in the deployment of high-bandwidth point-to-point fixed wireless links as an alternative to fiber optic links due to cost or regulatory concerns. Applications include extending broadband Internet access, backhaul for 3G and proposed 4G deployments, and front-haul umbilical facilities for distributed antenna systems (DAS). Millimeter wave (70–90 GHz) wireless link technology is emerging for very short distances, but has the potential to span several miles and deliver data rates of 1–10 Gb/s. Unfortunately, these frequencies suffer significant attenuation due to atmospheric phenomena such as rain. This project is deploying test links, characterising their performance during real weather events such as thunderstorms, and applying novel routing techniques to a mesh network to mask impairments. We are exploring two new weather disruption-tolerant mesh routing protocols: PWARP predictive weather-assisted routing protocol and XL-OSPF cross-layered OSPF. In both cases, radar imagery is used to predict the trajectories of storm systems. PWARP uses this information to reroute in advance of a predicted disruption due to rain attenuation. XL-OSPF uses radar imagery to estimate the current attenuation on a given link to provide instantaneous reactivity based on cross-layering.
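
The general idea of rerouting ahead of a predicted rain fade can be sketched with a shortest-path computation that skips links whose predicted attenuation exceeds a fade margin. This is not the PWARP or XL-OSPF algorithm, whose details are not given here; the function name, the per-link attenuation table, and the fade-margin threshold are all hypothetical.

```python
import heapq


def route(links, src, dst, predicted_attenuation_db, fade_margin_db):
    """Least-cost path that avoids links predicted to fade out.

    links: dict mapping an undirected link (u, v) to its routing cost.
    predicted_attenuation_db: dict mapping (u, v) to predicted rain
    attenuation in dB (hypothetical input, for example derived from radar).
    """
    adjacency = {}
    for (u, v), cost in links.items():
        if predicted_attenuation_db.get((u, v), 0.0) > fade_margin_db:
            continue  # link is expected to be unusable
        adjacency.setdefault(u, []).append((v, cost))
        adjacency.setdefault(v, []).append((u, cost))

    # Dijkstra's algorithm over the remaining links.
    queue = [(0.0, src, [src])]
    visited = set()
    while queue:
        distance, node, path = heapq.heappop(queue)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, cost in adjacency.get(node, []):
            if neighbour not in visited:
                heapq.heappush(queue, (distance + cost, neighbour, path + [neighbour]))
    return None  # no usable path


mesh = {("a", "b"): 1.0, ("b", "d"): 1.0, ("a", "c"): 2.0, ("c", "d"): 2.0}
clear_path = route(mesh, "a", "d", {}, fade_margin_db=20.0)
storm_path = route(mesh, "a", "d", {("b", "d"): 30.0}, fade_margin_db=20.0)
```

Excluding a link outright is the simplest policy; scaling link cost by predicted attenuation would be a gentler alternative.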