New abstraction for edge network systems

The end-to-end principle of the internet is a fallacy. Modern distributed system rely on the cloud rather than deal with the complexity of the edge network. We propose to explore how to provide primitives such as consistency, integrity, accessibility and authentication in the context of edge network distributed systems.

Motivation

The internet has abandoned the end-to-end principles on which it was established [2]. With IPv4 addresses depleted and the transition to IPv6 yet to restore public identities, devices are left behind NATs and firewalls. Instead of dealing with the complexity of the edge network, users opt to use centralized cloud services, offering usability and high availability.

It’s a jungle on the edge network. By Original – Highways Agency [CC BY 2.0], via Wikimedia CommonsIn this post-Snowden era, users are beginning to question their decision out of fear of censorship and mass-surveillance. Furthermore, a series of highly publicized data breaches and DoS attacks have shed light on the weak guarantees provided by opaque terms of service [8], which are engineered to minimize legal responsibility. Classes of applications such as multiplayer games and video conferencing can benefit from low latency characteristics of direct peer to peer connections whilst others such as local file sync and sharing can benefit from the high bandwidth and scalability. Even in this mod- ern world, users need the ability to establish inter-device connectivity without a full internet connection, for example isolating processing of personal data from the Internet Of Things or connecting between personal devices on the go.

In response to this demand, developers are building new applications for the edge network. They are reimplementing solutions to establishing authenticated identities, consensus and availability in the face of mobile nodes, network partitions and asymmetric channels. Without a clear stack and layers of abstraction, systems fail to provide even the most basic safety guarantees. Protocols are layered on top of each other without formal agreement on the services provided at each layer. Even after this engineering effort by developers, systems still require intricate configuration to deal with the diversity of devices, middleboxes and network environments [6] on the edge network, if they are able to work at all.

Challenges

For this discussion we make the following distinction. Data requirements are needs specified by the application, for example a distributed file system may specify that file meta- data must be strongly consistent whilst the files themselves need only be eventually consistent. In contrast, environmental requirements is the set of network environments that the application needs to operate in. For example, an application might specify that the nodes may be mobile and intermittently connected, however there will always be a cloud node which is highly reliable and publicly addressable but run on untrusted infrastructure.

Our focus on the edge network means we lose the data center assumptions, typical in distributed systems for decades. The environmental requirements now spans:

Heterogeneous network topologies — Middleboxes plague the edge network, network topologies are complex, devices may have asymmetric reachability, there is a wide range of link characteristics and traffic can be treated differently depending on its class.

Mobile nodes — We can no longer rely on IP ad- dresses to identify nodes. Nodes may move between networks and have multiple network interfaces. Inter- mittent connectivity and network partitions are com- mon.

Diverse hardware — Devices can vary in the con- straints of CPU, power supply or memory. Utilizing different networks may come at different costs.

New failure models — We no longer assume homogeneous trust between nodes. Different nodes suffer with different failure models and expected failure patterns.

Developers make crude assumptions about their applications’ requirements. The data requirement space is large, it includes some regions that have been proved impossible and others which may prove impossible.

Research Questions

The key research question is how can we provide services such as consistency, accessibility and authentication in the context of edge network distributed systems, this encompasses other questions such as:

Which areas of the space of data and environmental requirements are covered by existing distributed algorithms, which areas are not yet covered and which areas are provably impossible to cover?

How can we formally express the assumptions and guarantees of distributed algorithms and their trade-offs, data and environmental requirements such that our engine can resolve them?

How can we evaluate such systems given the diversity of possible environmental requirements and combina- tions of data requirements?

How can we ensure that the distributed algorithms provide the stated guarantees under the assumptions? How can we construct and reason about these algorithms such that they provide stronger guarantees then conventional systems?

How can we combine the above to provide a stack of protocols which fulfils the data requirements, given the environmental requirements?

Approach

We propose a new common abstraction between applications and networked devices to form personal clouds. Programmers (and ultimately users) formally specify the data and environmental requirements, these requirements span domains in fault tolerance, replication, consistency, caching, accessibility, security levels and confidentiality. From a col- lection of distributed algorithms, each with their own set of formally specified assumptions and guarantees, an engine will stack the protocols to provide the data requirements in the environmental requirements. From this foundation, we can build new distributed systems including new systems for personal data [4]. We are currently considering building upon a suite of existing tools in this domain including a unikernel operating system [10], TLS implementation [11], a git-style distributed data store [3] and Raft consensus implementation [5].

State of the Art

Sapphire [13] is a programming platform to separate application and deployment logic in cloud and mobile applications. Whilst Sapphire’s motivation is similar to ours, it covers a limited space of data requirements and environmental requirementss and doesn’t provide any guarantees to applications running on the platform.

The systems community is beginning to design distributed protocols specifically to tolerate the edge network, such as achieving consistency in an environment of heterogeneous trust [12, 7]. But quantifying the environmental requirements of such protocols requires a much richer abstraction than those currently used. Some authors [1, 9] suggest we can provide stronger guarantees for distributed protocols by changing the basic programming constructs and languages used, this is something we intend to explore further.