
Computer Communication Review: Papers

Happy new year, and one more issue of CCR in your inbox or your mailbox. At CCR, we are beginning the new year with a lot of energy and new content that we would like to establish as mainstream in our publication. Starting in January 2015, we are instituting a new column in CCR, edited by Prof. Aditya Akella of the University of Wisconsin–Madison. Its goal is to provide research and professional advice to our ever-growing community. Prof. Akella describes his intentions for this column in his own editorial. I sincerely hope that this new column will be tremendously successful and will help many CCR readers navigate academic and research directions or career choices.

This issue contains one technical paper, which looks into the tail loss recovery mechanism of TCP, and two editorials. The first editorial is an interesting overview of how Internet Exchange Points have evolved in Europe and the U.S.A. The authors provide technical and business-related reasons for the observed evolution and outline the issues that will require more attention in the future. The second editorial is what I promised in my October 2014 editor's note: Dr. George Varghese has provided CCR with an editorial note that captures his thinking around what he calls "confluences," which he presented during his SIGCOMM keynote speech. Reading his editorial brought me right back to Chicago and the auditorium where George received his SIGCOMM award. I find the concept of "confluence" very important. Finding such confluences is not easy, but when it happens, research becomes fun, exciting, and certainly far easier to motivate and transfer to actual products. I do hope that PhD candidates try to apply George's framework as they search for their thesis topics.

With all this, I want to wish you a very happy and productive 2015. We are always looking forward to your contributions!

Interactive applications do not require more bandwidth to go faster. Instead, they require less latency. Unfortunately, the current design of transport protocols such as TCP limits possible latency reductions. In this paper we evaluate and compare different loss recovery enhancements to fight tail loss latency. The two recently proposed mechanisms "RTO Restart" (RTOR) and "Tail Loss Probe" (TLP), as well as a new mechanism that applies the logic of RTOR to the TLP timer management (TLPR), are considered. The results show that the relative performance of RTOR and TLP when tail loss occurs is scenario-dependent, but with TLP having potentially larger gains. The TLPR mechanism reaps the benefits of both approaches and shows the best performance in most scenarios.
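As a rough illustration of the difference between the two timers, here is a minimal sketch of the rearming arithmetic. The function names, the delayed-ACK allowance, and the minimum timeout below are our own illustrative choices, not the exact constants from the proposals:

```python
def rtor_timeout(rto, time_earliest_outstanding_sent, now):
    """RTO Restart (sketch): arm the timer so it fires RTO seconds after
    the earliest outstanding segment was sent, rather than RTO seconds
    after 'now'. This caps the total time until loss recovery starts."""
    return max(rto - (now - time_earliest_outstanding_sent), 0.0)

def tlp_timeout(srtt, outstanding_segments):
    """Tail Loss Probe (sketch): schedule a probe roughly two RTTs out.
    With a single outstanding segment, add slack for delayed ACKs
    (0.2 s here, an illustrative value); clamp to a 10 ms floor."""
    if outstanding_segments == 1:
        return max(2 * srtt + 0.2, 0.01)
    return max(2 * srtt, 0.01)
```

The point of RTOR is visible in the first function: the longer the earliest segment has already been outstanding, the sooner the timer fires.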

The recently launched initiative by the Open-IX Association (OIX) to establish the European-style Internet eXchange Point (IXP) model in the US suggests an intriguing strategy to tackle a problem that some Internet stakeholders in the US consider detrimental to their business: a lack of diversity in available peering opportunities. In this paper we examine the cast of Internet stakeholders that are bound to play a critical role in determining the fate of this Open-IX effort. These include the large content and cloud providers, CDNs, Tier-1 ISPs, the well-established and some of the newer commercial datacenter and colocation companies, and the largest IXPs in Europe. In particular, we comment on these different parties' current attitudes with respect to public and private peering and discuss some of the economic arguments that will ultimately determine whether or not the strategy currently pursued by OIX will succeed in achieving the main OIX-articulated goal: a more level playing field for private and public peering in the US, such that the actual demand and supply for the different peering opportunities are reflected in the cost structure.

The most striking ideas in systems are abstractions such as virtual memory, sockets, or packet scheduling. Algorithmics is the servant of abstraction, allowing the performance of the system to approach that of the underlying hardware. I survey the trajectory of network algorithmics, from a focus on speed and scale in the 1990s to measurement and security in the 2000s, using what I call the confluence lens. Confluence sees interdisciplinary work as a merger of two or more disciplines made compelling by an inflection point in the real world, while also producing genuinely transformed ideas. I attempt to show that Network Algorithmics represented a confluence in the 1990s between computer systems, algorithms, and networking. I suggest Confluence Diagrams as a means to identify future interdisciplinary opportunities, and describe the emerging field of Network Verification as a new confluence between networking and programming languages.

Dear networking students everywhere,

Welcome to the first edition of a new quarterly column aimed at mentoring students in the ACM SIGCOMM community. I'm honored to be the inaugural editor. The primary objective of this column is to offer students advice on general issues pertaining to networking research, teaching, and careers. Most students have excellent advisors, but my hope is that this column will nevertheless help, e.g., by: (a) augmenting advice students get from their advisors, and (b) aiding graduate/undergraduate students who are exploring moving into networking or to a new sub-topic therein.

Here are some examples of questions this column is suitable for. This is not an exhaustive list; there are many other relevant issues that are not listed here.

• Research methods: "Is there an optimal way to approach networked systems research? What are some things to keep in mind when embarking on active measurement projects?"
• Tools, testbeds, datasets: "Where can I do experiments to test my new cool idea for data center networking?" "Are there datasets that will aid me in my research on topic X?"
• Time management ahead of conference deadlines: "How do I balance writing vs. hacking/experiments?"
• Career advice: "With so much exciting work happening in industry, is there much point in seeking an academic job? Is a PhD in networking worth it?"
• Course choice: "Are there courses I should take to prepare myself for research in topic X? Are there online materials I can use for this?"
• Teaching networking: "I'm going to teach my first networking course in the near future. How do I prepare myself? Are there online resources I could use?"
• Getting involved: "How do I become a more integral part of the SIGCOMM community?"

I will likely not address each individual query; rather, my goal is to collate (a subset of) the questions I get, group them into meaningful topics, and offer advice as best I can. Of course, I am not an expert on many of these topics. Thus, I may seek the opinion of relevant people in the community in responding to specific queries. This column is likely to be just 1-2 pages long. As such, some answers may be brief; e.g., I may simply point folks to online resources that already offer the relevant advice.

Note that this is not an advice column of the sort you may find in magazines and Sunday newspapers. I will not address topics that are personal in nature; e.g., if you're having an issue with your advisor, I'm not the one to approach for resolution! I will also not address questions comparing venues ("Is X a better conference than Y?") or research topics ("Is X a better topic to work on than Y?" or "Is X dead?"). And don't send me abstracts/papers for feedback!

Interested students can send queries to guru@sigcomm.org. Names of students asking questions will not be published by default, unless students request otherwise. If you're not a student and you have feedback on or disagreement with a statement or comment in my column, I'd love to hear from you as well. Looking forward to hearing from you all.

Welcome to the October issue of CCR. This issue features the technical and editorial papers that comprise CCR's quarterly content, but also all the papers that appeared at ACM SIGCOMM, along with the best papers selected from its affiliated workshops. SIGCOMM this year was attended by 733 people, making it the second-best-attended event ACM SIGCOMM has ever had, after Hong Kong last year. More interestingly, 23% of all attendees came from industry, something I also noticed during the presentations and the follow-up questions. Our conference has become a very vibrant venue for an expanding community, one that seems increasingly interested in solving difficult scientific problems while having impact on the technology landscape as a whole. I am really looking forward to the results of such closer collaboration between academic and industrial researchers.

The SIGCOMM award was presented to Dr. George Varghese, "for his sustained and diverse contributions to network algorithmics, with far-reaching impact in both research and industry." George's talk focused on how he structures his research so that he indeed solves difficult problems that will change the landscape of technology in a fundamental way. I surely hope to secure an editorial submission from him in a future issue of CCR, where he can describe in his own "written" words his thinking around what he calls "confluence."

The main content of the conference revolved around software-defined networks, data centers, wireless and cellular networks, and security and privacy. Security and trust are also the topics of our three CCR technical papers for this issue. The editorial section comprises the report from the sixth Workshop on Active Internet Measurements, a position paper aiming to define "fog computing," and the introduction of an interesting open platform for cellular research, OpenAirInterface. I hope you enjoy reading them.

Finally, this issue marks the end of tenure for two of our editors, Dr. Renata Teixeira and Dr. Sanjay Jha. I want to thank both of them for their service to CCR. With that, I hope you enjoy this extended issue of CCR, and I am always at your disposal in case of questions or suggestions.

Dina Papagiannaki
CCR Editor

Instant messaging services are quickly becoming the most dominant form of communication among consumers around the world. Apple iMessage, for example, handles over 2 billion messages each day, while WhatsApp claims 16 billion messages from 400 million international users. To protect user privacy, many of these services typically implement end-to-end and transport-layer encryption, which are meant to make eavesdropping infeasible even for the service providers themselves. In this paper, however, we show that it is possible for an eavesdropper to learn information about user actions, the language of messages, and even the length of those messages with greater than 96% accuracy, despite the use of state-of-the-art encryption technologies, simply by observing the sizes of encrypted packets. While our evaluation focuses on Apple iMessage, the attacks are completely generic, and we show how they can be applied to many popular messaging services, including WhatsApp, Viber, and Telegram.
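A toy sketch of the underlying side channel: if each user action produces a characteristic ciphertext size, an eavesdropper can match observed lengths against per-action size fingerprints. The action names, sizes, and tolerance below are invented for illustration, not measured from iMessage:

```python
def classify_action(observed_size, fingerprints, tolerance=8):
    """Match an observed encrypted-packet size against a dictionary of
    per-action size fingerprints (action name -> typical ciphertext size).
    Returns the closest action within 'tolerance' bytes, else None."""
    best = min(fingerprints, key=lambda a: abs(fingerprints[a] - observed_size))
    if abs(fingerprints[best] - observed_size) <= tolerance:
        return best
    return None
```

A real attack would build such fingerprints empirically per service and per cipher, and would also model message-length leakage through padding, but the matching step is essentially this simple.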

In contrast to today's IP-based host-oriented Internet architecture, Information-Centric Networking (ICN) emphasizes content by making it directly addressable and routable. The Named Data Networking (NDN) architecture is an instance of ICN that is being developed as a candidate next-generation Internet architecture. By opportunistically caching content within the network, NDN appears to be well suited for large-scale content distribution and for meeting the needs of the increasingly mobile and bandwidth-hungry applications that dominate today's Internet. One key feature of NDN is the requirement that each content object be digitally signed by its producer. Thus, NDN should be, in principle, immune to distributing fake (aka "poisoned") content. However, in practice, this poses two challenges for detecting fake content in NDN routers: (1) overhead due to signature verification and certificate chain traversal, and (2) lack of trust context, i.e., determining which public keys are trusted to verify which content. Because of these issues, NDN does not force routers to verify content signatures, which makes the architecture susceptible to content poisoning attacks. This paper explores root causes of, and some cures for, content poisoning attacks in NDN. In the process, it becomes apparent that meaningful mitigation of content poisoning is contingent upon a network-layer trust management architecture, elements of which we construct, while carefully justifying specific design choices. This work represents an initial effort towards comprehensive trust management for NDN.

Flow-based approaches for SSH intrusion detection have been developed to overcome the scalability issues of host-based alternatives. Although the detection of many SSH attacks in a flow-based fashion is fairly straightforward, typically no insight is provided into whether an attack was successful. We address this shortcoming by presenting an algorithm for the flow-based detection of compromises, i.e., hosts that have been compromised during an attack. Our algorithm has been implemented as part of our open-source IDS SSHCure and validated using almost 100 servers, workstations, and honeypots, achieving an accuracy close to 100%.
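The flavor of such a detector can be sketched as follows. The packets-per-flow thresholds are invented for illustration, and the real SSHCure algorithm tracks attackers individually and is considerably more careful; this sketch only conveys the core signal, that a flow breaking the repetitive brute-force signature suggests a successful login:

```python
def detect_compromise(flows, bf_low=11, bf_high=51):
    """Toy flow-based compromise heuristic. 'flows' is a chronological
    list of (src, packets_in_flow) pairs for one target. Flows inside
    the [bf_low, bf_high] packets-per-flow band mark a brute-force
    phase; a later, larger flow suggests an interactive session after
    a successful login. (Per-attacker state is deliberately omitted.)"""
    in_attack = False
    compromised = []
    for src, ppf in flows:
        if bf_low <= ppf <= bf_high:
            in_attack = True            # repetitive login-attempt flows
        elif in_attack and ppf > bf_high:
            compromised.append(src)     # signature broken: likely compromise
    return compromised
```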

The cloud is migrating to the edge of the network, where routers themselves may become the virtualisation infrastructure, in an evolution labelled "the fog". At the same time, many other complementary technologies are reaching a high level of maturity. Their interplay may dramatically shift the information and communication technology landscape in the coming years, bringing separate technologies onto common ground. This paper offers a comprehensive definition of the fog, encompassing technologies as diverse as cloud computing, sensor networks, peer-to-peer networks, network virtualisation functions, and configuration management techniques. We highlight the main challenges faced by this potentially breakthrough technology amalgamation.

Driven by the need to cope with exponentially growing mobile data traffic and to support new traffic types from massive numbers of machine-type devices, academia and industry are thinking beyond the current generation of mobile cellular networks to chart a path towards fifth-generation (5G) mobile networks. Several new approaches and technologies are being considered as potential elements making up such a future mobile network, including cloud RANs, application of SDN principles, exploitation of new and unused portions of spectrum, and use of massive MIMO and full-duplex communications. Research on these technologies requires realistic and flexible experimentation platforms that offer a wide range of experimentation modes, from real-world experimentation to controlled and scalable evaluations, while at the same time retaining backward compatibility with current-generation systems. Towards this end, we present OpenAirInterface (OAI) as a suitably flexible platform. In addition, we discuss the use of OAI in the context of several widely mentioned 5G research directions.

On 26-27 March 2014, CAIDA hosted the sixth Workshop on Active Internet Measurements (AIMS-6) as part of our series of Internet Statistics and Metrics Analysis (ISMA) workshops. As with previous AIMS workshops, the goals were to further our understanding of the potential and limitations of active measurement research and infrastructure in the wide-area Internet, and to promote cooperative solutions and coordinated strategies between academics, industry, policymakers, and funding agencies in the area of active Internet measurement. This year, we explored capabilities and opportunities for network measurement in the wireless domain, and research infrastructure to support it. Participants found the workshop content challengingly diverse, with substantial knowledge exchange regarding the wireless research infrastructure landscape(s) and existing measurement capabilities. But attendees agreed that the conversation was only beginning, and that some challenges merit further discussion, such as finding consensus on standard metrics to measure, and constructing a road map for wireless measurement research infrastructure and activities for the next decade. This report describes topics discussed at the workshop, and summarizes participants' views of priorities for future funding as well as follow-on workshops in this area. Materials related to the workshop are available at http://www.caida.org/workshops/aims/1403/.

Software-defined networks can enable a variety of concurrent, dynamically instantiated measurement tasks that provide fine-grained visibility into network traffic. Recently, there have been many proposals to configure TCAM counters in hardware switches to monitor traffic. However, TCAM memory at switches is fundamentally limited, and the accuracy of the measurement tasks is a function of the resources devoted to them on each switch. This paper describes an adaptive measurement framework, called DREAM, that dynamically adjusts the resources devoted to each measurement task while ensuring a user-specified level of accuracy. Since the trade-off between resource usage and accuracy can depend upon the type of tasks, their parameters, and traffic characteristics, DREAM does not assume an a priori characterization of this trade-off, but instead dynamically searches for a resource allocation that is sufficient to achieve a desired level of accuracy. A prototype implementation and simulations with three network-wide measurement tasks (heavy hitter, hierarchical heavy hitter, and change detection) and diverse traffic show that DREAM can support more concurrent tasks with higher accuracy than several other alternatives.
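One round of the kind of accuracy-driven reallocation described above might be sketched as follows. The data structure, step size, and donor/recipient selection are hypothetical; DREAM's actual allocation algorithm is more sophisticated:

```python
def reallocate(tasks, target, step=8):
    """One adaptation round (sketch): move 'step' TCAM counters from the
    task most above the target accuracy to the task most below it.
    'tasks' maps name -> {"counters": int, "accuracy": float}, where
    accuracy is an online estimate reported by the task itself."""
    over = [t for t in tasks if tasks[t]["accuracy"] > target]
    under = [t for t in tasks if tasks[t]["accuracy"] < target]
    if not over or not under:
        return tasks                      # nothing to rebalance this round
    donor = max(over, key=lambda t: tasks[t]["accuracy"])
    needy = min(under, key=lambda t: tasks[t]["accuracy"])
    moved = min(step, tasks[donor]["counters"])
    tasks[donor]["counters"] -= moved
    tasks[needy]["counters"] += moved
    return tasks
```

Iterating such rounds searches for an allocation where every task meets its accuracy bound, without needing an a priori resource-accuracy model.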

This paper presents a practical approach to rapidly introducing new dataplane functionality into networks: End-hosts embed tiny programs into packets to actively query and manipulate a network’s internal state. We show how this “tiny packet program” (TPP) interface gives end-hosts unprecedented visibility into network behavior, enabling them to work with the network to achieve a desired functionality. Our design leverages what each component does best: (a) switches forward and execute tiny packet programs (at most 5 instructions) in-band at line rate, and (b) end-hosts perform arbitrary (and easily updated) computation on network state. By implementing three different research proposals, we show that TPPs are useful. Using a hardware prototype on a NetFPGA, we show our design is feasible at a reasonable cost.
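A toy interpreter conveys the idea of executing a small, bounded program against switch state as a packet passes through; the instruction names and the switch-state keys here are our own invention, not the paper's exact instruction set:

```python
def run_tpp(program, switch_state, stack=None):
    """Toy interpreter for a TPP-like instruction set (hypothetical
    syntax). Each instruction is ("PUSH", value) or ("LOAD", key);
    LOAD appends a named piece of switch state (e.g. a queue depth)
    to the stack the packet carries, so the end-host can read it."""
    if len(program) > 5:          # switches execute at most 5 instructions
        raise ValueError("TPP too long")
    stack = [] if stack is None else stack
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "LOAD":
            stack.append(switch_state[arg])
        else:
            raise ValueError("unknown op: " + op)
    return stack
```

In the real design the same program executes at line rate on every switch along the path, so the packet arrives carrying a hop-by-hop trace of the queried state.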

The Resource Public Key Infrastructure (RPKI) is a new infrastructure that prevents some of the most devastating attacks on interdomain routing. However, the security benefits provided by the RPKI are accomplished via an architecture that empowers centralized authorities to unilaterally revoke any IP prefixes under their control. We propose mechanisms to improve the transparency of the RPKI, in order to mitigate the risk that it will be used for IP address takedowns. First, we present tools that detect and visualize changes to the RPKI that can potentially take down an IP prefix. We use our tools to identify errors and revocations in the production RPKI. Next, we propose modifications to the RPKI's architecture to (1) require any revocation of IP address space to receive consent from all impacted parties, and (2) detect when misbehaving authorities fail to obtain consent. We present a security analysis of our architecture, and estimate its overhead using data-driven analysis.

Efficient packet classification is a core concern for network services. Traditional multi-field classification approaches, in both software and ternary content-addressable memory (TCAMs), entail tradeoffs between (memory) space and (lookup) time. TCAMs cannot efficiently represent range rules, a common class of classification rules confining the values of packet fields to given ranges. The exponential space growth of TCAM entries relative to the number of fields is exacerbated when multiple fields contain ranges. In this work, we present a novel approach that identifies properties shared by many classifiers which allow them to be implemented in linear space with worst-case guaranteed logarithmic lookup time, and which permit the addition of more fields, including range constraints, without impacting space and time complexities. On real-life classifiers from Cisco Systems and additional classifiers from ClassBench [7] (with real parameters), 90-95% of rules are thus handled; the remaining 5-10% of rules can be stored in TCAM and processed in parallel.
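The range blow-up mentioned above comes from having to expand each range into prefixes before it fits in TCAM. The standard expansion, sketched here for illustration, makes the cost concrete: a single range on a w-bit field can require up to 2w-2 prefix entries, and the counts multiply across fields:

```python
def range_to_prefixes(lo, hi, width):
    """Expand the integer range [lo, hi] on a 'width'-bit field into a
    minimal list of (value, prefix_len) TCAM prefixes, by repeatedly
    carving off the largest aligned power-of-two block starting at lo."""
    out = []
    while lo <= hi:
        size = lo & -lo if lo else 1 << width   # largest block aligned at lo
        while size > hi - lo + 1:               # shrink until it fits in range
            size >>= 1
        out.append((lo, width - size.bit_length() + 1))
        lo += size
    return out
```

For example, the range [1, 6] on a 4-bit field needs four entries (0001, 001*, 010*, 0110), while the full range needs just one.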

After several IPv4 address exhaustion milestones in the last three years, it is becoming apparent that the world is running out of IPv4 addresses, and the adoption of the next-generation Internet protocol, IPv6, though nascent, is accelerating. In order to better understand this unique and disruptive transition, we explore twelve metrics using ten global-scale datasets to create the longest and broadest measurement of IPv6 adoption to date. Using this perspective, we find that adoption, relative to IPv4, varies by two orders of magnitude depending on the measure examined, and that care must be taken when evaluating adoption metrics in isolation. Further, we find that regional adoption is not uniform. Finally, and perhaps most surprisingly, we find that over the last three years, the nature of IPv6 utilization, in terms of traffic, content, reliance on transition technology, and performance, has shifted dramatically from prior findings, indicating a maturing of the protocol into production mode. We believe IPv6's recent growth and this changing utilization signal a true quantum leap.

Existing ABR algorithms face a significant challenge in estimating future capacity: capacity can vary widely over time, a phenomenon commonly observed in commercial services. In this work, we suggest an alternative approach: rather than presuming that capacity estimation is required, it is perhaps better to begin by using only the buffer, and then ask when capacity estimation is needed. We test the viability of this approach through a series of experiments spanning millions of real users in a commercial service. We start with a simple design which directly chooses the video rate based on the current buffer occupancy. Our own investigation reveals that capacity estimation is unnecessary in steady state; however, using simple capacity estimation (based on immediate past throughput) is important during the startup phase, when the buffer itself is growing from empty. This approach allows us to reduce the rebuffer rate by 10-20% compared to Netflix's then-default ABR algorithm, while delivering a similar average video rate, and a higher video rate in steady state.
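The core buffer-based idea, choosing a rate directly from buffer occupancy, can be sketched as a linear map between a lower "reservoir" and an upper "cushion". The constants and the linear shape below are illustrative defaults, not the paper's tuned rate map:

```python
def choose_rate(buffer_s, rates, reservoir=10.0, cushion=30.0):
    """Buffer-based rate selection (sketch). 'buffer_s' is the current
    buffer occupancy in seconds and 'rates' the available video rates
    in ascending order (kbit/s). Below the reservoir, play it safe at
    the lowest rate; above reservoir+cushion, use the highest; in
    between, interpolate linearly and snap down to an available rate."""
    if buffer_s <= reservoir:
        return rates[0]
    if buffer_s >= reservoir + cushion:
        return rates[-1]
    frac = (buffer_s - reservoir) / cushion
    target = rates[0] + frac * (rates[-1] - rates[0])
    return max(r for r in rates if r <= target)
```

Because the decision depends only on the buffer, no throughput prediction is needed once the buffer has filled, which is exactly the steady-state claim above.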

The Forwarding Information Base (FIB) of backbone routers has been rapidly growing in size. An ideal IP lookup algorithm should achieve constant, yet small, IP lookup time and on-chip memory usage. However, no prior IP lookup algorithm achieves both requirements at the same time. In this paper, we first propose SAIL, a Splitting Approach to IP Lookup. One splitting is along the dimension of the lookup process, namely finding the prefix length and finding the next hop; the other is along the dimension of prefix length, namely IP lookup on prefixes of length less than or equal to 24 and IP lookup on prefixes of length longer than 24. Second, we propose a suite of algorithms for IP lookup based on our SAIL framework. Third, we implemented our algorithms on four platforms: CPU, FPGA, GPU, and many-core. We conducted extensive experiments to evaluate our algorithms using real FIBs and real traffic from a major ISP in China. Experimental results show that our SAIL algorithms are several times or even two orders of magnitude faster than well-known IP lookup algorithms.
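A minimal sketch of the prefix-length split: prefixes up to /24 resolve with a single index into a 2^24-entry table, and a flag diverts the small minority of longer prefixes to a second table. Dictionaries stand in for the arrays here, leaf pushing is approximated by requiring shorter prefixes to be inserted first, and the real SAIL data structures are far more compact:

```python
class SailLookup:
    """Toy two-level lookup inspired by the /24 split described above.
    Assumes shorter prefixes are inserted before longer ones (a stand-in
    for leaf pushing)."""
    def __init__(self):
        self.short = {}       # addr >> 8 -> next hop, for plen <= 24
        self.flagged = set()  # /24 chunks that contain longer prefixes
        self.long = {}        # (addr >> (32 - plen), plen) -> next hop

    def insert(self, prefix, plen, nexthop):
        if plen <= 24:
            base = prefix >> 8
            for i in range(1 << (24 - plen)):   # leaf-push into /24 slots
                self.short[base + i] = nexthop
        else:
            self.flagged.add(prefix >> 8)
            self.long[(prefix >> (32 - plen), plen)] = nexthop

    def lookup(self, addr):
        if addr >> 8 in self.flagged:           # rare path: plen > 24
            for plen in range(32, 24, -1):
                nh = self.long.get((addr >> (32 - plen), plen))
                if nh is not None:
                    return nh
        return self.short.get(addr >> 8)        # common path: one index
```

The point of the split is that the common case (most traffic matches prefixes of length 24 or less) needs exactly one on-chip memory access.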

We present Statesman, a network-state management service that allows multiple network management applications to operate independently, while maintaining network-wide safety and performance invariants. Network state captures various aspects of the network such as which links are alive and how switches are forwarding traffic. Statesman uses three views of the network state. In observed state, it maintains an up-to-date view of the actual network state. Applications read this state and propose state changes based on their individual goals. Using a model of dependencies among state variables, Statesman merges these proposed states into a target state that is guaranteed to maintain the safety and performance invariants. It then updates the network to the target state. Statesman has been deployed in ten Microsoft Azure datacenters for several months, and three distinct applications have been built on it. We use the experience from this deployment to demonstrate how Statesman enables each application to meet its goals, while maintaining network-wide invariants.

This paper describes the design and implementation of HERD, a key-value system designed to make the best use of an RDMA network. Unlike prior RDMA-based key-value systems, HERD focuses its design on reducing network round trips while using efficient RDMA primitives; the result is substantially lower latency, and throughput that saturates modern, commodity RDMA hardware. HERD makes two unconventional design decisions: First, it does not use RDMA reads, despite the allure of operations that bypass the remote CPU entirely. Second, it uses a mix of RDMA and messaging verbs, despite the conventional wisdom that the messaging primitives are slow. A HERD client writes its request into the server's memory; the server computes the reply. This design uses a single round trip for all requests and supports up to 26 million key-value operations per second with 5 µs average latency. Notably, for small key-value items, our full-system throughput is similar to native RDMA read throughput and is over 2X higher than recent RDMA-based key-value systems. We believe that HERD further serves as an effective template for the construction of RDMA-based datacenter services.

BGP severely constrains how networks can deliver traffic over the Internet. Today's networks can only forward traffic based on the destination IP prefix, by selecting among routes offered by their immediate neighbors. We believe Software Defined Networking (SDN) could revolutionize wide-area traffic delivery, by offering direct control over packet-processing rules that match on multiple header fields and perform a variety of actions. Internet exchange points (IXPs) are a compelling place to start, given their central role in interconnecting many networks and their growing importance in bringing popular content closer to end users. To realize a Software Defined IXP (an "SDX"), we must create compelling applications, such as "application-specific peering", where two networks peer only for (say) streaming video traffic. We also need new programming abstractions that allow participating networks to create and run these applications, and a runtime that both behaves correctly when interacting with BGP and ensures that applications do not interfere with each other. Finally, we must ensure that the system scales, both in rule-table size and computational overhead. In this paper, we tackle these challenges and demonstrate the flexibility and scalability of our solutions through controlled and in-the-wild experiments. Our experiments demonstrate that our SDX implementation can implement representative policies for hundreds of participants who advertise full routing tables, while achieving sub-second convergence in response to configuration changes and routing updates.

This paper presents the design and implementation of Geosphere, a physical- and link-layer design for access-point-based MIMO wireless networks that consistently improves network throughput. To send multiple streams of data in a MIMO system, prior designs rely on a technique called zero-forcing, a way of "nulling" the interference between data streams by mathematically inverting the wireless channel matrix. In general, zero-forcing is highly effective, significantly improving throughput. But in certain physical situations, the MIMO channel matrix can become "poorly conditioned," harming performance. With these situations in mind, Geosphere uses sphere decoding, a more computationally demanding technique that can achieve higher throughput in such channels. To overcome the sphere decoder's computational complexity when sending dense wireless constellations at a high rate, Geosphere introduces search and pruning techniques that incorporate novel geometric reasoning about the wireless constellation. These techniques reduce the computational complexity of 256-QAM systems by almost one order of magnitude, bringing computational demands in line with current 16- and 64-QAM systems already realized in ASIC. Geosphere thus makes the sphere decoder practical for the first time in a 4 × 4 MIMO, 256-QAM system. Results from our WARP testbed show that Geosphere achieves throughput gains over multi-user MIMO of 2× in 4 × 4 systems and 47% in 2 × 2 MIMO systems.
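For contrast with sphere decoding, zero-forcing itself is simple: invert the channel matrix and apply the inverse to the received vector. A 2x2 real-valued sketch (actual systems work with complex-valued channels and then quantize to the constellation):

```python
def zero_force_2x2(H, y):
    """Zero-forcing detection for a 2x2 real-valued channel: solve
    H x = y by explicit matrix inversion. When H is poorly conditioned
    (determinant near zero) the inverse amplifies noise, which is
    precisely the regime where sphere decoding pays off."""
    (a, b), (c, d) = H
    det = a * d - b * c
    if abs(det) < 1e-9:
        raise ValueError("channel matrix is singular")
    x0 = (d * y[0] - b * y[1]) / det
    x1 = (-c * y[0] + a * y[1]) / det
    return [x0, x1]
```

A sphere decoder instead searches the constellation lattice for the point closest to y under H, avoiding the noise amplification at the cost of a harder search, which is the cost Geosphere's geometric pruning attacks.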

Control-plane protocols are complex in cellular networks. They communicate with one another along three dimensions of cross layers, cross (circuit-switched and packet-switched) domains, and cross (3G and 4G) systems. In this work, we propose signaling diagnosis tools and uncover six instances of problematic interactions. Such control-plane issues span both design defects in the 3GPP standards and operational slips by carriers. They are more damaging than data-plane failures. In the worst-case scenario, users may be out of service in 4G, or get stuck in 3G. We deduce root causes, propose solutions, and summarize learned lessons.

Software bugs are inevitable in software-defined networking control software, and troubleshooting is a tedious, time-consuming task. In this paper we discuss how to improve control software troubleshooting by presenting a technique for automatically identifying a minimal sequence of inputs responsible for triggering a given bug, without making assumptions about the language or instrumentation of the software under test. We apply our technique to five open-source SDN control platforms (Floodlight, NOX, POX, Pyretic, and ONOS) and illustrate how the minimal causal sequences our system found aided the troubleshooting process.
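The minimization idea is in the spirit of delta debugging. A deliberately simplified one-at-a-time variant looks like this; the real system must also replay inputs against a live controller and cope with nondeterminism and timing, which this sketch ignores:

```python
def minimize(events, triggers_bug):
    """Shrink an input event sequence while preserving the failure:
    repeatedly drop any single event whose removal still reproduces
    the bug (per the 'triggers_bug' oracle), until no event can be
    dropped. The result is 1-minimal, not guaranteed globally minimal."""
    assert triggers_bug(events), "starting sequence must reproduce the bug"
    changed = True
    while changed:
        changed = False
        for i in range(len(events)):
            candidate = events[:i] + events[i + 1:]
            if triggers_bug(candidate):   # bug survives without event i
                events = candidate
                changed = True
                break
    return events
```

In practice the oracle is expensive (a full controller replay), which is why smarter search orders than one-at-a-time removal matter.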

Many data center transports have been proposed in recent times (e.g., DCTCP, PDQ, and pFabric). Contrary to the common perception that they are competitors (i.e., protocol A vs. protocol B), we claim that the underlying strategies used in these protocols are, in fact, complementary. Based on this insight, we design PASE, a transport framework that synthesizes existing transport strategies, namely self-adjusting endpoints (used in TCP-style protocols), in-network prioritization (used in pFabric), and arbitration (used in PDQ). PASE is deployment-friendly: it does not require any changes to the network fabric; yet its performance is comparable to, or better than, that of state-of-the-art protocols that require changes to network elements (e.g., pFabric). We evaluate PASE using simulations and testbed experiments. Our results show that PASE performs well for a wide range of application workloads and network settings.

Though most would agree that accountability and privacy are both valuable, today's Internet provides little support for either. Previous efforts have explored ways to offer stronger guarantees for one of the two, typically at the expense of the other; indeed, at first glance accountability and privacy appear mutually exclusive. At the center of the tussle is the source address: in an accountable Internet, source addresses undeniably link packets and senders so hosts can be punished for bad behavior. In a privacy-preserving Internet, source addresses are hidden as much as possible. In this paper, we argue that a balance is possible. We introduce the Accountable and Private Internet Protocol (APIP), which splits source addresses into two separate fields, an accountability address and a return address, and introduces independent mechanisms for managing each. Accountability addresses, rather than pointing to hosts, point to accountability delegates, which agree to vouch for packets on their clients' behalf, taking appropriate action when misbehavior is reported. With accountability handled by delegates, senders are now free to mask their return addresses; we discuss a few techniques for doing so.

We present Dionysus, a system for fast, consistent network updates in software-deﬁned networks. Dionysus encodes as a graph the consistency-related dependencies among updates at individual switches, and it then dynamically schedules these updates based on runtime differences in the update speeds of different switches. This dynamic scheduling is the key to its speed; prior update methods are slow because they pre-determine a schedule, which does not adapt to runtime conditions. Testbed experiments and data-driven simulations show that Dionysus improves the median update speed by 53–88% in both wide area and data center networks compared to prior methods.
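Dionysus's central idea, dynamically issuing each switch update as soon as its dependencies complete rather than following a pre-determined plan, can be sketched as a small event-driven scheduler. This is an illustrative simplification under my own assumptions (the function name and single-duration-per-update model are hypothetical, not from the paper):

```python
import heapq

def schedule_updates(deps, durations):
    """Event-driven scheduler: start each switch update the moment all of
    its dependencies have completed, so a slow switch delays only its own
    dependents rather than the entire update plan.

    deps:      dict update -> set of prerequisite updates
    durations: dict update -> observed per-switch update time
    Returns (start_order, makespan).
    """
    started, done, order = set(), set(), []
    events, now = [], 0.0  # min-heap of (completion_time, update)

    def start_ready():
        for u in deps:
            if u not in started and deps[u] <= done:
                started.add(u)
                order.append(u)
                heapq.heappush(events, (now + durations[u], u))

    start_ready()                       # kick off dependency-free updates
    while events:
        now, u = heapq.heappop(events)  # next completion event
        done.add(u)
        start_ready()                   # dependents may now be runnable
    return order, now
```

With a static schedule, one straggler switch would stall everything behind it; here the makespan adapts to whatever durations the switches actually exhibit at runtime.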

When can we reason about the neutrality of a network based on external observations? We prove conditions under which it is possible to (a) detect neutrality violations and (b) localize them to specific links, based on external observations. Our insight is that, when we make external observations from different vantage points, these will most likely be inconsistent with each other if the network is not neutral. Where existing tomographic techniques try to form solvable systems of equations to infer network properties, we try to form unsolvable systems that reveal neutrality violations. We present an algorithm that relies on this idea to identify sets of non-neutral links based on external observations, and we show, through network emulation, that it achieves good accuracy for a variety of network conditions.
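A toy illustration of the "unsolvable system" idea (my own reading, not the authors' algorithm): if one measured path is the disjoint union of two other measured paths, a neutral network forces its delay to equal their sum; when the observations contradict that, no single per-link delay assignment solves the system, and the composite path's links become suspects.

```python
from itertools import combinations

def find_suspect_links(paths, delays, tol=1e-6):
    """paths: list of sets of link ids; delays: observed delay per path.
    Flags the links of any path whose delay contradicts the sum of two
    disjoint sub-paths, i.e., the per-link delay system has no solution."""
    suspects = set()
    for i, p in enumerate(paths):
        for j, k in combinations(range(len(paths)), 2):
            if i in (j, k) or (paths[j] & paths[k]):
                continue  # need two disjoint paths other than path i
            if paths[j] | paths[k] == p:
                # neutrality => delays[i] == delays[j] + delays[k]
                if abs(delays[i] - (delays[j] + delays[k])) > tol:
                    suspects |= p
    return suspects
```

A real deployment would, as the abstract notes, combine many vantage points and noisy measurements; this sketch only shows why an inconsistent (unsolvable) system is the tell-tale sign.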

An ideal datacenter network should provide several properties, including low median and tail latency, high utilization (throughput), fair allocation of network resources between users or applications, deadline-aware scheduling, and congestion (loss) avoidance. Current datacenter networks inherit the principles that went into the design of the Internet, where packet transmission and path selection decisions are distributed among the endpoints and routers. Instead, we propose that each sender should delegate control—to a centralized arbiter—of when each packet should be transmitted and what path it should follow. This paper describes Fastpass, a datacenter network architecture built using this principle. Fastpass incorporates two fast algorithms: the first determines the time at which each packet should be transmitted, while the second determines the path to use for that packet. In addition, Fastpass uses an efficient protocol between the endpoints and the arbiter and an arbiter replication strategy for fault-tolerant failover. We deployed and evaluated Fastpass in a portion of Facebook's datacenter network. Our results show that Fastpass achieves high throughput comparable to current networks with a 240× reduction in queue lengths (4.35 Mbytes reduced to 18 Kbytes), achieves much fairer and more consistent flow throughputs than baseline TCP (5200× reduction in the standard deviation of per-flow throughput with five concurrent connections), scales from 1 to 8 cores in the arbiter implementation with the ability to schedule 2.21 Terabits/s of traffic in software on eight cores, and yields a 2.5× reduction in the number of TCP retransmissions in a latency-sensitive service at Facebook.
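The arbiter's first algorithm can be caricatured as a greedy per-timeslot matching, where in each slot a host sends at most one packet and receives at most one. This is a deliberate simplification on my part (the paper's allocator is considerably more refined), meant only to make the "arbiter decides when each packet goes" principle concrete:

```python
def allocate_timeslots(demands):
    """demands: list of (src, dst, packets). Returns a list of timeslots,
    each a list of (src, dst) transmissions, such that every host appears
    at most once as sender and at most once as receiver per slot."""
    remaining = list(demands)
    schedule = []
    while any(n > 0 for _, _, n in remaining):
        busy_src, busy_dst, slot = set(), set(), []
        for idx, (s, d, n) in enumerate(remaining):
            if n > 0 and s not in busy_src and d not in busy_dst:
                slot.append((s, d))        # grant this pair for the slot
                busy_src.add(s)
                busy_dst.add(d)
                remaining[idx] = (s, d, n - 1)
        schedule.append(slot)
    return schedule
```

Because no host is ever double-booked within a slot, queues stay near-empty by construction, which is the intuition behind the measured queue-length reduction.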

Software-deﬁned networking introduces the possibility of building self-tuning networks that constantly monitor network conditions and react rapidly to important events such as congestion. Unfortunately, state-of-the-art monitoring mechanisms for conventional networks require hundreds of milliseconds to seconds to extract global network state, like link utilization or the identity of “elephant” ﬂows. Such latencies are adequate for responding to persistent issues, e.g., link failures or long-lasting congestion, but are inadequate for responding to transient problems, e.g., congestion induced by bursty workloads sharing a link. In this paper, we present Planck, a novel network measurement architecture that employs oversubscribed port mirroring to extract network information at 280 µs–7 ms timescales on a 1 Gbps commodity switch and 275 µs–4 ms timescales on a 10 Gbps commodity switch, over 11x and 18x faster than recent approaches, respectively (and up to 291x if switch ﬁrmware allowed buffering to be disabled on some ports). To demonstrate the value of Planck’s speed and accuracy, we use it to drive a trafﬁc engineering application that can reroute congested ﬂows in milliseconds. On a 10 Gbps commodity switch, Planck-driven trafﬁc engineering achieves aggregate throughput within 1–4% of optimal for most workloads we evaluated, even with ﬂows as small as 50 MiB, an improvement of up to 53% over previous schemes.

Communication primitives such as coding and multiple antenna processing have provided signiﬁcant beneﬁts for traditional wireless systems. Existing designs, however, consume signiﬁcant power and computational resources, and hence cannot be run on low complexity, power constrained backscatter devices. This paper makes two main contributions: (1) we introduce the ﬁrst multi-antenna cancellation design that operates on backscatter devices while retaining a small form factor and power footprint, (2) we introduce a novel coding mechanism that enables long range communication as well as concurrent transmissions and can be decoded on backscatter devices. We build hardware prototypes of the above designs that can be powered solely using harvested energy from TV and solar sources. The results show that our designs provide beneﬁts for both RFID and ambient backscatter systems: they enable RFID tags to communicate directly with each other at distances of tens of meters and through multiple walls. They also increase the communication rate and range achieved by ambient backscatter systems by 100X and 40X respectively. We believe that this paper represents a substantial leap in the capabilities of backscatter communication.

Network functions virtualization (NFV) together with software-defined networking (SDN) has the potential to help operators satisfy tight service level agreements, accurately monitor and manipulate network traffic, and minimize operating expenses. However, in scenarios that require packet processing to be redistributed across a collection of network function (NF) instances, simultaneously achieving all three goals requires a framework that provides efficient, coordinated control of both internal NF state and network forwarding state. To this end, we design a control plane called OpenNF. We use carefully designed APIs and a clever combination of events and forwarding updates to address race conditions, bound overhead, and accommodate a variety of NFs. Our evaluation shows that OpenNF offers efficient state control without compromising flexibility, and requires modest additions to NFs.

Network faults such as link failures and high switch conﬁguration delays can cause heavy congestion and packet loss. Because it takes time for the trafﬁc engineering systems to detect and react to such faults, these conditions can last long—even tens of seconds. We propose forward fault correction (FFC), a proactive approach for handling faults. FFC spreads network trafﬁc such that freedom from congestion is guaranteed under arbitrary combinations of up to k faults. We show how FFC can be practically realized by compactly encoding the constraints that arise from this large number of possible faults and solving them efﬁciently using sorting networks. Experiments with data from real networks show that, with negligible loss in overall network throughput, FFC can reduce data loss by a factor of 7–130 in well-provisioned networks, and reduce the loss of high-priority trafﬁc to almost zero in well-utilized networks.
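For small networks, the FFC guarantee can be checked by brute force. The verifier below is my own sketch (not the paper's compact sorting-network encoding): it enumerates every combination of up to k link failures and confirms that no surviving link is overloaded, assuming each flow rescales its traffic proportionally onto its surviving paths.

```python
from itertools import combinations

def congestion_free_under_k_faults(flows, capacity, k):
    """flows: list of (demand, paths), paths = list of (weight, links).
    capacity: dict link -> capacity. Returns True iff no surviving link
    exceeds its capacity under any combination of at most k failures."""
    links = sorted({l for _, paths in flows for _, ls in paths for l in ls})
    for f in range(k + 1):
        for failed in combinations(links, f):
            dead = set(failed)
            load = {l: 0.0 for l in links}
            for demand, paths in flows:
                alive = [(w, ls) for w, ls in paths if not (set(ls) & dead)]
                total = sum(w for w, _ in alive)
                if total == 0:
                    continue  # flow fully disconnected; no load to place
                for w, ls in alive:
                    for l in ls:
                        load[l] += demand * w / total  # proportional rescale
            if any(load[l] > capacity[l] + 1e-9 for l in links if l not in dead):
                return False
    return True
```

The paper's contribution is precisely avoiding this combinatorial explosion: it encodes all such failure combinations compactly so the constraints can be solved efficiently at scale.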

Communication in data-parallel applications often involves a collection of parallel flows. Traditional techniques to optimize flow-level metrics do not perform well in optimizing such collections, because the network is largely agnostic to application-level requirements. The recently proposed coflow abstraction bridges this gap and creates new opportunities for network scheduling. In this paper, we address inter-coflow scheduling for two different objectives: decreasing communication time of data-intensive jobs and guaranteeing predictable communication time. We introduce the concurrent open shop scheduling with coupled resources problem, analyze its complexity, and propose effective heuristics to optimize either objective. We present Varys, a system that enables data-intensive frameworks to use coflows and the proposed algorithms while maintaining high network utilization and guaranteeing starvation freedom. EC2 deployments and trace-driven simulations show that communication stages complete up to 3.16× faster on average and up to 2× more coflows meet their deadlines using Varys in comparison to per-flow mechanisms. Moreover, Varys outperforms non-preemptive coflow schedulers by more than 5×.
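A key intuition behind inter-coflow scheduling can be sketched as a shortest-job-first analogue over coflow bottlenecks: a coflow cannot finish before its most-loaded ingress or egress port drains, so serving the smallest bottleneck first reduces average completion time. This is a simplified reading on my part; the paper's actual heuristics also handle rate allocation, deadlines, and starvation freedom.

```python
def bottleneck_first_order(coflows, port_rate=1.0):
    """coflows: dict name -> list of (src_port, dst_port, bytes).
    Orders coflows by the drain time of their most-loaded port
    (a shortest-job-first analogue at coflow granularity)."""
    def bottleneck(flows):
        load = {}
        for src, dst, size in flows:
            # a port is a bottleneck on its send or receive side
            load[("out", src)] = load.get(("out", src), 0) + size
            load[("in", dst)] = load.get(("in", dst), 0) + size
        return max(load.values()) / port_rate
    return sorted(coflows, key=lambda name: bottleneck(coflows[name]))
```

Note how this differs from per-flow fairness: a single huge flow and a coflow of many small flows are compared by their slowest port, not flow by flow.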

We present the design, implementation, and evaluation of CONGA, a network-based distributed congestion-aware load balancing mechanism for datacenters. CONGA exploits recent trends including the use of regular Clos topologies and overlays for network virtualization. It splits TCP flows into flowlets, estimates real-time congestion on fabric paths, and allocates flowlets to paths based on feedback from remote switches. This enables CONGA to efficiently balance load and seamlessly handle asymmetry, without requiring any TCP modiﬁcations. CONGA has been implemented in custom ASICs as part of a new datacenter fabric. In testbed experiments, CONGA has 5× better ﬂow completion times than ECMP even with a single link failure and achieves 2–8× better throughput than MPTCP in Incast scenarios. Further, the Price of Anarchy for CONGA is provably small in Leaf-Spine topologies; hence CONGA is nearly as effective as a centralized scheduler while being able to react to congestion in microseconds. Our main thesis is that datacenter fabric load balancing is best done in the network, and requires global schemes such as CONGA to handle asymmetry.
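The flowlet mechanism can be sketched as follows, under my own simplifying assumptions (congestion is a locally incremented counter here, whereas CONGA estimates it from feedback carried by remote leaf switches):

```python
def assign_flowlets(arrival_times, path_load, gap):
    """Split one flow's packets into flowlets wherever the inter-packet
    gap is at least `gap`, and start each new flowlet on the currently
    least-loaded path. Within a flowlet, packets stick to one path, so
    TCP sees no reordering. Mutates path_load as crude load accounting.
    Returns a list of (packet_index, path)."""
    assignments, last_t, path = [], None, None
    for i, t in enumerate(arrival_times):
        if last_t is None or t - last_t >= gap:
            # the gap is long enough that earlier packets have drained:
            # safe to switch paths without reordering
            path = min(path_load, key=path_load.get)
        path_load[path] += 1
        assignments.append((i, path))
        last_t = t
    return assignments
```

Because rebalancing decisions happen only at flowlet boundaries, the scheme gets much of the benefit of per-packet load balancing without breaking TCP, which is why no TCP modifications are needed.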

Load balancing is a foundational function of datacenter infrastructures and is critical to the performance of online services hosted in datacenters. As the demand for cloud services grows, expensive and hard-to-scale dedicated hardware load balancers are being replaced with software load balancers that scale using a distributed data plane that runs on commodity servers. Software load balancers offer low cost, high availability, and high flexibility, but suffer high latency and low capacity per load balancer, making them less than ideal for applications that demand high throughput, low latency, or both. In this paper, we present Duet, which offers all the benefits of software load balancers, along with low latency and high availability, at next to no cost. We do this by exploiting a hitherto overlooked resource in datacenter networks: the switches themselves. We show how to embed the load balancing functionality into existing hardware switches, thereby achieving organic scalability at no extra cost. For flexibility and high availability, Duet seamlessly integrates the switch-based load balancer with a small deployment of software load balancers. We enumerate and solve several architectural and algorithmic challenges involved in building such a hybrid load balancer. We evaluate Duet using a prototype implementation, as well as extensive simulations driven by traces from our production data centers. Our evaluation shows that Duet provides 10x more capacity than a software load balancer, at a fraction of the cost, while reducing latency by a factor of 10 or more, and is able to quickly adapt to network dynamics, including failures.

A longstanding problem with the Internet is that it is vulnerable to outages, black holes, hijacking, and denial of service. Although architectural solutions have been proposed to address many of these issues, they have had difficulty being adopted because widespread deployment is needed before most users see any benefit. This is especially relevant as the Internet is increasingly used for applications where correct and continuous operation is essential. In this paper, we study whether a simple, easy-to-implement model is sufficient for addressing the aforementioned Internet vulnerabilities. Our model, called ARROW (Advertised Reliable Routing Over Waypoints), is designed to allow users to configure reliable and secure end-to-end paths through participating providers. With ARROW, a highly reliable ISP offers tunneled transit through its network, along with packet transformation at the ingress, as a service to remote paying customers. Those customers can stitch together reliable end-to-end paths through a combination of participating and non-participating ISPs in order to improve the fault-tolerance, robustness, and security of mission-critical transmissions. Unlike efforts to redesign the Internet from scratch, we show that ARROW can address a set of well-known Internet vulnerabilities, for most users, with the adoption of only a single transit ISP. To demonstrate ARROW, we have added it to a small-scale wide-area ISP we control. We evaluate its performance and failure recovery properties in both simulation and live settings.

RF-powered computers are small devices that compute and communicate using only the power that they harvest from RF signals. While existing technologies have harvested power from ambient RF sources (e.g., TV broadcasts), they require a dedicated gateway (like an RFID reader) for Internet connectivity. We present Wi-Fi Backscatter, a novel communication system that bridges RF-powered devices with the Internet. Speciﬁcally, we show that it is possible to reuse existing Wi-Fi infrastructure to provide Internet connectivity to RF-powered devices. To show Wi-Fi Backscatter’s feasibility, we build a hardware prototype and demonstrate the ﬁrst communication link between an RF-powered device and commodity Wi-Fi devices. We use off-the-shelf Wi-Fi devices including Intel Wi-Fi cards, Linksys Routers, and our organization’s Wi-Fi infrastructure, and achieve communication rates of up to 1 kbps and ranges of up to 2.1 meters. We believe that this new capability can pave the way for the rapid deployment and adoption of RF-powered devices and achieve ubiquitous connectivity via nearby mobile devices that are Wi-Fi enabled.