
The great thing about working with great students is following their successes throughout their careers.

I just learned that Haifeng Yu was promoted to Associate Professor with tenure at the National University of Singapore (NUS). Haifeng was my first PhD student to graduate and is now the first to receive tenure. He has always been drawn to deep, challenging problems, and his work is always insightful. He did his graduate work on consistency models for replicated and distributed systems. More recently, he has worked on network security and system availability, and most recently on coding for wireless systems. His SIGCOMM 2010 paper on flexible coding schemes will be (in my biased opinion) of great interest. In fact, it won the best paper award at the conference.

The amount of interest in data centers and data center networking continues to grow. For the past decade plus, the most savvy Internet companies have been focusing on infrastructure. Essentially, planetary scale services such as search, social networking, and e-commerce require a tremendous amount of computation and storage. When operating at the scale of tens of thousands of computers and petabytes of storage, small gains in efficiency can result in millions of dollars of annual savings. On the other extreme, efficient access to tremendous amounts of computation can enable companies to deliver more valuable content. For example, Amazon is famous for tailoring web page contents to individual customers based on both their history and potentially the history of similar users. Doing so while maintaining interactive response times (typically responding in less than 300 ms) requires fast, parallel access to data potentially spread across hundreds or even thousands of computers. In an earlier post, I described the Facebook architecture and its reliance on clustering for delivering social networking content.
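The scale of this fan-out is exactly what makes interactive latency hard. A toy Python model, with purely illustrative numbers, shows why: a page assembled from hundreds of servers misses its deadline if any one of them is slow, so rare per-server hiccups become common page-level delays.

```python
# Sketch: why fanning a request out across many servers makes the tail
# dominate. Assumes (hypothetically) each server independently misses
# the latency budget with probability p_slow; numbers are illustrative.

def prob_request_misses_sla(p_slow: float, fanout: int) -> float:
    """Probability that at least one of `fanout` servers is slow,
    delaying the whole page (the request must wait for every server)."""
    return 1.0 - (1.0 - p_slow) ** fanout

# Even if only 1 in 100 individual responses is slow, large fan-out
# makes a slow page the common case rather than the exception.
for n in (1, 10, 100, 1000):
    print(n, round(prob_request_misses_sla(0.01, n), 3))
```

This is, of course, a simplification (real systems hedge with timeouts, replicas, and partial results), but it captures why small per-server efficiency and latency gains matter so much at this scale.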

Over the last few years, academia has become increasingly interested in data centers and cloud computing. One reason is the opportunity for impact: the entire computing industry is clearly undergoing another paradigm shift, and five years from now the way we build out computing and storage infrastructure will be radically different. Another allure of the data center is that it is possible to do “clean slate” research and deployment. One frustration of the networking research community has been the inability to deploy novel architectures and protocols because of the need to be backward compatible and friendly to legacy systems. Check out this paper for an excellent discussion. In the data center, it is at least possible to deploy entirely new architectures without the need to be compatible with every protocol developed over the years.

Of course, there are difficulties with performing data center research as well. One is having access to the necessary infrastructure to perform research at scale. With companies deploying data centers at the scale of tens of thousands of computers, it is difficult for most universities and even research labs to have access to the necessary infrastructure. In our own experience, we have found that it is possible to consider problems of scale even with a relatively modest number of machines. Research infrastructures such as Emulab and OpenCirrus are open compute platforms that provide a significant amount of computing infrastructure to the research community.

Another challenge is the lack of software infrastructure for performing data center research, particularly in networking. Eucalyptus provides an EC2-compatible environment for cloud computing. However, there is a relative void of available research software for research in networking. Rebuilding every aspect of the protocol stack before performing research in fundamental algorithms and protocols is a challenge.

To partially address this shortcoming, we are releasing an alpha version of our PortLand protocol. This work was published at SIGCOMM 2009 and targets delivering a unified Layer 2 environment for easier management and support for basic functionality such as virtual machine migration. I discussed our work on PortLand in an earlier post here, and some of the issues of Layer 2 versus Layer 3 deployment here.

The page for downloading PortLand is now up. It reflects the hard work of two graduate students in my group, Sambit Das and Malveeka Tewari, who took our research code and ported it to HP ProCurve switches running OpenFlow. The same codebase runs on NetFPGA switches as well, and we hope the community can confirm that it runs on a variety of other OpenFlow-enabled switches. Our goal is for PortLand to be a piece of the puzzle for a software environment for performing research in data center networking. We encourage you to try it out and give us feedback. In the meantime, Sambit and Malveeka are hard at work adding Hedera flow-scheduling functionality for our next code release.
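For readers unfamiliar with PortLand, its central trick is worth a taste. The SIGCOMM 2009 paper assigns each host a hierarchical pseudo MAC (PMAC) encoding its topological location as pod.position.port.vmid (16/8/8/16 bits), so switches can forward with tiny tables while VMs keep their actual MACs. The sketch below illustrates that encoding only; it is not the released codebase, and the helper names are my own.

```python
# Illustrative sketch of PortLand-style pseudo MAC (PMAC) encoding.
# Field widths follow the pod.position.port.vmid (16/8/8/16 bit)
# layout described in the SIGCOMM 2009 paper; helpers are hypothetical.

def encode_pmac(pod: int, position: int, port: int, vmid: int) -> int:
    """Pack topological coordinates into a 48-bit PMAC."""
    assert 0 <= pod < 2**16 and 0 <= position < 2**8
    assert 0 <= port < 2**8 and 0 <= vmid < 2**16
    return (pod << 32) | (position << 24) | (port << 16) | vmid

def decode_pmac(pmac: int):
    """Recover (pod, position, port, vmid) from a PMAC."""
    return ((pmac >> 32) & 0xFFFF, (pmac >> 24) & 0xFF,
            (pmac >> 16) & 0xFF, pmac & 0xFFFF)

def pmac_str(pmac: int) -> str:
    """Render a PMAC in conventional colon-separated MAC notation."""
    return ":".join(f"{(pmac >> s) & 0xFF:02x}" for s in range(40, -1, -8))
```

Because the address itself encodes location, a core switch needs only a prefix on the pod field to forward correctly, which is what makes the scheme scale.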

I just read an interesting article, “Has Amazon EC2 Become Oversubscribed?” The article describes one company’s relatively large-scale usage of EC2 over a three-year period. Apparently, over the first 18 months, EC2 small instances provided sufficient performance for their largely I/O-bound network servers. Recently, however, they have increasingly run into problems with “noisy neighbors”. They speculate that they happen to be co-located with other virtual machines that are using so much CPU that the small instances are unable to get their fair share of the CPU. They recently moved to large instances to avoid the noisy neighbor problem.

More recently, however, they have been finding unacceptable local network performance even on large instances, with ping times ranging from hundreds of milliseconds to even seconds in some cases. This increase in ping time is likely due to overload on the end hosts, with the hypervisor unable to keep up with even the network load imposed by pings. (The issue is highly unlikely to be in the network, because switches deployed in the data center do not have that kind of buffering.)
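The telltale signature here is the gap between the median and the tail. A small, hypothetical Python sketch of the kind of analysis one might run on collected ping samples makes the point; the RTT values below are invented, not EC2 measurements.

```python
# Sketch: summarizing ping RTTs to spot hypervisor-induced delays.
# Intra-data-center RTTs are normally well under a millisecond, so
# samples in the hundreds of milliseconds implicate the end hosts
# rather than switch queues (which hold far less buffering).

def percentile(samples, p):
    """Crude nearest-rank percentile over a list of samples."""
    s = sorted(samples)
    idx = min(len(s) - 1, int(p / 100.0 * len(s)))
    return s[idx]

# Invented RTT samples (ms): mostly sub-millisecond, a few huge spikes.
rtts_ms = [0.4, 0.5, 0.3, 0.6, 210.0, 0.4, 0.5, 950.0, 0.3, 0.4]

print(f"median={percentile(rtts_ms, 50)}ms p99={percentile(rtts_ms, 99)}ms")
# A p99 several orders of magnitude above the median suggests the VM
# (or its host's driver domain) is being descheduled for long stretches,
# not that the network path is congested.
```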

The conclusion from the article, along with the associated comments, is that Amazon has not sufficiently provisioned EC2 and that is the cause of the overload.

While this is purely speculation on my part, I believe that underprovisioning of the cloud compute infrastructure is unlikely to be the sole cause of the problem. Amazon has very explicit descriptions of the amount of computing power associated with each type of computing instance. And it is fairly straightforward to configure the hypervisor to allocate a fixed share of the CPU to individual virtual machine instances. I am assuming that Amazon has set its CPU schedulers to reserve the appropriate portion of each machine’s physical CPU (and memory) for each VM.

For CPU-bound VMs, the hypervisor scheduler is quite efficient at allocating resources according to administrator-specified levels. However, the Achilles heel of scheduling in VM environments is I/O. More specifically, the hypervisor typically has no way to account for the work performed on behalf of individual VMs, either in the hypervisor itself or (likely the bigger culprit) in the driver domains responsible for network and disk I/O. Hence, if a particular instance performs significant network communication (either externally or to other EC2 hosts), the corresponding system calls first go into the local kernel. The kernel likely has a virtual device driver for the disk or the NIC. However, for protection, the virtual device driver cannot have access to the actual physical device. Hence, the kernel driver must transfer control to the hypervisor, which in turn transfers control to a device driver likely running in a separate domain.

The work done in the driver domain on behalf of a particular VM is difficult to account for. In fact, this work is typically not “billed” back to the originating domain. So a virtual machine can effectively mount a denial-of-service attack (whether malicious or not) on other co-located VMs simply by performing significant I/O. With colleagues at HP Labs, we wrote a couple of papers investigating this exact issue a few years back.
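A toy model, with made-up numbers, illustrates the accounting gap. The scheduler polices only the CPU time charged directly to each VM, so an I/O-heavy tenant looks light while actually consuming much of a core through the driver domain.

```python
# Sketch of the accounting gap: CPU cycles burned in a shared driver
# domain on behalf of a VM's I/O are not charged to that VM. All
# numbers are invented percentages of one physical core.

vms = {
    "A": {"own_cpu": 30.0, "driver_cpu": 0.0},   # CPU-bound tenant
    "B": {"own_cpu": 10.0, "driver_cpu": 45.0},  # I/O-heavy tenant
}

def apparent_usage(vm):
    """What the hypervisor scheduler sees and enforces caps against."""
    return vm["own_cpu"]

def true_usage(vm):
    """What the VM actually consumed, including driver-domain work."""
    return vm["own_cpu"] + vm["driver_cpu"]

for name, vm in vms.items():
    print(name, apparent_usage(vm), true_usage(vm))
# VM B looks like a light user (10%) while really consuming 55% of the
# core; a fair-share scheduler cannot throttle work it never attributes.
```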

As mentioned above however, without having access to the actual workloads on EC2, it is impossible to know whether the hypervisor scheduler is really the culprit. It will be interesting to see whether Amazon has a response.

Harsha Madhyastha’s paper “Moving Beyond End-to-End Path Information to Optimize CDN Performance” won the best paper award at IMC 2009. The paper presents measurement data from Google’s production CDN to show that redirecting clients to the nearest CDN node does not necessarily result in the lowest latency. Harsha and his colleagues built a tool called WhyHigh, in production use at Google, that uses a series of active measurements to diagnose the cause of inflated latency for the relatively large number of clients that experience poor latency to individual CDN nodes. Definitely a worthwhile read.

Now for a bit of a diversion from the usual topics I have been writing about lately. I have always been a big fan of a good visualization, and I recently ran across an excellent one here, depicting some of the inputs and outputs that go into left-leaning versus right-leaning political thinkers. Of course, it is not perfect and of course it makes some simplifications. But there are two things I like about it:

It identifies some interesting bases for the different ways of thinking, for example valuing freedom over equality or vice versa.

Either “side” looking at the visualization would likely think “I always knew the other side was fundamentally flawed and that I was right all along.” So it fairly (overall) represents different viewpoints without passing judgement.

This depiction made me think of Minard’s classic 1869 visualization of Napoleon’s land campaign through Russia during 1812-1813.

Once The Economist starts writing about a technology topic, you know that it has hit the mainstream. The print edition has a nice overview article on Cloud Computing this week, reproduced online here. I’ve written just a bit on this topic in an earlier post, but to summarize the driving forces behind Cloud Computing as seen by The Economist:

Economies of scale: The large service providers can deliver computation and storage more cheaply by amortizing the cost over a large customer base. The expertise is already available in house to manage hardware installations, software upgrades, backup, fault tolerance, etc.

Convenience: users will be able to access their data and services from any device, anywhere.

Instant access to tremendous computation: new startups with the latest technology breakthroughs won’t have to invest in machine rooms filled with servers or hire the people to run them. Instead, they can pay for the necessary computation and storage by the hour on, for instance, Amazon Web Services.

Of course, Cloud Computing comes with the usual list of dangers and pitfalls:

Lock in: one cloud computing provider may become dominant, crowding out all competitors perhaps through unfair business practices. Even if there is a vibrant ecosystem, moving data from one cloud provider to another may not be easy.

Loss of privacy: large companies may maintain significant information about their users, for example, the entire search history of every user.

Perhaps my vision is clouded by all the hype, but I believe that the delivery of computing and storage as a utility for a significant class of applications is an inevitability. The above list of challenges is of course incomplete. For instance, see some very nice work from my colleagues on the privacy of computation in cloud environments. But I see these challenges as opportunities for industry and researchers in academia to address some of the pressing problems facing larger-scale adoption of Cloud Computing.

Greg Papadopoulos, CTO of Sun Microsystems, gave the keynote presentation at the CNS research review this morning. This was personally very gratifying for me, as Greg was an inspiration to me (and many other graduate students) during our work on the Network of Workstations (NOW) project.

Greg challenged some of the commonly held wisdom in large-scale computer systems design, namely:

Ethernet is good enough

SMPs are too expensive

Failures are frequent

As systems designers, we have become accustomed to embracing simplicity and weak semantics in the underlying system architecture, though often at the cost of increased complexity in the higher-level applications that are left to deal with the failures. His challenge to the audience was to consider the inherent costs of building more reliable, higher-performance systems. In many cases, the savings to application developers and the improvements in end-to-end performance will be worth it.

The talk went through a number of interesting examples. With Ethernet, we consider congestion spreading through the network (no isolation), packet drops, and the like to be things that application developers have to account for. In building large-scale systems, we assume that all parts should be commodity and hence quite failure-prone. The networks we employ to connect data-center-scale systems cap out at a relatively modest maximum bisection bandwidth, leaving many applications starved for network capacity.
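A quick back-of-the-envelope calculation, with illustrative numbers of my own choosing, shows how severe the bisection bandwidth shortfall can be in a conventional tree-structured network.

```python
# Sketch: oversubscription in a tree-structured data center network.
# Numbers are illustrative; the point is that aggregate host bandwidth
# grows with host count while core bisection bandwidth does not.

def oversubscription(hosts: int, host_gbps: float, core_gbps: float) -> float:
    """Ratio of worst-case offered load to available bisection bandwidth."""
    return (hosts * host_gbps) / core_gbps

# 10,000 hosts with 1 Gb/s NICs behind, say, 80 Gb/s of core capacity:
ratio = oversubscription(10_000, 1.0, 80.0)
print(ratio)  # under all-to-all load, each host sees ~1/ratio of its NIC
```

At an oversubscription ratio in the hundreds, communication-intensive applications spend most of their time waiting on the network, which is precisely the starvation Greg described.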

Put another way, Greg advocates applying the end-to-end argument, a driving force behind network design. The end-to-end argument states that reliability, fault tolerance, etc. in the end belongs in the application because that is the only place that it can be completely implemented. One of my takeaways from the talk is that it is possible that we have become too lazy in applying the end-to-end principle. The original paper clearly states that there are exceptions to the principle, especially where additional performance gains are possible, or by extension in the current environment, reduced cost or energy consumption.

Are there opportunities to add additional engineering into the system infrastructure to reduce application complexity and reduce overall cost?