Handling Traffic Spikes from Global Events at Facebook Live

Facebook Live's engineers have recently discussed how they scale their systems to handle traffic from both predicted and unpredicted events. While the latter is handled by their globally distributed architecture, the former requires careful advance planning and load testing.

Facebook's Live feature allows users to share media, including video, in real time. The feature sees heavy usage during global events, and the architecture needed to support that scale has specific requirements, outlined in some previous talks. In a new article, Peter Knowles describes how the team planned and load tested Live for New Year’s Eve. The strategy consisted of identifying the key resource areas (network, CPU and storage), predicting load, and load testing with shadow traffic generation.

Scaling needs can differ between parts of the site. For example, scaling the live updates feed for a presidential inauguration relied on caching and dark launching. Facebook Live is different from a scaling perspective: caches cannot be preloaded because the content is live, and the number of concurrent streams and viewers for a world event cannot be predicted accurately, which makes it hard to calculate resource requirements ahead of time.

Global events such as sports, elections, natural disasters or social media phenomena can all cause traffic spikes, made heavier by the high-bandwidth nature of Live video. Load changes caused by shifts in usage patterns fall into three categories: routine, spontaneous and planned. Routine patterns are predictable, following a known model such as lower traffic on weekends, and the means to handle them are built into the infrastructure. Spontaneous traffic from unplanned world events and disasters can be difficult to manage. Events known in advance, like New Year’s Eve, need special planning.

Facebook Live’s video streaming architecture spans client apps, Points-of-Presence (PoPs) and globally distributed datacenters. Client apps generate video streams and send them to the nearest edge PoP over RTMPS (the RTMP streaming protocol over a secure TLS connection), so that each stream is terminated as geographically close to the client device as possible. Each PoP forwards the stream to a datacenter, where it is processed by a Facebook Live server. Facebook's datacenters are spread across the world, and this architecture is shared by other Facebook services too. Once inside a datacenter, all communication between services is via the Facebook backbone network, where the round trip time (RTT) is ~30ms. PoPs also cache content once it is generated, saving a trip to the datacenter for future requests.
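Choosing the "nearest" PoP can be sketched as picking the lowest-RTT PoP, refined with the capacity check the article later attributes to Cartographer. This is a minimal sketch, assuming each client has RTT measurements to candidate PoPs and each PoP reports a connection count against a fixed capacity; the `PoP` type, its fields and `pick_pop` are hypothetical illustrations, not Facebook's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PoP:
    # Hypothetical PoP record used only for illustration.
    name: str
    rtt_ms: float  # measured round trip time from the client to this PoP
    load: int      # current active connections
    capacity: int  # maximum connections this PoP should accept


def pick_pop(pops: list[PoP]) -> PoP | None:
    """Return the lowest-RTT PoP that still has spare capacity, or None if all are full."""
    candidates = [p for p in pops if p.load < p.capacity]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.rtt_ms)
```

With `pop-b` at 8 ms but full, and `pop-a` at 12 ms with headroom, the client would be sent to `pop-a`: a slightly farther PoP is preferred over overloading the closest one.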

In a Facebook Live server, the video stream is transcoded into multiple audio/video formats, which are delivered via the Facebook CDN. Stream processing is the CPU-intensive part of the pipeline. The transcoded formats are also stored to serve future requests.

Planning for an event like New Year’s Eve involves load prediction from historical data as well as load testing. Load prediction takes into account the total number of broadcasts, the peak number of concurrent broadcasts, and the load that activity on Facebook Live generates on other systems. Multiple teams (production engineering, capacity planning and data science) collaborate on this planning. Adding more hardware is an obvious part of the exercise. Making processing more efficient is another, although the article gives just one example: bunching media segments into longer ones for more efficient writes. Shadow traffic generation, which replicates incoming traffic, tests all aspects of the system by stressing individual hosts as well as clusters. This is not a new technique and has been used by others.
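The idea behind shadow traffic generation can be sketched as a wrapper that serves each real request normally and replays copies of it against a shadow target whose responses are thrown away. This is a minimal sketch, assuming a synchronous request handler; the `copies` multiplier (used to amplify load beyond real traffic) and all names here are hypothetical, since the article does not describe Facebook's actual tooling.

```python
from typing import Any, Callable, Dict

# Hypothetical request/handler shapes, for illustration only.
Request = Dict[str, Any]
Handler = Callable[[Request], Any]


def make_shadowing_handler(primary: Handler, shadow: Handler, copies: int = 1) -> Handler:
    """Wrap `primary` so every request is also replayed `copies` times against `shadow`.

    Only the primary response reaches the caller; shadow responses and errors
    are discarded so the system under test cannot affect real users.
    """
    if copies < 0:
        raise ValueError("copies must be non-negative")

    def handle(request: Request) -> Any:
        response = primary(request)  # real traffic is served first, as usual
        for _ in range(copies):
            try:
                shadow(dict(request))  # shallow copy so the shadow path cannot mutate the original
            except Exception:
                pass  # ignore shadow failures here; a real setup would record them
        return response

    return handle
```

A production setup would replay shadow requests asynchronously, so that a slow or overloaded test cluster never adds latency to real users' requests.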

A common problem in scaling is the thundering herd problem, where a large number of requests wait on a single event but only one can be processed at a time; each time the event occurs, all the waiting requests wake up and cause unnecessary thrashing. In Facebook Live's case, this can happen when many requests arrive at a PoP that does not have the requested video in its cache and must fetch it from the backend datacenter. The Live architecture handles this with request coalescing, which reduces the number of requests that reach the datacenter: when client requests for the same video segment hit a PoP and the segment is not in the cache, only one request is sent to the datacenter. The other requests are held until the response arrives, which avoids flooding the datacenter with duplicate, resource-intensive requests that could also hit the storage backend.

The PoPs still bear the brunt of end users' traffic, and this is handled by global load balancing. A system called Cartographer tracks the load on each PoP and sends each user to the nearest PoP that has capacity.
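Request coalescing at a PoP can be sketched with one lock and a per-key event: the first caller to miss the cache becomes the "leader" and fetches from the origin, while concurrent callers for the same segment wait for its result instead of sending their own requests. This is a minimal single-process sketch, assuming an in-memory cache and a blocking `fetch` function standing in for the PoP-to-datacenter request; `Coalescer` and its methods are hypothetical names, not Facebook's implementation.

```python
import threading
from typing import Callable, Dict


class Coalescer:
    """Collapse concurrent cache misses for one key into a single origin fetch."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch  # hypothetical stand-in for the PoP -> datacenter request
        self._cache: Dict[str, bytes] = {}
        self._inflight: Dict[str, threading.Event] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> bytes:
        with self._lock:
            if key in self._cache:
                return self._cache[key]  # cache hit: no datacenter traffic
            event = self._inflight.get(key)
            is_leader = event is None
            if is_leader:
                event = threading.Event()
                self._inflight[key] = event  # first miss: this caller does the fetch

        if is_leader:
            try:
                value = self._fetch(key)  # the only origin request for this key
                with self._lock:
                    self._cache[key] = value
                return value
            finally:
                with self._lock:
                    del self._inflight[key]
                event.set()  # release every caller waiting on this key

        event.wait()  # followers are held until the leader's fetch completes
        with self._lock:
            if key in self._cache:
                return self._cache[key]
        raise RuntimeError(f"origin fetch for {key!r} failed")
```

Go programs commonly get the same behavior from the `golang.org/x/sync/singleflight` package. This sketch never evicts cached segments; a real PoP cache would bound its size and expire entries.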