Overview

Mozilla needs to run its applications on various mobile devices, such as Tegras, Pandas, and even full smartphones. These devices do not act much like the servers that fill the rest of Mozilla's datacenters: they have limited resources, no redundancy, and are comparatively unreliable. With the advent of Firefox OS, Mozilla also needs the ability to automatically reinstall the entire OS on devices.

Mozpool is a system for managing these devices. Users (automated or human) who need a device matching certain specifications can request one from Mozpool, and Mozpool will find such a device, installing a new operating system if necessary. The middle layer of the system (Lifeguard) handles such reinstalls reliably, and also detects and investigates device failure, removing problematic devices from the pool. System administrators can examine these failed devices and repair them, returning them to the pool. The lowest level, Black Mobile Magic (BMM), handles low-level hardware details: automatic power control via IP-addressable power switches; a network-hosted Linux environment for performing software installations; and pinging, logging, and so forth.
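To make the Lifeguard layer's job concrete, here is a minimal sketch of a device-lifecycle state machine in Python. The state names and the three-strikes failure threshold are illustrative assumptions for this example, not Mozpool's actual states or policy.

```python
# Illustrative sketch of a Lifeguard-style device lifecycle.
# State names ("ready", "in_use", "reimaging", "failed") and the
# failure threshold are hypothetical, not Mozpool's real state machine.

class Device:
    FAILURE_LIMIT = 3  # assumed: pull a device after repeated failures

    def __init__(self, name):
        self.name = name
        self.state = "ready"
        self.failures = 0

    def request(self):
        # Hand out a ready device to a user.
        if self.state != "ready":
            raise RuntimeError("device not available")
        self.state = "in_use"

    def release(self, healthy=True):
        # On a clean return, the device goes straight back to the pool;
        # otherwise it is reimaged, or removed for repair after too
        # many consecutive failures.
        if healthy:
            self.failures = 0
            self.state = "ready"
        else:
            self.failures += 1
            self.state = ("failed" if self.failures >= self.FAILURE_LIMIT
                          else "reimaging")

    def reimage_complete(self, success):
        self.state = "ready" if success else "failed"
```

A device that fails once is reimaged and returned to the pool; a device that keeps failing ends up in the "failed" state for a system administrator to examine, mirroring the repair flow described above.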

Because continued operation of this system is business-critical, it is designed to be resilient not only to failures of individual devices, but also to failures of the servers running Mozpool itself.

Architectural Description

Source

User Interface

The Mozpool user interface is available through a web browser. The home page shows the three layers of the system (Mozpool, Lifeguard, and BMM). Clicking any of these opens a UI specific to that layer. The BMM UI allows direct control of device power, as well as manual PXE booting; this layer is of most interest to datacenter operations staff. The Lifeguard UI allows managed PXE boots and power cycles, as well as forced state transitions.

Deployment

Mozpool is a Python daemon that runs on multiple imaging servers. It uses a database backend and an HTTP API for communication between servers. Its frontend is a dynamic web application. The BMM equipment (TFTP servers, syslog daemons, and so on) runs on the same systems.

Mozpool is designed to be deployed in multiple "pools" within Mozilla. The first and likely largest is release engineering.

Release Engineering

In the scl3 datacenter, we have an initial deployment of 10 racks of Pandaboards. Each rack holds about 80 Pandas, grouped in custom-built chassis, for a total of about 800 Pandas. Each rack also contains seven "foopies" (proxying between Pandas and Buildbot) and one imaging server. Each rack has a dedicated VLAN, keeping most network traffic local to the rack. The database backend is MySQL. See the puppet modules, linked above, for more details of the deployment.

At the BMM and Lifeguard levels, each imaging server is responsible for the pandas in its rack, as assigned in inventory. At the Mozpool level, each imaging server is responsible for all requests that were initiated locally. Mozpool uses HTTP to communicate with Lifeguard on other imaging servers when it needs to reserve a non-local device.
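The reservation logic above, try local devices first and fall back to remote imaging servers, can be sketched as follows. The function and parameter names are hypothetical, and the HTTP call to a remote Lifeguard is abstracted behind a caller-supplied callback rather than Mozpool's actual API.

```python
# Sketch of Mozpool-style cross-server device reservation.
# Names are illustrative; remote_reserve stands in for the HTTP
# request Mozpool would make to Lifeguard on another imaging server.

def reserve_device(spec, local_devices, remote_servers, remote_reserve):
    """Return (server, device) for the first device matching spec.

    spec            -- dict of required attributes, e.g. {"image": "b2g"}
    local_devices   -- {device_name: attribute_dict} for this rack
    remote_servers  -- other imaging servers to ask, in order
    remote_reserve  -- callable(server, spec) -> device name or None
    """
    # Prefer a device in the local rack, where this server has
    # direct BMM/Lifeguard control.
    for device, attrs in local_devices.items():
        if all(attrs.get(key) == value for key, value in spec.items()):
            return ("local", device)
    # Otherwise ask each remote imaging server's Lifeguard over HTTP.
    for server in remote_servers:
        device = remote_reserve(server, spec)
        if device is not None:
            return (server, device)
    raise LookupError("no device matching spec")
```

The local-first ordering keeps most traffic on the rack's own VLAN and only crosses racks when the local pool has no matching device.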

Mozpool Client

In Release Engineering, we use the Mozpool client to talk to the Mozpool servers and request Panda boards.
To do this, we install the Python package inside a virtual environment.
The package is available on PyPI: