Overview

Diego is a self-healing container management system that attempts to keep the correct number of instances running in Diego cells to avoid network failures and crashes. Diego schedules and runs Tasks and Long-Running Processes (LRPs). For more about Tasks and LRPs, see How the Diego Auction Allocates Jobs.

How Diego Runs an App

The following sections describe how Diego handles a request to run an app. This is only one of the processes that happen in Diego. For example, running an app assumes the app has already been staged. For more information about the staging process, see How Applications are Staged.

The diagrams below do not include all of the Diego components. For information about each Diego component, see Diego Components.

Note: The images below are based on the VM names in an open-source deployment of Cloud Foundry Application Runtime. In Pivotal Application Service (PAS), the processes interact in the same way, but run on different VMs. The correct VM names for each process are listed in the component sections of this topic.

Receives the Request to Run an App

The Cloud Controller passes requests to run apps to the Diego BBS, which stores information about the request in its database.

Passes Request to the Auctioneer Process

The BBS contacts the Auctioneer to create an auction based on the desired resources for the app. It references the information stored in its database.

Performs Auction

Through an auction, the Auctioneer finds a Diego cell to run the app on. The Rep job on the Diego cell accepts the auction request.
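As a rough illustration of the idea, an auction can be thought of as choosing a cell whose free resources fit the request. The Python sketch below is not the Auctioneer's actual algorithm: the Cell type, the pick_cell function, and the "most free memory wins" scoring rule are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    """Hypothetical summary of what a Rep might report about its cell."""
    cell_id: str
    free_memory_mb: int
    free_disk_mb: int


def pick_cell(cells: List[Cell], memory_mb: int, disk_mb: int) -> Optional[Cell]:
    """Return the fitting cell with the most free memory, or None.

    The scoring rule is an assumption for illustration only.
    """
    candidates = [
        c for c in cells
        if c.free_memory_mb >= memory_mb and c.free_disk_mb >= disk_mb
    ]
    if not candidates:
        return None  # no cell can host the instance
    return max(candidates, key=lambda c: c.free_memory_mb)
```

In this sketch, a request that no cell can satisfy returns None, which corresponds loosely to an auction that fails to place the instance.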

Creates Container and Runs App

The in-process Executor creates a Garden container in the Diego cell. Garden downloads the droplet that resulted from the staging process and runs the app in the container.

Emits Route for App

The route-emitter process emits a route registration message to Gorouter for the new app running on the Diego cell.

Maintains a lock in Locket to ensure only one auctioneer handles auctions at a time.

Job: bbs, VM: diego_database

Maintains a real-time representation of the state of the Diego cluster, including desired LRPs, running LRPs, and in-flight Tasks.

Provides an RPC-style API over HTTP to Diego Core components and external clients, including the SSH Proxy and Route Emitter.

Ensures consistency and fault tolerance for Tasks and LRPs by comparing desired state with actual state.

Keeps DesiredLRP and ActualLRP counts synchronized. If the DesiredLRP count exceeds the ActualLRP count, requests a start auction from the Auctioneer. If the ActualLRP count exceeds the DesiredLRP count, sends a stop message to the Rep on the Cell hosting an instance.

Monitors for potentially missed messages, resending them if necessary.
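The comparison of desired and actual instance counts described above can be sketched as a small function. This is a simplified illustration, not BBS code: the converge function and the action names "start_auction" and "stop_instance" are hypothetical.

```python
from typing import List, Tuple


def converge(desired: int, actual: int) -> List[Tuple[str, int]]:
    """Return the hypothetical actions needed to reconcile instance counts.

    Each action is a (name, instance_count) pair. The action names are
    made up for this sketch.
    """
    if desired > actual:
        # Too few instances: ask for the missing ones to be auctioned.
        return [("start_auction", desired - actual)]
    if actual > desired:
        # Too many instances: ask for the surplus to be stopped.
        return [("stop_instance", actual - desired)]
    return []  # already in sync
```

Running a check like this on a schedule, rather than only in response to events, is one way a system can recover from missed messages.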

Job: file_server, VM: diego_brain

Serves static assets that can include general-purpose App Lifecycle binaries and app-specific droplets and build artifacts

Job: locket, VM: diego_database

Provides a consistent key-value store for maintenance of distributed locks and component presence
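To show how a key-value store can back a distributed lock, here is a minimal in-memory sketch in Python. It is not the Locket API: the LockStore class, its acquire method, and the assumption that a lock expires after a time-to-live (ttl) unless its owner renews it are all introduced for illustration.

```python
import time
from typing import Dict, Optional, Tuple


class LockStore:
    """In-memory stand-in for a lock store; not the Locket API."""

    def __init__(self) -> None:
        # key -> (owner, expiry time in seconds)
        self._locks: Dict[str, Tuple[str, float]] = {}

    def acquire(self, key: str, owner: str, ttl: float,
                now: Optional[float] = None) -> bool:
        """Grant the lock if it is free, expired, or already held by owner."""
        now = time.monotonic() if now is None else now
        held = self._locks.get(key)
        if held is not None and held[0] != owner and held[1] > now:
            return False  # another owner holds an unexpired lock
        # Take the lock, or renew it if the caller already owns it.
        self._locks[key] = (owner, now + ttl)
        return True
```

With a scheme like this, only one holder owns a given key at a time, and a holder that stops renewing eventually loses the lock so that another instance can take over.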

Job: rep, VM: diego_cell

Represents a Cell in Diego Auctions for Tasks and LRPs

Runs Tasks and LRPs by creating a container and then running actions in it

Periodically ensures its set of Tasks and ActualLRPs in the BBS is in sync with the containers actually present on the Cell

Manages container allocations against resource constraints on the Cell, such as memory and disk space

Streams stdout and stderr from container processes to the metron-agent running on the Cell, which in turn forwards them to the Loggregator system.
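Managing container allocations against resource constraints can be pictured as simple bookkeeping. The Python sketch below is an assumption-laden illustration, not the Rep's implementation: the CellResources class and its allocate and release methods are hypothetical names.

```python
class CellResources:
    """Hypothetical bookkeeping for a cell's free memory and disk."""

    def __init__(self, memory_mb: int, disk_mb: int) -> None:
        self.free_memory_mb = memory_mb
        self.free_disk_mb = disk_mb

    def allocate(self, memory_mb: int, disk_mb: int) -> bool:
        """Reserve resources for a container; refuse if they do not fit."""
        if memory_mb > self.free_memory_mb or disk_mb > self.free_disk_mb:
            return False
        self.free_memory_mb -= memory_mb
        self.free_disk_mb -= disk_mb
        return True

    def release(self, memory_mb: int, disk_mb: int) -> None:
        """Return a container's resources to the free pool."""
        self.free_memory_mb += memory_mb
        self.free_disk_mb += disk_mb
```

A cell that tracks its free capacity this way can decline work it cannot host, which is the kind of information an auction needs when placing Tasks and LRPs.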

Runs a Diego sync process to ensure desired app data in Diego is in sync with the Cloud Controller.

App Lifecycle Binaries

The following three platform-specific binaries deploy apps and govern their lifecycle:

The Builder, which stages a CF app. The Builder runs as a Task on every staging request. It performs static analysis on the app code and does any necessary pre-processing before the app is first run.

The Launcher, which runs a CF app. The Launcher is set as the Action on the DesiredLRP for the app. It executes the start command with the correct system context, including working directory and environment variables.

The Healthcheck, which performs a status check on a running CF app from inside the container. The Healthcheck is set as the Monitor action on the DesiredLRP for the app.
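The relationship between the Launcher, the Healthcheck, and the DesiredLRP can be pictured with a small data structure. The Python sketch below is a simplified model, not the real DesiredLRP schema: the RunAction and DesiredLRP classes, their fields, and the example paths and arguments are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RunAction:
    """Hypothetical description of a command to run in the container."""
    path: str
    args: List[str] = field(default_factory=list)
    env: Dict[str, str] = field(default_factory=dict)


@dataclass
class DesiredLRP:
    """Simplified model: one action that runs the app, one that monitors it."""
    process_guid: str
    instances: int
    action: RunAction   # the Launcher in the description above
    monitor: RunAction  # the Healthcheck in the description above


# Example values only; the paths and arguments are placeholders.
lrp = DesiredLRP(
    process_guid="example-app",
    instances=2,
    action=RunAction(path="launcher", args=["app", "start-command"],
                     env={"PORT": "8080"}),
    monitor=RunAction(path="healthcheck", args=["-port=8080"]),
)
```

The point of the model is that running the app and checking its health are separate actions attached to the same desired state.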