Application idling / scale to zero

Description

One of the great things about Heroku is that free apps idle when not in use. This keeps costs down significantly, at the price of a small delay (~12 seconds) when an idle app spins back up.

If we ever offer compute on GitLab.com, or for enterprises that make heavy use of review apps and want to control their costs, we should support a way to scale apps down to 0 pods and automatically spin them back up when they are needed.

Note that this gets much more complicated if an app runs more than one replica; Heroku, for example, only lets you idle 1-dyno apps. It may not have to be that complicated, though. Kubernetes already has CPU-based horizontal autoscaling, so in theory, once an app has been scaled down to 1 pod, we can detect that it is truly idle and scale it down to 0.
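A minimal sketch of that two-stage idea: the autoscaler keeps handling scaling between N replicas and 1 (the standard HorizontalPodAutoscaler does not scale below 1 replica by default), and a separate idle check takes only the final step from 1 to 0. The function name `desired_replicas` and the reuse of the one-hour idle window from the reaper proposal are assumptions for illustration.

```python
IDLE_THRESHOLD_SECONDS = 60 * 60  # assumption: same 1-hour idle window as the reaper


def desired_replicas(hpa_desired: int, idle_seconds: float,
                     idle_threshold: float = IDLE_THRESHOLD_SECONDS) -> int:
    """Combine the autoscaler's recommendation with a hypothetical idle check.

    hpa_desired: replica count the CPU-based autoscaler wants (never below 1).
    idle_seconds: time since the app last received a request.
    """
    # Stage 1: while the autoscaler wants more than one replica, defer to it;
    # idling multi-replica apps is exactly the complicated case we avoid.
    if hpa_desired > 1:
        return hpa_desired
    # Stage 2: already down to a single replica; drop to zero only if truly idle.
    if idle_seconds >= idle_threshold:
        return 0
    return hpa_desired
```

Keeping the two stages separate means the existing autoscaler needs no changes; the idle logic only ever acts on apps it has already brought down to one pod.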

Proposal

Add a checkbox to enable idling (perhaps only if scale is set to 1, or if autoscaling min=1).
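The eligibility rule in parentheses could look like the following sketch. `idling_checkbox_available` and its parameters are hypothetical names; `autoscale_min` is assumed to be `None` when autoscaling is turned off.

```python
from typing import Optional


def idling_checkbox_available(replicas: Optional[int], autoscale_min: Optional[int]) -> bool:
    """Hypothetical rule: offer idling only for apps that can settle at one replica."""
    if autoscale_min is not None:
        # Autoscaling on: idling is possible once the autoscaler reaches a floor of 1.
        return autoscale_min == 1
    # Fixed scale: only single-replica apps, mirroring Heroku's 1-dyno rule.
    return replicas == 1
```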

Add group and instance configurations to enable idling by default (or perhaps simply always enable it by default).
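Resolving the effective setting across project, group, and instance levels could follow a nearest-wins rule, as in this sketch. The function name, the ordering of groups from the nearest subgroup outward, and the `fallback=True` default (standing in for "always enable it by default") are all assumptions.

```python
from typing import Optional, Sequence


def resolve_idling_enabled(
    project: Optional[bool],
    groups: Sequence[Optional[bool]],
    instance: Optional[bool],
    fallback: bool = True,
) -> bool:
    """Nearest explicit setting wins: project, then groups, then instance."""
    if project is not None:
        return project
    # `groups` is ordered from the nearest subgroup out to the top-level group.
    for group_setting in groups:
        if group_setting is not None:
            return group_setting
    if instance is not None:
        return instance
    # Nothing configured anywhere: assumed default of "idling on".
    return fallback
```

With this rule, a project checkbox always overrides group or instance defaults, and `None` means "inherit from the next level up".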

NGINX on Kubernetes is configured to route requests to the default backend when there are no services/pods available to take them.

Replace the default backend with a server that accepts the requests, holds them, edits the appropriate deployment to scale its replica set from 0 to 1, and, once the pod comes up, redirects/routes the held requests to the new pod.
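The hold-and-wake behavior of that replacement backend can be sketched with asyncio. This is not tied to any real ingress or Kubernetes API: `scale_up`, `is_ready`, and `forward` are hypothetical callbacks standing in for patching the deployment's replica count, checking pod readiness, and proxying the held request. The design point shown is that concurrent requests share a single wake-up, so a burst of traffic triggers one scale-up rather than many.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class Activator:
    """Hypothetical replacement for the ingress default backend.

    Holds incoming requests while a scaled-to-zero app wakes up, then forwards them.
    """

    def __init__(
        self,
        scale_up: Callable[[], Awaitable[None]],
        is_ready: Callable[[], Awaitable[bool]],
        poll_interval: float = 0.5,
        timeout: float = 60.0,
    ) -> None:
        self._scale_up = scale_up            # e.g. set the deployment's replicas from 0 to 1
        self._is_ready = is_ready            # e.g. check that the new pod passes readiness
        self._poll_interval = poll_interval  # seconds between readiness checks
        self._timeout = timeout              # max seconds a single request will wait
        self._wake_task: Optional["asyncio.Task[None]"] = None

    async def _wake(self) -> None:
        await self._scale_up()
        while not await self._is_ready():
            await asyncio.sleep(self._poll_interval)

    async def handle(self, forward: Callable[[], Awaitable[str]]) -> str:
        # The first held request starts the wake-up; concurrent ones reuse it.
        if self._wake_task is None or self._wake_task.done():
            self._wake_task = asyncio.create_task(self._wake())
        # shield() keeps one request's timeout from cancelling the shared wake-up.
        await asyncio.wait_for(asyncio.shield(self._wake_task), self._timeout)
        return await forward()
```

`scale_up` should be idempotent: a request that reaches the activator just after the pod becomes ready, but before NGINX has switched routing back to the app, starts a new wake-up that succeeds immediately.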

Optionally, display an interstitial page explaining what is happening (this only works for HTML resources, not, for example, API requests).
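Deciding whether a held request should get the interstitial could key off the request's `Accept` header, as in this sketch. The rule (only `GET` requests that accept `text/html` with a non-zero quality value) is an assumption; everything else, such as API calls asking for `application/json`, would simply be held until the pod is ready.

```python
from typing import Optional


def wants_interstitial(method: str, accept: Optional[str]) -> bool:
    """Return True if a held request should get a 'waking up' HTML page."""
    if method.upper() != "GET" or not accept:
        return False
    for part in accept.split(","):
        media_type, *params = [p.strip() for p in part.split(";")]
        if media_type.lower() != "text/html":
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        # q=0 means the client explicitly refuses text/html.
        return q > 0
    return False
```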

Have a reaper that puts services to sleep once they have been idle for 1 hour.
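The reaper's selection step could be as simple as this sketch, which assumes some hypothetical source of per-app last-request timestamps (for example, derived from ingress access logs) and the current replica counts. Apps with no recorded traffic are left alone rather than reaped, a deliberately conservative choice.

```python
import time
from typing import Dict, List, Optional

IDLE_TIMEOUT_SECONDS = 60 * 60  # "idle for 1 hour", per the proposal


def find_idle_apps(
    last_request_at: Dict[str, float],
    replicas: Dict[str, int],
    now: Optional[float] = None,
    timeout: float = IDLE_TIMEOUT_SECONDS,
) -> List[str]:
    """Return single-replica apps that have seen no traffic for `timeout` seconds."""
    now = time.time() if now is None else now
    return sorted(
        app
        for app, last_seen in last_request_at.items()
        # Only single-replica apps are candidates (mirrors the 1-dyno rule);
        # apps already at 0 replicas, or with unknown scale, are skipped.
        if replicas.get(app, 0) == 1 and now - last_seen >= timeout
    )
```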

Links / references

Documentation blurb

Overview

What is it?
Why should someone use this feature?
What is the underlying (business) problem?
How do you use this feature?

Use cases

Who is this for? Provide one or more use cases.

Feature checklist

Make sure these are completed before closing the issue,
with a link to the relevant commit.