What does success look like, and how can we measure that?

Auto DevOps jobs are triggered automatically after the instance is upgraded.

Risks

There are several discussions, across various threads, about risks that will need to be accepted before we enable this for on-premise customers. These have been separated into known risks (which are mostly UX issues) and hypothetical risks, which are more complicated and relate to security, reliability, and cost. The risks outlined here are exactly that: risks. We don't know to what extent our customers will actually suffer from any of these problems, but they are things we believe customers may experience and be bothered by. This is intended to be an objective list of ways customers may be frustrated when we enable Auto DevOps for them by default, not an exhaustive list of things Auto DevOps could do better; that distinction can occasionally be subjective, so the list may need revision.

Known Risks

Current users of Auto DevOps report that it is quite slow: many Docker images are downloaded and uploaded, which can waste a lot of runner time (the pipeline can take around 30 minutes even on a fast internet connection): https://gitlab.com/gitlab-org/gitlab-ce/issues/49562

Hypothetical Risks

Customers making use of external CI (e.g. Jenkins) may see confusing CI results (e.g. a merge request marked as failed that is actually passing on Jenkins). We have not done any testing of how Auto DevOps interacts with external CI systems like Jenkins.

Customers may have configured their own CI runners, which will now run the Auto DevOps pipeline, possibly unexpectedly, when we enable this setting for them. Running certain commands on their servers (runners) may then have very surprising effects. For example, running rspec on a host where DATABASE_URL is set can do very dangerous things, such as truncating a production database, if you weren't intending to run that command on that host. This risk is higher with shell runners, as they inherit the entire environment and have wider access to the server filesystem.
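To make the shell-runner risk concrete, a defensive guard along these lines could refuse to run tests when the host environment leaks in. This is only a sketch; `guard_database_url` is a hypothetical helper, not anything Auto DevOps or GitLab Runner ships today:

```shell
# Hypothetical guard for a test job's before_script: refuse to run if a
# DATABASE_URL has leaked in from the runner host environment, since test
# frameworks may truncate or reset whatever database it points at.
guard_database_url() {
  if [ -n "${DATABASE_URL:-}" ]; then
    echo "Refusing to run tests: DATABASE_URL is set on this runner" >&2
    return 1
  fi
  echo "Safe to run tests"
}
```

A job would call `guard_database_url || exit 1` before invoking rspec; with Docker runners the container environment is isolated, which is part of why shell runners carry the higher risk.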

Customers with very large numbers of repositories and high numbers of pushes may experience significant delays to their CI/CD, which will potentially affect developer productivity or delay production deployments while their runners catch up with a very long queue of jobs.

Customers with very large numbers of repositories and high numbers of pushes may end up with a very large object storage bill, as we store the Docker images created in the build stage of Auto DevOps.

Depending on where the runner and the object storage are hosted, customers with very large numbers of repositories and high numbers of pushes may end up with a very large ingress/egress bill for Docker images being pushed and pulled during the build, container_scanning, and deploy phases of the Auto DevOps pipeline.

There is a slim chance that somebody has an existing project configured with a Kubernetes cluster, but not using GitLab CI, that will now be deployed to a cluster with an internet-facing URL even though they never intended it to be public. This is now very unlikely, as we no longer plan to automatically set a domain name for them (see https://gitlab.com/gitlab-org/gitlab-ce/issues/45560#note_101947623). We did, however, have at least one customer complain about this on GitLab.com, as they were concerned that their application may have been made public online.

Customers may have Docker runners set up that do not support Docker-in-Docker, and as such their builds will fail in a way that is not helpful to them, which will cause some frustration.
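The most common cause here is a Docker executor running without privileged mode, which Docker-in-Docker requires. A sketch of the relevant runner configuration (assuming the standard Docker executor; paths and images are illustrative):

```toml
# /etc/gitlab-runner/config.toml (fragment)
[[runners]]
  executor = "docker"
  [runners.docker]
    image = "docker:latest"
    privileged = true   # required for the docker:dind service used by Auto DevOps builds
```

Runners registered without `privileged = true` will fail the Auto DevOps build job when it tries to start the docker:dind service, typically with an error about the Docker daemon being unreachable, which is the unhelpful failure mode described above.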

Risks should be accepted before we merge #21157 (closed) (not needed until we've done some 1% testing on GitLab.com):