Main Monitoring Dashboards

We collect metrics with InfluxDB and Prometheus, using existing exporters such as the node and PostgreSQL exporters and building our own where none is available. The data is visualized in graphs and dashboards built with Grafana. There are two interfaces for tracking these metrics, described in more detail below.

Prometheus

We have three Prometheus clusters: main prometheus, prometheus-db, and prometheus-app. They provide an interface for querying metrics with PromQL. Each Prometheus cluster collects a set of related metrics.
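As a rough illustration of querying with PromQL, the sketch below builds an instant-query URL for the standard Prometheus HTTP API endpoint (/api/v1/query) and reads values out of a response body. The base URL, the job label "node", the instance names, and the sample response are all assumptions made for illustration; they are not the addresses or data of the clusters described here.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL: the real address of each cluster is not given in this page.
PROMETHEUS_URL = "http://prometheus.example.internal:9090"


def build_instant_query_url(base_url: str, promql: str) -> str:
    """Return the URL of an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def extract_values(response_body: str) -> dict:
    """Map each series' 'instance' label to its sampled value (vector results only)."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each sample is [unix_timestamp, "value as string"].
    return {
        series["metric"].get("instance", ""): float(series["value"][1])
        for series in data["result"]
    }


# Made-up response in the shape the API returns for an instant query.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "node", "instance": "node-01:9100"},
             "value": [1700000000.0, "1"]},
            {"metric": {"__name__": "up", "job": "node", "instance": "node-02:9100"},
             "value": [1700000000.0, "0"]},
        ],
    },
})

if __name__ == "__main__":
    # 'up' is 1 when the last scrape of a target succeeded and 0 when it failed.
    print(build_instant_query_url(PROMETHEUS_URL, 'up{job="node"}'))
    print(extract_values(SAMPLE_RESPONSE))
```

The example parses a canned response rather than making a network call, so it runs anywhere; against a real cluster the same URL could be fetched with any HTTP client, subject to whatever authentication that cluster requires.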

Adding Dashboards

The Grafana repo is where we keep an archive of InfluxDB dashboards created in Grafana. Use it to inspect dashboard details in the file structure, but note that the repo is purely an archive (nothing is populated from it) and may be out of date.

Need access to add a dashboard? Ask any team lead within the infrastructure team.

Selection of Useful Dashboards from the Monitoring

Blackbox Monitoring

GitLab Web Status: front-end perspective of GitLab. Useful for understanding how GitLab.com looks from the user's perspective. Use this graph to quickly identify which part of GitLab is slow.

Private Whitebox Monitor

Daily overview: shows endpoints with their number of calls and performance metrics. Useful for understanding what is slow in general.

Logs

Network, system, and application logs are processed, stored, and searched using the ELK stack. For monitoring system performance and metrics, Grafana is still the preferred interface. For investigating errors and incidents, however, raw logs are available via Kibana at https://log.gitlab.net.