Agent API requests for job logs continue to be served correctly, and all systems are stable.

Posted Feb 10, 2019 - 15:22 AEDT

Monitoring

One of the job log storage databases experienced an unplanned failover, and as a result some Agent API requests for job log storage failed between 03:05 UTC and 03:10 UTC.

Buildkite Agents v3.8.3 and above will have continued to retry posting their job logs to the Agent API, but earlier agents may have truncated logs for jobs run between 03:05 UTC and 03:10 UTC. We recommend upgrading to Buildkite Agent v3.8.3 or above, which provides improved retry behaviour when the job log API has problems.
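
For readers unfamiliar with this kind of retry behaviour, the following is a minimal sketch of retrying a failed log upload with exponential backoff. It is not the Buildkite Agent's actual implementation: `upload_with_retry`, `TransientUploadError`, and all the default values are hypothetical names and numbers chosen for illustration.

```python
import time
from typing import Callable


class TransientUploadError(Exception):
    """Hypothetical error standing in for a failed job log API request."""


def upload_with_retry(
    upload: Callable[[], None],
    max_attempts: int = 10,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call upload() until it succeeds and return the number of attempts used.

    Waits base_delay seconds after the first failure, then doubles the wait
    after each further failure, capped at max_delay. Re-raises the last error
    if every attempt fails.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            upload()
            return attempt
        except TransientUploadError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise ValueError("max_attempts must be at least 1")
```

With these illustrative defaults the nine waits add up to 243 seconds, so a client that needs to ride out a five-minute failover would want more attempts or a longer cap.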

Posted Feb 10, 2019 - 14:47 AEDT

Identified

We detected an elevated error rate from our Job Log Agent API endpoint, which has since recovered. An automatic failover has taken place and responses have returned to normal. We'll continue to investigate the underlying cause and monitor the related systems.