Redis for PCF Smoke Tests

Redis for Pivotal Cloud Foundry (PCF) runs a set of smoke tests during installation to confirm system health.
The tests run in the org system and in the space pivotal-services.
The tests run as an app instance with a restrictive App Security Group (ASG).

Smoke Test Steps

The smoke tests perform the following for each available service plan:

Targets the org system and space pivotal-services (creating them if they do not exist).

The restrictive ASG allows outbound traffic from the test app to the Redis shared-VM nodes.

Smoke Tests Resilience

Smoke tests can fail for reasons outside the Redis deployment, for example, network latency causing timeouts
or the Cloud Foundry instance dropping requests.
They might also fail because they are being run in the wrong space.

The smoke tests implement a retry policy for commands issued to CF, for two reasons:
- To avoid smoke test failures due to temporary issues such as the ones mentioned above.
- To ensure that the service instances and bindings created for testing are cleaned up.

Failed commands against CF are retried with a linear back-off: the wait before each retry grows by a baseline of 0.2 seconds, for a maximum of 30 attempts per command.
Therefore, assuming that the first attempt is at 0s and that every attempt fails instantly,
subsequent retries occur at 0.2s, 0.6s, 1.2s, and so on, until either the command succeeds or the maximum number of attempts is reached.
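The schedule above can be reproduced with a short sketch. It is an illustration only, not the smoke tests' actual implementation: it assumes the wait before retry n is n times the 0.2-second baseline, which is the reading that matches the 0.2s, 0.6s, 1.2s sequence. The names `retry_schedule` and `run_with_retry` are hypothetical.

```python
import time
from typing import Callable, List

BASELINE_SECONDS = 0.2  # baseline stated in the text
MAX_ATTEMPTS = 30       # maximum attempts per command stated in the text


def retry_schedule(baseline: float = BASELINE_SECONDS,
                   max_attempts: int = MAX_ATTEMPTS) -> List[float]:
    """Start time in seconds of each attempt, assuming every attempt fails instantly.

    Assumption: the wait before retry n is n * baseline, so attempt n (counting
    from 0) starts at baseline * (1 + 2 + ... + n) = baseline * n * (n + 1) / 2.
    """
    return [round(baseline * n * (n + 1) / 2, 1) for n in range(max_attempts)]


def run_with_retry(command: Callable[[], bool],
                   baseline: float = BASELINE_SECONDS,
                   max_attempts: int = MAX_ATTEMPTS,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run `command` until it returns True or the attempts are exhausted."""
    for attempt in range(1, max_attempts + 1):
        if command():
            return True
        if attempt < max_attempts:
            # Linear back-off: each wait is one baseline longer than the last.
            sleep(attempt * baseline)
    return False


schedule = retry_schedule()
print(schedule[:4])   # first four attempt times
print(schedule[-1])   # start time of the final attempt
```

Under these assumptions the thirtieth and final attempt starts 87 seconds after the first, so a command that keeps failing instantly is abandoned after roughly a minute and a half. Real commands take time to fail, so the actual duration would be longer.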

The linear back-off was selected as a good middle ground between:
- Situations where the system is generally unstable, such as load-balancing issues, where the maximum number of retries is preferred, and
- Situations where the system is suffering from a failure that lasts a few seconds, such as the restart of a Cloud Foundry VM,
where it is preferable to wait before reattempting the command.

Considerations

The above retry policy does not guard against longer-lasting Cloud Foundry downtime or network connectivity issues.
In these cases, commands fail after the maximum number of attempts and might leave claimed instances behind.
Pivotal recommends disabling automatic smoke test runs and manually releasing any claimed instances
during upgrades or scheduled downtime.

Troubleshooting

If errors occur while the smoke tests run, they are summarised at the end of the errand log output.
Detailed logs appear at the point in the output where the failure occurs.
Some common failures are listed below.

Examine the broker-registrar installation step output and troubleshoot any problems.

If you encounter an error when running smoke tests, it can be helpful to search the log for other instances of the error summary printed at the end of the tests,
for example, Failed to target Cloud Foundry.
Look out for TIP: ... in the logs next to any error output for further troubleshooting hints.
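The search described above can be scripted once the errand log has been saved to a file. The following is a minimal sketch, assuming the log is plain text with one entry per line. The function name `find_hints` is hypothetical, and only the error summary text and the TIP: marker come from this page.

```python
from typing import Iterable, List, Tuple


def find_hints(log_lines: Iterable[str], error_summary: str) -> List[Tuple[int, str]]:
    """Return (line number, text) for every line that repeats the error summary
    or carries a TIP: marker."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if error_summary in line or "TIP:" in line:
            hits.append((number, line.rstrip("\n")))
    return hits
```

For example, opening a saved log file and calling `find_hints(log_file, "Failed to target Cloud Foundry")` returns the numbered lines to inspect, in the order they appear in the log.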