
Hetzner Failover IP OCF script part III: When HTTP attacks

Our OCF script for failovers at Hetzner worked flawlessly for the past month. Last week, however, a problem arose that we had not anticipated. The web service returns an HTTP status code (as is expected from a web server), and our script did not handle HTTP error codes.

An HTTP response in the 4XX or 5XX range would kill the Python interpreter with a traceback from urllib2 and an exit code of 1. The OCF script treated that exit code as a reason to return $OCF_NOT_RUNNING, which caused a failover. This wouldn’t be a problem in a normal operating environment.

Unfortunately, we noticed that the Hetzner failover web service isn’t totally stable and occasionally answers with an HTTP error. When it does, both hosts in the failover setup see the error, both try to fail over, and havoc ensues. Fortunately, OCF has an error code that signals a soft failure ($OCF_ERR_GENERIC). We can use this code to tell heartbeat that a temporary failure has occurred and that it should not fail over.

The parse-hetzner-json.py script now wraps its HTTP requests in a try-except construction and has three exit codes:

0: Everything OK, I have the failover-IP

1: Unknown Error, can’t get status of the failover-IP

2: Everything OK, I do not have the failover-IP
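As a rough illustration of these three exit codes, here is a minimal sketch. It is not the actual parse-hetzner-json.py: it uses Python 3's urllib.request rather than urllib2, leaves out authentication, and the function names as well as the JSON fields `failover` and `active_server_ip` are assumptions made for the example.

```python
import json
import urllib.error
import urllib.request

EXIT_HAVE_IP = 0      # everything OK, this host has the failover IP
EXIT_UNKNOWN = 1      # status of the failover IP could not be determined
EXIT_NOT_HAVE_IP = 2  # everything OK, this host does not have the failover IP


def fetch_status(url):
    """Fetch and decode the JSON status document (authentication omitted)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def failover_exit_code(url, own_ip, fetch=fetch_status):
    """Translate the web service answer into one of the three exit codes."""
    try:
        status = fetch(url)
        # Assumed response layout; adjust to the real JSON document.
        active_ip = status["failover"]["active_server_ip"]
    except (OSError, ValueError, KeyError, TypeError):
        # HTTPError (4XX/5XX) and URLError are OSError subclasses;
        # ValueError covers malformed JSON, KeyError/TypeError an
        # unexpected document layout.
        return EXIT_UNKNOWN
    return EXIT_HAVE_IP if active_ip == own_ip else EXIT_NOT_HAVE_IP


def main(argv):
    """Hypothetical command line: <script> <status-url> <own-ip>."""
    if len(argv) != 3:
        return EXIT_UNKNOWN
    return failover_exit_code(argv[1], argv[2])
```

A caller would end the script with `sys.exit(main(sys.argv))` so the shell sees the exit code. The `fetch` parameter exists only so the mapping can be exercised without touching the network.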

The exit codes are then processed by the hetzner-failover-ip OCF script as follows: