Cloud Foundry Java App Errors – Root Cause Analysis

On the night of March 29, 2012, we upgraded Tomcat from version 6 to version 7 as part of our normal production upgrade schedule.

Early in the morning of March 30th, we discovered that Java apps that were created or re-pushed were receiving an error.

At 9am PT that morning we identified the specific issue causing the error. In Tomcat 7, the catalina.sh script was changed to start Tomcat with eval exec instead of exec. We set the CATALINA_OPTS environment variable to include a -Dhttp.nonProxyHosts value containing a pipe character, and eval interpreted that character as a Unix pipe rather than as part of the value. This broke the application start command, so the application failed to start. The change did not affect our QA environment, where no proxy was configured (or required), and so the problem escaped our testing. Once we understood the issue, our first priority was to minimize the impact on our users, and we decided the best course of action was to revert to the last known good version (Tomcat 6).
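
The difference between exec and eval exec can be reproduced outside Tomcat. The following is a minimal sketch, not the actual catalina.sh logic: it assumes a POSIX sh is on the PATH, uses echo as a stand-in for the Java launcher, and the OPTS value and the run_launcher helper are hypothetical names chosen for illustration.

```python
import os
import subprocess

# Hypothetical option value modeled on the incident: the pipe character
# separates the hosts listed in http.nonProxyHosts.
OPTS = "-Dhttp.nonProxyHosts=localhost|127.0.0.1"


def run_launcher(script, opts):
    """Run a one-line stand-in launcher under sh with CATALINA_OPTS set."""
    env = dict(os.environ, CATALINA_OPTS=opts)
    return subprocess.run(
        ["sh", "-c", script],
        env=env,
        capture_output=True,
        text=True,
    )


# Plain exec: the shell expands the variable once, and the pipe character
# stays inside a single argument passed to the command.
plain = run_launcher("exec echo $CATALINA_OPTS", OPTS)

# eval exec: eval parses the already-expanded text a second time, so the
# pipe character becomes a pipeline operator and "127.0.0.1" is treated as
# a command name, which the shell cannot find.
evaled = run_launcher("eval exec echo $CATALINA_OPTS", OPTS)

print("exec      -> rc", plain.returncode, "stdout", repr(plain.stdout))
print("eval exec -> rc", evaled.returncode, "stdout", repr(evaled.stdout))
```

With plain exec the value reaches the command intact. With eval exec the command line is split at the pipe and the shell exits with a non-zero status, which mirrors the failed application start described above.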

The rollback and validation were completed at 1:20pm PT.

This problem impacted about 60 people who were performing pushes or creating apps during the time Tomcat 7 was in place.

We are learning from our mistake. We will start by adding tests based on this incident, and will continue investigating other mechanisms to prevent similar issues from occurring in the future.
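
One shape such a test could take is a check that flags shell operators appearing unquoted in an option string before it reaches a launcher script. This is only a sketch under assumptions, not the test we have written: the unquoted_operators function is a hypothetical helper, and it approximates shell parsing with Python's shlex module rather than running a real shell.

```python
import shlex

# Characters that shlex treats as shell punctuation when
# punctuation_chars=True is set.
SHELL_OPERATOR_CHARS = set("();<>|&")


def unquoted_operators(value):
    """Return the shell operators that appear outside quotes in value.

    Approximation: shlex strips quotes in POSIX mode, so a token that is
    nothing but a quoted operator (for example "|") is still reported.
    """
    lexer = shlex.shlex(value, posix=True, punctuation_chars=True)
    return [tok for tok in lexer if tok and set(tok) <= SHELL_OPERATOR_CHARS]


# The unquoted pipe is reported; the quoted form and a plain option are not.
print(unquoted_operators("-Dhttp.nonProxyHosts=localhost|127.0.0.1"))
print(unquoted_operators('-Dhttp.nonProxyHosts="localhost|127.0.0.1"'))
print(unquoted_operators("-Xmx512m -Dfoo=bar"))
```

A check like this only catches the class of problem seen in this incident. Exercising the real start script in an environment configured like production, proxy settings included, would cover more.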

We would like to apologize to all of you who were affected by this problem.