Alexandre Polozoff on WebSphere Performance

Wednesday, March 30, 2011

I was having a discussion recently with one of my colleagues around server backups. He likened it to the spare tire of the car. Yes, you can drive around without the spare tire but does anyone want to do that for long stretches of time? Probably not.

Servers need to be backed up. Simply put, if a server tips over completely, a backup lets it be rebuilt or duplicated and put back into service.

Tuesday, November 9, 2010

In the developer's resource references there is an attribute that controls connection sharing, res-sharing-scope. The value should be Shareable if the application uses global transactions. But if the application uses LTC (local transaction containment), it does not need to share the connection. Change the res-sharing-scope to Unshareable as noted above. This eliminates a lot of contention on the connection pool. For more details on LTC see the link above to understand how it works.
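As a rough illustration, this is what an unshareable resource reference might look like in a standard Java EE deployment descriptor (web.xml or ejb-jar.xml). The JNDI name jdbc/ExampleDS is a made-up placeholder, not something from a real application, and your descriptor version or tooling may present the setting differently.

```xml
<!-- Hypothetical resource reference; jdbc/ExampleDS is a placeholder name -->
<resource-ref>
  <res-ref-name>jdbc/ExampleDS</res-ref-name>
  <res-type>javax.sql.DataSource</res-type>
  <res-auth>Container</res-auth>
  <!-- Shareable is the default when this element is omitted -->
  <res-sharing-scope>Unshareable</res-sharing-scope>
</resource-ref>
```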

Friday, September 24, 2010

It is rare for anyone to provide details behind the root cause of a production outage. Facebook put out a report about an outage they had. If you're into troubleshooting and problem determination it is an interesting read. It sounds like they could turn off the particular function but had to completely restart the environment to do it. This is why it is important to have circuit breakers that can be activated dynamically.
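To make the idea of a dynamically activated circuit breaker concrete, here is a minimal sketch of a manual kill switch. It is not what Facebook runs and not a WebSphere API; the class name KillSwitch and the feature name used in the usage note are hypothetical. It assumes some operator-facing hook (a JMX bean, an admin page) calls disable() at runtime, which is not shown.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical runtime kill switch; all names here are illustrative only. */
final class KillSwitch {
    // Feature name -> enabled flag. A feature with no entry counts as enabled.
    private static final ConcurrentHashMap<String, Boolean> FLAGS = new ConcurrentHashMap<>();

    private KillSwitch() { }

    /** Turns a feature off without restarting the JVM. */
    static void disable(String feature) {
        FLAGS.put(feature, Boolean.FALSE);
    }

    /** Turns a feature back on. */
    static void enable(String feature) {
        FLAGS.put(feature, Boolean.TRUE);
    }

    static boolean isEnabled(String feature) {
        return FLAGS.getOrDefault(feature, Boolean.TRUE);
    }

    /** Runs the action only while the feature is enabled; otherwise returns the fallback. */
    static <T> T call(String feature, Supplier<T> action, T fallback) {
        return isEnabled(feature) ? action.get() : fallback;
    }
}
```

A caller would wrap the risky function, for example KillSwitch.call("config-refresh", () -> loadFromDatabase(), cachedValue), where loadFromDatabase and cachedValue stand in for application code. Once the switch is flipped, the back-end stops seeing those requests and no restart is needed.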

One also wonders what infrastructure changes could have helped. It sounds like the application logic continued to retry requests. This is why I'm not a fan of applications automatically retrying requests: when a failure occurs, the retries can quickly overwhelm the back-ends. A firewall could at least have helped shut off the pipe to the database, though the consequences to the application would have been no different and it would still have required a restart, since there seemed to be no way to dynamically shut off that particular function.

Certainly the error logic sounded confusing at best. And error paths through code are the ones least frequently tested, so they tend to fail magnificently in production.

Tuesday, September 14, 2010

I'm reminded today that there is a simple check to run after performance tests, especially when adding more JVMs to a cluster: count the number of exceptions in the log files. Of course, clear the logs before running the test. If the counts are not all roughly the same, or one JVM's count is significantly skewed from the other app servers, then that JVM has issues that need to be checked. Sometimes it is configuration or a misplaced JAR file.
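The check is easy to script. The sketch below is one possible way to do it, not a WebSphere tool: it counts log lines containing the text "Exception" and flags any server whose count sits far from the median. The class name, the tolerance parameter and the "far from the median" rule are all my assumptions; pick whatever threshold makes sense for your cluster.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

/** Hypothetical helper for comparing exception counts across app server logs. */
final class ExceptionCounts {
    private ExceptionCounts() { }

    /** Counts lines that mention "Exception" anywhere in the line. */
    static long countLines(Stream<String> lines) {
        return lines.filter(line -> line.contains("Exception")).count();
    }

    /** Counts exception lines in one log file. */
    static long count(Path log) {
        // ISO-8859-1 maps every byte to a character, so odd bytes in a log cannot abort the read.
        try (Stream<String> lines = Files.lines(log, StandardCharsets.ISO_8859_1)) {
            return countLines(lines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Returns the servers whose count differs from the median by more than
     * tolerance * median (with a floor of 1 so tiny counts are not flagged).
     */
    static List<String> skewed(Map<String, Long> counts, double tolerance) {
        long[] sorted = counts.values().stream().mapToLong(Long::longValue).sorted().toArray();
        List<String> flagged = new ArrayList<>();
        if (sorted.length == 0) {
            return flagged;
        }
        double median = sorted[sorted.length / 2]; // upper median when the size is even
        double allowed = Math.max(1.0, median * tolerance);
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            if (Math.abs(entry.getValue() - median) > allowed) {
                flagged.add(entry.getKey());
            }
        }
        return flagged;
    }
}
```

In practice you would call count() once per JVM log, collect the results in a map keyed by server name, and pass that map to skewed() with a tolerance such as 0.5. Any server it returns is the one to go look at first.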

About Me

I am an IBM Master Inventor in IBM Software Group, Performance Engineering. I solve problems for a living.
I also have a few different blogs on various topics of interest to me.
The postings on this site are my own and don't necessarily represent IBM's positions, strategies or opinions.