As I proceed down the path on a consolidation project, I have taken time to pause and ponder the approach. One driving goal is to shrink our footprint. Another goal is to improve performance and use of hardware and software resources. I am looking to make sure the appropriate consolidations are made while leaving appropriate servers as-is if appropriate.

From time to time we have the opportunity to choose between distributed processing and a more centralized approach. In my experience, this decision has come about as a means to improve performance. Distributed processing is a viable option when done correctly. I have seen this work in the past and I have seen it fail miserably. I have had the opportunity of reversing these types of solutions via server consolidation on two occasions.

What are some considerations when deciding to employ a distributed model or even a consolidated model? Denny Cherry has a good article about the considerations for moving to a consolidated model. You can read that article here. I think some of those very considerations should be used when electing to not consolidate. Both sides of the equation must be evaluated. If you can meet the criteria, then consolidate. If not, then have some very good reasons to consolidate. Conversely, if looking to distribute, there must be some very sound reasons as to why one would elect that option.

To illustrate why not to distribute, I will share the two experiences I mentioned.

Experience 1

Performance for the primary database server had become horrendously unacceptable. In a server that hosted one million users or more a day, something had to be done. Complaints were common and frequent from the client and noticeable from in-house processes. The decision was made to distribute processing to three servers. Two of the servers were “report” servers and the third was the primary transaction server. Transaction replication was used to copy the data to the report servers. The application was configured to use the various servers through various means. Each of the servers had linked servers created to help process requests from the application. It was common for queries to join tables across the linked servers to produce the results necessary. After these changes, the complaints were just as common and frequent about the application timing out and performance being horrendous.

When I started, the first thing to do was to start tuning queries. I had gotten the run-down on how bad things were and started implementing a plan to correct the problems. All of the servers involved in this process maintained >= 70% processor utilization at all times. If a process/query had done awry, then we would see spikes to 100%. Each server was running SQL Server Enterprise Edition with databases in SQL 2000 compatibility mode.

Experience 2

As luck would have it, this was a very similar situation as the one just described. This scenario is different in that the number of users was far fewer. The number of servers involved though was considerably more. Each server was running Enterprise Edition. There was some mix of SQL 2000 and SQL 2005. One major difference was the level of user access. Some power users were known to have access to the databases with the ability to write their own queries. On a regular basis they would grind the server to it’s knees. In cases like this, new servers were frequently deployed to try and counteract this problem.

Another big difference between the two is the lack of replication in this experience. At least a traditional replication method. Databases were backed up and restored two different servers from the production databases on a daily basis. The benefit here is a test of the backups on a daily basis. This should be done anyway, but being implemented as a means to improve performance is not the wisest decision. This proved to be quite costly.

Solution

In both cases, the solution was very much the same. Performance tuning had been neglected and was tantamount to reduced costs. What performance tuning was necessary in these cases? In both cases, the lack of clustered indexes was rampant. Both cases employed the use of nolock as a means of performance tuning their queries. In each case where nolock was used, the query was analyzed and tuned to perform several times faster. These are some pretty basic techniques to use in order to minimize cost and improve performance. Throwing hardware at the situation is not always the best solution.

In the first scenario, when employing these simple techniques, we were able to reduce processor utilization down to 35% with peaks to 55%. That was prior to removing the replication and linked servers and consolidating the servers into one server (thereby reducing cost). After merging all three servers into one, we saw one server use 15% processor on a persistent basis with spikes to 35%. This change speaks to the cross-server joins and replication overhead. That was a massive improvement over the usual 100% processor utilization.

In the second scenario; simply combining the servers back to one, tuning the queries, and implementing controls on end-user queries direct to the server was all that was needed. This amounted to a reduction of about ten servers. Included in that reduction is the reduction in Enterprise Licenses.

Conclusion

So when should I not distribute the processing across servers? Do not employ a distributed processing approach when performance tuning has not been done. Do not employ a distributed processing approach when it is political in nature (i.e. end-users who write their own queries). Employing a distributed method under either of these two scenarios is only driving up cost. Furthermore, it becomes a management headache for the database team. The team must maintain the processes involved to keep the distributed approach running. This can also render a database team less efficient due to more work.

If it is appropriate and due diligence been done, then go ahead and employ a distributed processing methodology. As long as the work has been done to get the databases finely tuned, keep the customer happy, and abide by policy – there is nothing wrong with it.

It is the wrong approach when no forethought,no foresight, no planning,lack of due-diligence, or general wasting of money has been employed. Just throwing hardware at a problem is not a good solution. These “wrong” methods are just a few ways to try and mask the real problem. Masking the problem never solves it. It takes time and troubleshooting patience to delve into performance problems. When solved though, at no additional cost, it feels pretty good and looks pretty good for you. This is how a DBA can say “I’m givin’ ‘er all she’s got!” without it just being a quote.

Comments

Posted by Steve Jones on 22 February 2010

I'm not sure I completely agree. The places where it is political, when you have someone that wants ad hoc access to write queries, I'm very tempted to give them their own server. Spending $4k on a server for them can be much cheaper than developer time to tune a lot of queries or have them bring the main server to its needs.

I think you bring up points. You ought to examine things from both perspective and see what makes sense. Don't try to support a consolidation effort (or prevent it). Examine the pros and cons of both approaches and see what makes the most sense.

Posted by Jason Brimhall on 22 February 2010

Steve, that is a good perspective. Beyond just writing a bad query from the performance perspective, I would also be concerned that the query does not return accurate results. If the business relies on these ad-hoc queries, it is good to have somebody else verify them at the very least.