November 13, 2012

Best Practice - Plan for Microsoft Dynamics CRM (on-premises deployment) server redundancy before you need it

It could have made for a very long weekend. A production CRM platform server without redundancy starts to show signs of a drive failure at the busiest time of the year, at a global company that has grown to depend on its CRM system to run the business. This happened recently to one of our clients. Thanks to some rather fortunate planning and sharp execution, the problem was averted and the system was down only for the time it took to reboot a machine.

Some five years ago, the company had implemented CRM to house subscription data that was constantly being updated by a portal application exposed to the web. To increase performance, the Application and Back Office roles of CRM had been split across two different servers. The Back Office server was a repurposed SQL server that, after those years of service, began to log errors in the event log warning of the impending failure of its secondary drive.

After analyzing the drive's contents and finding no obvious dependencies, our first thought was simply to disable the drive. The problem was that if we disabled or removed the drive and then discovered a dependency after all, we did not know whether we could ever get it to mount again.

We then thought about building a new server as a replacement. This was going to be a tedious job: the Platform Server was internet facing (SSL and certificates would need to be configured), and the integration with the portal server, as well as other back office integrations, depended on this server. The integrations used Scribe, which is licensed per machine, so getting it installed quickly with the proper licensing was also a challenge.

Enter fortune. The client had recently decided to replace the machines and move to a virtual environment for the CRM servers, as well as others in their IT infrastructure. The failing machine was one of those slated to be migrated, so here was the plan.

Using their backup software’s ability to restore a machine backup to a virtual machine, we backed up the server with the failing hard drive and restored it as a virtual machine without the D-drive image. Because the virtual image was an exact duplicate of the original machine, the Active Directory GUIDs, IP address, and computer name all remained the same. This virtual machine was then started in an isolated environment to make sure it would boot properly (since it was an exact copy, it had to be isolated so as not to conflict with the existing production server).

We then scheduled time on a weekend, when traffic would be lightest, to pull off the switch. At that time, we simply shut down the physical production server and booted the new virtual production server. We knew that if anything went awry, we could always restart the old server, as nothing on it had changed. It was nice to have a solid fallback should things not go as planned.
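For readers who like to see the order of operations written down, the switch and its fallback can be sketched roughly as follows. This is only an illustration of the sequence described above, not the tooling we used: the function name `cut_over` and all five hooks are hypothetical placeholders that you would wire to whatever your hypervisor, backup software, or operations team actually provides.

```python
from typing import Callable


def cut_over(
    stop_old: Callable[[], None],
    start_new: Callable[[], None],
    is_healthy: Callable[[], bool],
    stop_new: Callable[[], None],
    start_old: Callable[[], None],
) -> str:
    """Switch from the old server to its clone, falling back if checks fail.

    All five callables are hypothetical hooks supplied by the caller.
    """
    # The clone shares the original's computer name and IP address, so the
    # old machine must be down before the new one joins the network.
    stop_old()
    start_new()
    if is_healthy():
        return "cutover complete"
    # Fallback: the old server was only shut down, never modified,
    # so it can simply be started again.
    stop_new()
    start_old()
    return "rolled back"


if __name__ == "__main__":
    log = []
    result = cut_over(
        stop_old=lambda: log.append("stop physical"),
        start_new=lambda: log.append("start virtual"),
        is_healthy=lambda: True,  # stand-in for a real post-boot check
        stop_new=lambda: log.append("stop virtual"),
        start_old=lambda: log.append("start physical"),
    )
    print(result, log)
```

The point of writing it this way is that the fallback path is decided before the weekend starts, rather than improvised once something has gone wrong.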

The switch went as planned, and the new virtual server came up and operated just as it had when it was a physical box (without the drive errors, of course). There were a couple of issues when all was said and done, but they were minor and did not impact the operation of the business while they were worked out.

The takeaways from the ordeal are these.

Though building a redundant system can take more time and money up front, it will save you from relying on fortune when the hardware does fail. It is a good idea to plan now for the what-if scenarios you could be confronted with, so that you are prepared to jump into action. Lastly, it is a great idea to have spare “virtual” space available for creating a replacement machine on short notice.

Here’s hoping you do not find yourself in a similar situation, but if you do, you can be prepared!