A disruptive day at the office can start when someone says: “Hey, I’m having problems with my OS; the last thing I saw was a bunch of updates being installed.” If we are unlucky and an update really is causing the problem, we will need to get our hands dirty.

Knowing we need to roll back an update is never a comfortable position. On a home computer it may not be a big deal, but if we have just approved that update for our entire organization, things can get really messy.

Reviewing what we previously discussed in Patching Best Practices, testing updates before deploying is practically a must; but at this point, whether we tested or not no longer matters: we need to solve the problem. You will also see that it is highly recommended to keep a baseline image of your company's standard user system ready (perhaps as a virtual machine), so we can immediately start working with that image to reproduce and solve the problem.

And of course, if we don’t have a baseline image, we can start working directly with the machine that showed the problem.

Start Working on the Problem

Having said the necessary about best practices, let’s start working on the problem. Here’s general guidance about what we need to do:

If you are using WSUS, mark the recently approved updates as “Not Approved”. This will not remove the updates from computers where they are already installed, but it will prevent them from being installed on any other systems.

Use the baseline image to reproduce the behavior seen with the updates installed.

Perform some quick troubleshooting (Event Viewer, application logs) to understand a little bit more about the problem.

To roll the removal out automatically to all users, we can use WSUS or scripting:

For WSUS, configure the updates as “Approve for Removal”; this will automate the process of uninstalling.

We can even set a “Deadline” for removing the update.

If we want to remove it as soon as possible, select a date in the past.

Important disclaimer: the “Approve for Removal” option is not available for many updates, so we may need to fall back to the scripting option.
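On WSUS servers running Windows Server 2012 or later, both steps can also be scripted with the UpdateServices PowerShell module. A sketch, assuming the KB number and target group name below are placeholders for your own values:

```powershell
# Sketch using the UpdateServices module (Windows Server 2012+).
# "KB2533552" and "All Computers" are example values -- substitute your own.
Import-Module UpdateServices

$updates = Get-WsusUpdate -UpdateServer (Get-WsusServer) |
    Where-Object { $_.Update.Title -match "KB2533552" }

# Step 1: mark the update "Not Approved" so no further machines install it
$updates | Approve-WsusUpdate -Action NotApproved -TargetGroupName "All Computers"

# Step 2: where the update supports removal, approve it for uninstall
$updates | Approve-WsusUpdate -Action Uninstall -TargetGroupName "All Computers"
```

If the `Uninstall` approval fails for a given update, that update does not support removal through WSUS and we are back to the scripting option.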

Scripting the update:

For Windows XP, each update keeps a hidden folder at C:\Windows\$NtUninstallKB<number>$ containing its own uninstaller, which we can call from a script.

For Windows 7 we can use a script that runs “wusa /uninstall /kb:<number>”.
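As a sketch, a removal script for Windows 7 could look like this (the KB number is an example; /quiet and /norestart make it suitable for silent deployment via a startup script):

```powershell
# Sketch: silently remove a faulty update on Windows 7.
# KB2533552 is an example -- replace it with the offending KB number.
$kb = "2533552"
Start-Process -FilePath "wusa.exe" `
    -ArgumentList "/uninstall", "/kb:$kb", "/quiet", "/norestart" -Wait

# Windows XP has no wusa.exe; each update instead keeps its own uninstaller
# under its hidden folder, along the lines of:
#   C:\Windows\$NtUninstallKB2533552$\spuninst\spuninst.exe /quiet /norestart
```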

Handling Restore Points

“System Restore Points” are also a valid way to solve a faulty update deployment for Windows OS. A System Restore point is basically an operating system snapshot that is created whenever you make a significant change to your PC. One of the processes that can create restore points automatically is in fact Windows Update.

So, whenever a recent change causes a problem, we can use a restore point to recover the exact state the system was in before the change.

Using System Restore Points

Using restore points is not complicated; we just follow a wizard. The only catch is that we have to do it manually on every computer:

Access “Computer” > “Properties”.

Select “System Protection” and click on “System Restore”.

This opens a wizard; click “Next” on the first step.

We will then have the option to select the restore point to apply to our system; carefully select the proper one.

Reboot the computer.
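On machines where PowerShell is available, the same wizard steps can be scripted, which helps when we have to repeat this on many computers. A sketch (the sequence number is an example value taken from the listing):

```powershell
# Sketch: list the available restore points (Windows client SKUs only) ...
Get-ComputerRestorePoint |
    Format-Table SequenceNumber, Description, CreationTime

# ... then restore one by its sequence number (23 is an example value).
# The computer restarts automatically to complete the restore.
Restore-Computer -RestorePoint 23
```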

With that, we’ll have completed the steps needed to solve our problem. But if we are going to depend on restore points, we must make sure they are properly configured across all of our computers.

Creating Restore Points

We can create restore points on our computers either manually or through an automated process.

Unfortunately, the automated process is not all that simple. We will use Group Policy to distribute the configuration across the organization, but we will need a few command lines to accomplish this.

Creating Restore Points manually

Access “Computer” > “Properties”

In the left pane, click “System Protection”.

Click the System Protection tab, and then click Create.

In the System Protection dialog box, type a description, and then click Create.
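On Windows Vista/7 the same thing can be scripted with the Checkpoint-Computer cmdlet, which makes it easy to create a restore point right before approving a batch of updates:

```powershell
# Create a restore point from the command line (Windows client SKUs).
# The description is free text -- this one is just an example.
Checkpoint-Computer -Description "Before approving this month's updates" `
    -RestorePointType "MODIFY_SETTINGS"
```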

Creating Restore Points using Group Policy

Create a new Group Policy Object and link it to the Active Directory Container of your choice.

Create a new Task under [Computer Configuration\Preferences\Control Panel Settings\Scheduled Tasks]

Type a name and choose SYSTEM account to run this task.

In the “Triggers” tab, click “New” and choose how often you would like to create a restore point.

In the “Actions” tab, create a “Start a program” action: in “Program/Script” type “%windir%\system32\rundll32.exe”, and in “Add arguments” type “/d srrstr.dll,ExecuteScheduledSPPCreation”.
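Before distributing the task through Group Policy, it can be worth validating the command by creating the equivalent task locally with schtasks.exe. A sketch (the task name and weekly schedule are example choices):

```powershell
# Sketch: create the same task locally first to validate the command.
# "CreateRestorePoint" and the weekly schedule are example choices.
schtasks.exe /Create /TN "CreateRestorePoint" /RU "SYSTEM" /SC WEEKLY `
    /TR "rundll32.exe /d srrstr.dll,ExecuteScheduledSPPCreation"
```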

Having a “private cloud” deployed in an organization is far more common than it was a year or two ago, when hardly anyone knew what the term meant. In fact, you have probably already implemented a “private cloud” in your environment without noticing. The term “private cloud” names a set of resources (hardware and software) delivered as a service over the network (when that network is the Internet, we call it a “public cloud”). The key aspect of this definition is the “as a service” part: resources provided in this way give us agility, scalability, and reliability, among other benefits.

To gain these benefits within our own “private clouds” (for example, a business application contained in two tiers, web and database servers), we must understand that several of our well-known processes for managing these services must change, including patch management.

So, how do we apply system patches in a service that must be online almost “anywhere” and “anytime”? There are basically two approaches:

Patching in-place.

Re-building tiers with new updates.

Patching in-place

This is what we already know about applying system patches: updating the systems on the machines used to provide the service. But this doesn’t mean we shouldn’t test the updates we are about to release; by the very definition of a “private cloud”, these services must have a proper stage for validating the maintenance process. For some best practices in this scenario, please refer to a previous article of mine: “Patching Best Practices: Patch Scheduling”.

Re-building tiers with new updates

The definition of a “private cloud” also pushes us to decouple the services and platforms provided from the hardware and even the operating systems. It likewise demands a degree of “portability” in our systems, requiring, for example, automated and fast ways to deploy our entire service onto a new set of machines. “Re-building tiers” means that, ideally through an automated workflow, we replace the “out-of-date” OS used by the service with an “updated” OS without requiring any downtime. Automation is another key term for this approach: if we plan to quickly replace operating systems, we need processes and tools that provide these capabilities while maintaining the availability of our systems. Microsoft provides a set of tools and platforms we can evaluate to accomplish this, starting with System Center Virtual Machine Manager 2012 (SCVMM).

SCVMM 2012 and Service Templates

SCVMM 2012 introduces “Service Templates”: the ability to group a set of virtual machines, configured with several components including applications, and treat them as one unit. Integrating services with SCVMM enables efficiency gains in several scenarios; for example, we can scale our services dynamically whenever needed, adding or removing virtual machines to support our business application's load without re-defining the architecture or causing downtime.

Generating the Workflow with System Center suite

Besides the “Service Template” concept in SCVMM, Microsoft is seeking to provide an entire solution for this scenario of re-building the tiers with new updates. Microsoft calls it "Automated Fabric Patching for the Private Cloud". To achieve a full automation of this process, it actually requires the use of several MS technologies:

Hyper-V (virtualization platform)

System Center Operations Manager - for monitoring the environment

System Center Service Manager - for generating the process for updating the "private cloud" (named "fabric" in this case)

System Center Orchestrator - for connecting the dots and generating the automation workflow

In broad strokes, “Automated Fabric Patching” works this way: Operations Manager detects hosts in the fabric that require updates, Service Manager drives the change process for applying them, and Orchestrator connects the dots with an automated workflow that re-builds the tiers using updated images.

The Missing Approach: Virtualizing Server Applications

I have extensive experience virtualizing applications, specifically with Microsoft App-V. The latest version of SCVMM includes support for “Server App-V”, the technology used to “encapsulate” server applications (like a MySQL engine) into a “bubble” and convert them into a portable service. With this portability, we can easily move the application across different operating systems without requiring downtime.

In this post, my intention is to provide some guidelines about how to effectively manage and schedule patching in your organization.

(Editor’s note: PatchZone welcomes Augusto Alvarez. Augusto is no stranger to blogging - he has served as a thwack ambassador and has his own blog. He is now celebrating the publication of his second book, "Microsoft Application Virtualization Advanced Guide.")

Having a detailed and effective patching strategy for their systems is something many organizations don’t pay much attention to. Why is that? Most people think it’s too expensive to maintain a plan and an entire platform for something as simple as clicking “Install updates” in Windows Update. However, most people realize they should have had a plan when something goes wrong: a blue screen when rebooting a server; services not starting; unexpected application downtime; or even worse, a security breach through an out-of-date system.

What do I Need to Know about Microsoft Schedules?

Microsoft has a largely unofficial schedule for releasing updates. On the second Tuesday of each month they release security patches; critical updates are the exception, since those are released out-of-band as soon as the patch is ready. Tuesday is the chosen day because it enables the following approach:

• Tuesday: Updates are released (around 17:00 – 18:00)

• Wednesday: Apply updates in test environment.

• Thursday: Run use cases in test environment with new updates installed.

• Friday: Install updates in production.

• Saturday: Reboot your production servers.

Do I Really Need to Have a Test Environment?

The short answer: yes, you do need a test environment. But let’s elaborate for those who always try to avoid this. If you cannot afford to replicate your full production environment (servers and workstations), you can still use some reference machines currently in production, but only those where something going wrong would have low or no impact.

For example, to test workstations you can use the IT department’s reference machines, or any other “friendly” user’s machine whose owner won’t mind troubleshooting if an update does something unexpected. For mission-critical services running on servers, most organizations should have a high-availability setup (for example, a SQL Server cluster), and you can start patching on the “stand-by” node.

And of course, we always have the virtualization alternative. With one server (or even a desktop) that has some spare resources, we can virtualize at least our main services and turn those VMs on when we need to test a new update; we can even use VMware Converter or System Center Virtual Machine Manager to convert physical machines to virtual machines and place them in an isolated environment.

What do I Need to Know about my Environment?

Understand your environment and the applications and services that need to be patched. There’s no use in having a good plan, a schedule, and a test environment if you don’t know the right use cases for each platform you are updating.

If there’s a home-grown application, ask a developer to script, or at least give you, a simple test to run to validate that the application is working properly. The same applies to other platforms, like a database server, a messaging server like Exchange, SharePoint, or any other: you must have a few tests to run when you update the platform, and if they are automated, even better.
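As a sketch of what such a validation script might look like, assuming a hypothetical IIS-hosted application with a health endpoint (the service name and URL below are examples, not anything from a real environment):

```powershell
# Sketch of a post-patch smoke test. "W3SVC" and the URL are
# hypothetical examples -- use your own service name and endpoint.
$service = Get-Service -Name "W3SVC"
if ($service.Status -ne "Running") {
    Write-Error "Service is not running after the update"
}

# Requires PowerShell 3.0 or later for Invoke-WebRequest.
$response = Invoke-WebRequest -Uri "http://intranet.example.local/health" -UseBasicParsing
if ($response.StatusCode -ne 200) {
    Write-Error "Application health check failed after the update"
}
```

Running a script like this after each test deployment gives a repeatable pass/fail signal instead of relying on someone remembering to click through the application.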

Should I Change my Backup Plan if I have a Good Test Environment?

If you are thinking of relaxing your backup plan, the answer is no. There are some obvious reasons why: hardware failures and user errors can still occur in production. But even setting those aside, a test environment is no “silver bullet”. There will be scenarios where behavior in testing differs slightly from production, and that “slight difference” can have a huge impact if we have no way to recover.

Better yet, review your backup plan and ensure those backups are being tested periodically.

Do I Need to Review my Current Service Level Agreement (SLA)?

Yes, of course. This matters because a defined SLA tells us the downtime windows available in our environment, and therefore the update schedule our organization needs.

The SLA also tells us the priority of each service, which is input for deciding whether to implement a test environment for it. For example, an SLA that requires high availability for the messaging platform will require replicated servers to properly test new updates.

Do I Really Need to Document the Patch Management Processes?

Please do. This is not just another boring IT process. For example, properly documented testing steps give us a way to guarantee repeatable and predictable steps in production.

Final Thoughts

As I always say, “there’s no golden rule” in the IT world, but you can find general guidance and best practices. The solution best suited for your organization won’t necessarily apply at the next company; we must always assess and understand our environment, taking into account several key factors such as budget and costs, internal policies, legal compliance, defined SLAs, and so on.

