Featured author

Steven Jordan is an infrastructure and process management specialist. Steven holds a Master of Science degree in ICT from the University of Wisconsin Stout. Steven is also a Cisco Certified Network Professional (CCNP) and Master Gardener.

Task:

Problem:

Solution:

Does the TempDB run on the old HDD or the new MLC SSD? The TempDB can generate serious disk I/O, so transfer it to the high-speed SSD as well. N.B., this entire process gave new life to my old DPM2010 server.

Step 1: T-SQL script to identify the TempDB location:

Use master
GO
SELECT
name AS [LogicalName]
,physical_name AS [Location]
,state_desc AS [Status]
FROM sys.master_files
WHERE database_id = DB_ID(N'tempdb');
GO

Step 2: Move the TempDB files to the SSD with ALTER DATABASE ... MODIFY FILE. SQL Server confirms each change:

The file "tempdev" has been modified in the system catalog. The new path will be used the next time the database is started.
The file "templog" has been modified in the system catalog. The new path will be used the next time the database is started.
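
The confirmation messages above are what SQL Server prints after an ALTER DATABASE ... MODIFY FILE statement for each TempDB file. The original statements are not shown here, so the following is only a minimal Python sketch that builds them. The target folder S:\TempDB and the physical file names tempdb.mdf and templog.ldf are assumptions for illustration; substitute the logical names and locations reported in Step 1.

```python
# Sketch: build the T-SQL that relocates tempdb's data and log files.
# The target folder and file names are hypothetical examples, not the
# author's actual values.

def build_tempdb_move_statements(target_dir, files=None):
    """Return one ALTER DATABASE statement per tempdb file."""
    if files is None:
        # Logical name -> physical file name (assumed defaults).
        files = {"tempdev": "tempdb.mdf", "templog": "templog.ldf"}
    target_dir = target_dir.rstrip("\\")
    statements = []
    for logical_name, file_name in files.items():
        statements.append(
            "ALTER DATABASE tempdb MODIFY FILE "
            f"(NAME = {logical_name}, FILENAME = '{target_dir}\\{file_name}');"
        )
    return statements


if __name__ == "__main__":
    for stmt in build_tempdb_move_statements("S:\\TempDB"):
        print(stmt)
```

The generated statements can be pasted into a query window. As the messages say, the new paths only take effect after the SQL services restart.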

Step 3: Restart SQL services

Step 4: Verify Change - See Step 1.

Conclusion:

SQL Server runs quickly with the new storage. Long disk queue lengths, on all disks, are a thing of the past. Individual results may vary based on load (duh).

Problem:

Data Protection Manager (DPM) does not work with the new Ultrium-5 LTO-5 tape drive. The DPM Library only provides options to clean or disable the drive. DPM does not correctly detect and identify the new tape drive.
Hardware: HP StoreEver LTO-5 Ultrium-5 tape drive. N.B., this solution works with just about any LTO drive.

Background:

The old LTO-3 drive was running out of space. I purchased an HP LTO-5 drive to expand available storage from 800GB (LTO-3, compressed) to 1.5TB (LTO-5, native):

Server has compatible SAS HBA and cable.

Device Manager recognizes the new drive after a simple hardware swap.

DPM lists the new hardware under its Libraries.

Protection groups changed and reflect new hardware.

Solution:

Update the DPMLA.xml file to reflect the new hardware: C:\Program Files\Microsoft DPM\DPM\Config\DPMLA.xml
XML Before:

The backup operation that started at '2015-07-28T15:32:54.838780500Z' has failed with following error code '0x8078015B' (Windows Backup encountered an error when accessing the remote shared folder. Please retry the operation after making sure that the remote shared folder is available and accessible.). Please review the event details for a solution, and then rerun the backup operation once the issue is resolved.

C:\Windows\Logs\WindowsServerBackup\Backup.log: Backup of volume C: has failed. Windows Backup encountered an error when accessing the remote shared folder. Please retry the operation after making sure that the remote shared folder is available and accessible.

Suggestions:

Increase the allocated replica volume size by using the Modify Disk Allocation wizard in the DPM GUI. Run a full consistency check.

Do not run Windows Server Backup (WSB) on the protected 2012 server -manage backups only from DPM. Individual backups run from WSB modify the backup catalog and VSS volume associations, and DPM backups begin to fail. Steps to correct:

Manually increase the size of the Replica volume. DPM does not calculate an appropriate replica size for bare metal backups. Microsoft recommends:

Data Source Size x 3 / 2

However, I found this formula was insufficient. Rather, increase storage liberally. If Microsoft's formula calls for 50GB, try using 100GB! Consistency checks will work after the resource has sufficient storage in DPM.
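
The sizing rule above is simple arithmetic. Here is a minimal Python sketch of it; the function names are hypothetical, and the default safety factor of 2 is an assumption taken from the 50GB to 100GB example, not a Microsoft recommendation.

```python
# Sketch: replica volume sizing for bare metal backups.
# Function names and the safety factor are illustrative assumptions.

def microsoft_replica_size_gb(data_source_gb):
    """Formula quoted in the text: Data Source Size x 3 / 2."""
    return data_source_gb * 3 / 2


def liberal_replica_size_gb(data_source_gb, safety_factor=2.0):
    """Scale the quoted formula by a safety factor (assumed 2x)."""
    return microsoft_replica_size_gb(data_source_gb) * safety_factor


if __name__ == "__main__":
    source_gb = 20
    print("Formula size (GB):", microsoft_replica_size_gb(source_gb))
    print("Liberal size (GB):", liberal_replica_size_gb(source_gb))
```

Treat the factor as a starting point and raise it if consistency checks still fail for lack of space.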

Event Log Errors:

-The backup operation that started at xxxx has failed because the Volume Shadow Copy Service operation to create a shadow copy of the volumes being backed up failed with following error code '0x80780119'.

Background:

The EFI-based server uses a GUID partition table (GPT) disk layout. A GPT drive includes the following partitions:

*Update: Pruneshadowcopy.ps1 is supposed to delete old snapshot volumes from the server -it stops working after WMF 3.0 is installed. Thousands of expired snapshot volumes are never removed from the server. These stale volumes cause a slew of other problems: Windows updates fail, ridiculous startup times, excessive registry size, slow logons, slow recoveries, thousands of orphaned devices, etc. Use these steps to resolve:

DPM has slow start and login times. The server locks or freezes for up to an hour.

Windows updates fail.

SQL queries are excessively slow. For example, it takes DPM an hour to display all restore points for a protected database.

DPM uses all available server memory.

Background: The DPM server has limited memory (e.g., 16GB). The server synchronizes a large number of SQL databases every 15 minutes. The server has become unreliable.

DPM supports a limited number of protected resources. Over-provisioned DPM servers become unstable. Recovery point volumes (i.e., incremental backups) are not automatically removed.

DPM runs a daily PowerShell script that removes expired recovery point volumes (incremental VSS backups). This process fails when the server is over-provisioned. The script times out if it does not have sufficient available memory. This situation results in thousands of recovery points that cause additional problems.

Solution:

Discover and remove expired shadow/recovery point volumes.

-Run pruneVSS.ps1 from the DPM Management Shell.
-Run this script for every protected data source.

Shrink Volumes (optional). In the first step we deleted thousands of phantom volumes. This may amount to hundreds of gigabytes or even terabytes. Let's reclaim that storage.

Let's consider how DPM storage management is flawed. It creates and expands volumes well enough. Microsoft did not, however, include an automated process for recovering unused space -this is a manual process.

The preferable method for shrinking Windows 2008 volumes is with diskpart. N.B., the Disk Management GUI works as well.

The SYSTEM registry hive stores volume information -including those pesky phantom drives. All the data in the SYSTEM hive is stored inside a single file:

C:\Windows\System32\config\SYSTEM

Now consider how the registry file grows larger with every additional phantom drive. Large registry files are bad for the server. They cause slow boot times (e.g., hours) and may cause updates to fail.

Worse yet, the registry file generally grows but does not shrink. It's similar to how DPM handles storage volumes. Therefore, the registry remains indefinitely bloated.

Thankfully, Microsoft provides a chkreg tool to manually shrink the registry file. This procedure requires a special version of Chkreg -only available by contacting Microsoft support. The tool is also available from my OneDrive. Also, this tool must be run while the server is offline. Run it from a separate Windows boot disk (e.g., Win2Go).

Problem: Windows updates fail on DPM 2010 servers. Additional symptoms include slow startup and account sign-ins. Updates fail because of excessive VSS volume information in the registry. Log on times are slow because it takes a long time for Windows to enumerate thousands of orphaned volumes listed in its registry.

The slow logon process interferes with updates and software installations. During the installation, Windows enumerates driver information from the registry. Installations will fail when the enumeration takes longer than 15 minutes.

Solution: Delete the Windows Update cache and remove all superseded service pack backup components to resolve the issue. N.B., this situation is not specific to DPM, and the fix can help with other Windows environments, including Windows 8.1 and Windows 2012 R2.

Problem Statement: The DPM server takes a long time to log on. It can take 15 to 90 minutes to log on after the server restarts. Additionally, Windows updates fail and roll back to their previous state after the server restarts. The integrity of system backups and restorations is at risk because the DPM server has become unreliable.

Additional Symptoms:

a.) Expired recovery points are not removed per DPM policy goals. Roughly half of the protected members show excessive recovery points in the DPM console:

c.) PruneshadowcopiesDPM2010.ps1 is a DPM PowerShell script that removes expired recovery points. The script hangs and does not remove expired recovery points.

d.) DPM console hangs when deleting inactive protection group members. The GUI is unresponsive and must be manually closed.

e.) The registry System file has bloated to over 220MB. System is located in c:\windows\system32\config\.

Figure 2. Bloated registry.

Root Cause:

There were excessive disk-based recovery points (i.e., VSS volumes). In our case, DPM (or Windows) had improbably kept tens of thousands of recovery points per protection group member. DPM, by design, is only supposed to store up to 64 recovery points for its file members, and up to 448 recovery points for its application (e.g., SQL database) members.

The problem did not affect every protection group member. Some members (e.g., recent additions) had fewer than 100 recovery points. However, nearly half of all the protection group members had excessive (e.g., over 20,000) recovery points (Figure 1).
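
The design limits quoted above (64 for file members, 448 for application members) make it easy to flag afflicted members once recovery point counts are in hand. A minimal Python sketch follows; the member names and counts are made up for illustration, and gathering the real counts from DPM is left out.

```python
# Sketch: flag protection group members whose recovery point count
# exceeds the DPM design limits quoted in the text.
# Member names and counts below are hypothetical sample data.

RECOVERY_POINT_LIMITS = {"file": 64, "application": 448}


def find_over_limit(members):
    """members: iterable of (name, kind, count) tuples.

    Returns a list of (name, excess) for members above their limit.
    """
    flagged = []
    for name, kind, count in members:
        limit = RECOVERY_POINT_LIMITS[kind]
        if count > limit:
            flagged.append((name, count - limit))
    return flagged


if __name__ == "__main__":
    sample = [
        ("fs01", "file", 20000),
        ("sql01", "application", 90),
        ("fs02", "file", 64),
    ]
    for name, excess in find_over_limit(sample):
        print(f"{name}: {excess} recovery points over the limit")
```

A member sitting exactly at its limit is not flagged, since the limits are maximums rather than thresholds.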

The excessive, or rather expired, recovery points had to be removed. Normally, DPM automatically removes expired recovery points with its PruneshadowcopiesDPM2010.ps1. The default script was not working so I turned to a custom PowerShell script named PruneVSS.ps1.

PruneVSS.ps1 is a handy tool that removes disk-based recovery points based on date. Its interactive session determines protection groups and recovery point date ranges. N.B., the script was originally written by the late Ruud Baars.

I had mixed success with Baars' script. It worked great on resources that had fewer than 8,000 recovery points. The script hung indefinitely for protection group members with more than 10,000 recovery points. The situation required extreme measures.

Inactive Protection Group Members

The final option nixes the remaining protection group members that continue to retain expired recovery points. This DPM nuclear option removes all disk based recovery points by deleting their associated volumes. It is imperative to plan for continuity before committing. It's best to ensure the secondary DPM server has backups of the primary protection groups and to make a full tape backup before proceeding.

The afflicted protection group members were transitioned to inactive protection group members. I then attempted to remove the disk-based recovery points using the DPM console. Unfortunately, I had limited success using the GUI. I was able to remove the disk-based recovery points from a few of the inactive members. For the majority, however, the console simply froze. At this point, I turned to a second custom PowerShell script, named removeinactivedatasource.ps1. This script was a lifesaver -it removed all remaining disk-based recovery points. I ran the script in verbose mode so I could see its progress. It took about two hours to complete its job.

I then moved the inactive protection group members back to their original protection groups. N.B., the recovery points must be deleted before re-adding them to their original protection group members; otherwise DPM will continue to use their originally assigned volumes.

The next day the recovery points looked great: fewer than 100 for each member in the DPM console. DPM's PruneshadowcopiesDPM2010.ps1 also ran without problems. I had high hopes that the problem was solved -except that DPM continued to hang after restarting it. Victory was short-lived.

Secondary Cause

I had won a battle but not the war. Efforts to fix the recovery point volumes were successful but its cure exposed a secondary sickness: phantom VSS volumes.

I was fortunate to discover a handful of blogs that had somewhat similar DPM problems. Microsoft explains some of the symptoms in KB982210:

This issue occurs because there are a large amount of orphaned registry keys.

The Volume Shadow Copy Service (VSS) snapshots create many registry keys. However, they are not deleted after the VSS snapshot operations are completed.

Scott Forsyth's Blog recommends applying the hotfix from KB982210. The hotfix however, cannot install on a DPM server unless it runs Hyper-V! In fact, most of the focus for this problem centers on Hyper-V backups -but my problem has nothing to do with Hyper-V. Even if I wanted to install Hyper-V, to allow the hotfix installation, the server was in no condition to install a new feature; all updates failed upon restarting the server.

In our case, DPM uses iSCSI disks for the replica and shadow copy volumes. The alternate approach removes the phantom devices via script and then requires a second tool that shrinks the registry. Both Forsyth and Gary Fenton recommend running the Microsoft tool called DevNodeClean to remove phantom devices from the registry.

DevNodeClean

DevNodeClean is available from Microsoft support or it can be compiled with Visual Studio per KB934234. Fenton also has a complete version available for download on his blog.

I ran DevNodeClean and it indeed found orphaned devices -a grand total of 7. That was far fewer than the 10,000 I had expected. DevNodeClean did not work in this instance because it only checks for orphaned devices on disks, partitions, and volumes; it does not check for phantom volume shadow copies.

I described the problem to a talented programmer, #SAK, who works at my office. He reviewed DevNodeClean and further developed it so it checks for orphaned VSS volumes.
SAK explained that his program lists all orphaned VSS volumes when run from the command prompt: c:\cleanup.exe.

The program can remove all orphaned VSS volumes when run with a switch: c:\cleanup -r

N.B., David Candy's Blog has a good alternative to SAK's custom application. The modified RmHidDev.bat also finds and deletes orphaned VSS shadow volumes.

Tertiary Problem (i.e., third time's a charm):

The crazy slow logons remained, even after all the expired recovery point volumes were deleted and all the orphaned VSS volume registry keys had been removed. Gambit's blog explains that DPM's problems persist because of its bloated registry. I confirmed the registry size had not changed:

Fig. 4. Bloated registry causes log on profile and update issues.

Microsoft support provides a tool that shrinks the registry, called Chkreg. N.B., Chkreg is only available by contacting their support team. Chkreg is also available for download from my OneDrive. The tool is easy to use, but the process is somewhat tedious. Essentially, Chkreg cannot fix the system file while the server is operational. The server must be turned off and the disk must be accessed using a separate method.

I shut the server down and used the Windows 2008 installation media to boot into the recovery mode command line. I then used the recovery command prompt to navigate to c:\windows\System32\config, and copied the system file to a separate location. N.B., the drive letters in the recovery command prompt were different from what Windows normally uses. DISKPART provides current assignments with its list disk, list partition, and list volume commands.

I removed the Windows CD and re-started the server (and waited an hour). When the server was back up I used the chkreg tool to repair the copy of the registry system. I issued the following commands:

#Chkreg /F SYSTEM /R
#Chkreg /F SYSTEM /C

The new system file was significantly smaller than the original. The system file shrank from 219 MB to approximately 140 MB. I admit, I had hoped the new file size was closer to 10 MB, but at least there was some progress.
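
For a sense of scale, the reduction works out to roughly a third of the file. The following short Python sketch does the arithmetic; the function name is hypothetical and the sizes are the approximate figures quoted above.

```python
# Sketch: percentage reduction of the registry SYSTEM file after Chkreg.
# Sizes are the approximate figures from the text.

def percent_reduction(before_mb, after_mb):
    """Return how much smaller the file is, as a percentage of the original."""
    return (before_mb - after_mb) / before_mb * 100


if __name__ == "__main__":
    print(f"Reduction: {percent_reduction(219, 140):.1f}%")
```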

Once more, I restarted the DPM server and accessed the recovery command prompt with the installation media. I moved the original system file to a new location -as a precaution. I then copied the new (i.e., shrunken) system file back to its original location, c:\windows\system32\config.
I restarted the server and waited for DPM to come back online.

End result -it worked! I can finally log onto the DPM server in less than 30 seconds. Shortly thereafter I installed a year's worth of updates. Everything installed OK and the server remains trouble-free.