Where’s my file? Root cause analysis of FRS and DFSR data deletion

Hi, Ned here. In the Directory Services support space here at Microsoft, we are often contacted by customers for disaster recovery scenarios. We’re also brought in for deeper forensic analysis of what led to a problem. Today we’re going to talk about a situation that covers both:

A customer has seen some critical data go missing.

That data was replicated via the File Replication Service (FRS) or the Distributed File System Replication (DFSR) Service.

Before they restore the data with their backup copy, they want to have root cause on who deleted what and where it started. We can’t do this after restoring data because our whole audit trail will of course be destroyed within the respective JET databases.

FRS Deletion Forensics – The Where and When

You need to start by determining the name of some folder or file that has been deleted. It’s important that this be exact as we will be using it to search. You will need the full original path since it is possible just the name could be duplicated throughout the content set.

For this example we have three servers called 2003SRV13, 2003SRV16, and 2003SRV17.

We have a folder called c:\frstestlink\importantfolder13 that has been deleted.

It contained a file called c:\frstestlink\importantfolder13\importantfile13.doc which was deleted (naturally). Our folder could contain thousands of files but we just need to know one. That’s easy, someone is screaming at you that it’s missing. 🙂

Install FRSDIAG on any server that participated in the FRS content set where data was deleted.

Open a CMD prompt and navigate to the FRSDIAG directory. This will default to:

c:\program files\windows resource kit\tools\frsdiag

You’ll see that we have a very useful utility called NTFRSUTL.EXE. Running it with /? will show you its options:

ntfrsutl [memory|threads|stage] [computer]
    = list the service’s memory usage
    computer = talk to the NtFrs service on this machine.

ntfrsutl ds [computer]
    = list the service’s view of the DS
    computer = talk to the NtFrs service on this machine.

ntfrsutl sets [computer]
    = list the active replica sets
    computer = talk to the NtFrs service on this machine.

ntfrsutl version [computer]
    = list the api and service versions
    computer = talk to the NtFrs service on this machine.

ntfrsutl forcerepl [computer] /r SetName /p DnsName
    = Force FRS to start a replication cycle ignoring the schedule.
    = Specify the SetName and DnsName.
    computer = talk to the NtFrs service on this machine.
    SetName = Name of the replica set.
    DnsName = DNS name of the inbound partner to force repl from.

Now we look back at the GUID2Name table we generated earlier, we can see that:

30409f5d-8493-41ad-a98ab03fc1b795e5 = 2003SRV16.

So we know that at Wed Aug 22, 2007 11:57:23 AM on server 2003SRV16, something or someone deleted all this data. Wasn’t that easy? 🙂 If we had object access auditing enabled on that server at the time, and the folder configured for auditing, we could even see who did it. More on this later…

DFSR Deletion Forensics – The Where and When

Since DFSR exposes nearly all its interfaces through WMI, we can use a powerful command-line utility called WMIC to return useful info from the databases. This way we don’t need to rely on add-on tools, debug logs and such. For my example below I am intentionally not using VBScript as I want everyone to understand exactly what it is we’re doing, but feel free to script it up; all the WMI classes are well documented on MSDN.

So here we go again:

We have our three servers 2003SRV13, 2003SRV16, and 2003SRV17.

All are in a Replication Group called ImportantData.

They have a Replicated Folder called… wait for it… ReplicatedFolder.

That folder contains various files and folders, including a folder called ImportantSubFolder. It contains some files, including one called critical.doc. Naturally, someone has deleted critical.doc… let’s figure out where and when.

First we open a CMD prompt as an admin and dump the Replicated Folder info like so:

Now we have enough info to confirm we’re looking at the right data. Let’s get the status of the deleted file by running this command and providing it the file name and the ReplicatedFolderGuid from above:
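As a rough sketch of what such a lookup can look like, the Python helper below just assembles a WMIC command line against the DfsrIdRecordInfo class in the root\microsoftdfs namespace. The helper name and the all-zero GUID are hypothetical placeholders, and the property list is an assumption on my part, so check the resulting command on a lab server before relying on it.

```python
# Sketch: build a WMIC query for one file's DFSR ID record.
# The GUID used in the example call is a made-up placeholder.

NAMESPACE = r"\\root\microsoftdfs"


def build_idrecord_query(file_name: str, rf_guid: str) -> str:
    """Return a WMIC command line that looks up one file's ID record."""
    # Keep the sketch simple: refuse names that would need quote escaping.
    if "'" in file_name or '"' in file_name:
        raise ValueError("file name contains a quote character")
    where = f"FileName='{file_name}' and ReplicatedFolderGuid='{rf_guid}'"
    return (
        f"wmic /namespace:{NAMESPACE} path DfsrIdRecordInfo "
        f'where "{where}" get FileName,Flags,GVsn,UpdateTime'
    )


command = build_idrecord_query(
    "critical.doc", "00000000-0000-0000-0000-000000000000"
)
print(command)
```

Building the string separately from running it makes it easy to eyeball the query before pasting it into an elevated CMD prompt.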

We can see that it was deleted on Aug 23, 2007 at 23:30:10 (11:30 PM) GMT. For us this means 7:30 PM EDT.
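If you would rather not do the time zone math in your head, here is a minimal Python sketch of the conversion. It assumes the timestamp arrives as a WMI (CIM) datetime string in UTC; the exact string below is my reconstruction of the time quoted above, not captured output, and the function name is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# EDT is UTC-4; a fixed offset keeps the sketch free of time zone databases.
EDT = timezone(timedelta(hours=-4), "EDT")


def wmi_utc_to_local(stamp: str, local_tz: timezone) -> datetime:
    """Convert a CIM datetime string (assumed to be UTC) to local_tz."""
    # Only the leading yyyymmddHHMMSS part is parsed; fractions are ignored.
    utc_time = datetime.strptime(stamp[:14], "%Y%m%d%H%M%S")
    return utc_time.replace(tzinfo=timezone.utc).astimezone(local_tz)


local = wmi_utc_to_local("20070823233010.000000-000", EDT)
print(local.strftime("%Y-%m-%d %H:%M:%S %Z"))  # 2007-08-23 19:30:10 EDT
```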

The Flags value of 4 tells us it’s been deleted; examine the chart below. An ordinary replicated file will have a Flags value of 5 (0x1 | 0x4, that is, PRESENT_FLAG combined with UID_VISIBLE_FLAG). A value of 4 means the Present flag has been cleared, so the file is tombstoned, i.e. removed from the replica.

Value (hex): Meaning

PRESENT_FLAG (0x1): The resource is not a tombstone; it is available on the computer.

NAME_CONFLICT_FLAG (0x2): The tombstone was generated because of a name conflict. This flag is meaningful only for tombstones.

UID_VISIBLE_FLAG (0x4): The ID record has already been sent out to other partners; therefore, other partners are aware of this resource.

JOURNAL_WRAP_FLAG (0x10): The volume has had a journal wrap and the resource has not been checked to determine if there is any change by the journal wrap recovery process.

PENDING_TOMBSTONE_FLAG (0x20): The ID record is in the process of being tombstoned (or deleted).
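Since Flags is a bitmask, a few lines of Python can spell out which bits are set. This is only a sketch built from the values in the chart above; the class and function names are made up for illustration.

```python
from enum import IntFlag


class IdRecordFlags(IntFlag):
    """Bit values copied from the chart above."""
    PRESENT_FLAG = 0x1
    NAME_CONFLICT_FLAG = 0x2
    UID_VISIBLE_FLAG = 0x4
    JOURNAL_WRAP_FLAG = 0x10
    PENDING_TOMBSTONE_FLAG = 0x20


def describe(flags_value: int) -> list[str]:
    """Return the names of every bit set in a Flags value."""
    return [flag.name for flag in IdRecordFlags if flags_value & flag]


def is_tombstone(flags_value: int) -> bool:
    """A record is a tombstone when the Present bit is not set."""
    return not flags_value & IdRecordFlags.PRESENT_FLAG


print(describe(5), is_tombstone(5))  # ['PRESENT_FLAG', 'UID_VISIBLE_FLAG'] False
print(describe(4), is_tombstone(4))  # ['UID_VISIBLE_FLAG'] True
```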

The GVsn value is important because the GUID inside those curly brackets is always the unique database GUID of the server where the file was last changed. Since deleting the file counts as a change, now we just need to figure out who owns that GUID. So let’s use the DFSRDIAG command to find the culprit:

Badda-bing! That BAA4E6D9-BF1A-4C83-ADF4-FDFD481AE2FC GUID matches our GVsn above. There’s our guy. What is it with people deleting files and folders off this 2003SRV16 server? We should have a chat with their site admin…

The same exact steps will work for folders. Let’s do it real fast this time and figure out where someone deleted the whole ‘importantsubfolder’:

Note that when a deletion is replicated in to a DFSR server, the file by default is moved to \DfsrPrivate\ConflictAndDeleted under the root of the replicated folder. If the delete was not replicated in, but instead was the result of a local deletion, the file is moved to the Windows Recycle Bin (unless you held down the SHIFT key while deleting, in which case the file is deleted for good). By default the quota on ConflictAndDeleted is 660 MB but that is configurable on the Advanced Tab in the replicated folder properties. In the same location un-checking the “Move deleted files to Conflict And Deleted folder” box will make it so deletions that are replicated in are actually deleted for good without being moved to ConflictAndDeleted.

The information about all the data residing in ConflictAndDeleted is contained in the \DfsrPrivate\ConflictAndDeletedManifest.xml file. When the quota is reached, files are purged from ConflictAndDeleted folder and the ConflictAndDeletedManifest.xml in the order that they were put there. This means you have a limited amount of time to catch a deletion and be able to restore it from ConflictAndDeleted.

There is a sample script for restoring data from ConflictAndDeleted. This is needed because the folder structure of deleted data is flattened: all data resides directly off the root of ConflictAndDeleted, and each file name has the GVSN appended to it. The script reads the ConflictAndDeletedManifest.xml so it knows the original file names and folder structure. But you can also determine that using the DfsrConflictInfo WMI class. For example you can check for the presence of a file in a given server’s ConflictAndDeleted folder by running:

The trailing % wildcard is needed because the FileName property has the GVSN of the file appended to it. The ConflictPath property contains the original path and file name for the file before it was deleted.
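To see why an exact match on the original name comes back empty, here is a small Python sketch that mimics a LIKE pattern with a single trailing %. The stored name below is a hypothetical illustration of a file name with a GVSN appended (the exact layout and the version number are assumptions), and the case-insensitive comparison is also an assumption.

```python
def like_trailing_wildcard(value: str, pattern: str) -> bool:
    """Emulate a LIKE pattern whose only wildcard is one trailing %."""
    if not pattern.endswith("%") or "%" in pattern[:-1]:
        raise ValueError("this sketch only handles a single trailing %")
    prefix = pattern[:-1]
    # Compare case-insensitively (assumed behavior for this sketch).
    return value.lower().startswith(prefix.lower())


# Hypothetical stored name: original base name with a GVSN appended.
stored_name = "critical-{BAA4E6D9-BF1A-4C83-ADF4-FDFD481AE2FC}-v123.doc"

print(stored_name == "critical.doc")                         # False
print(like_trailing_wildcard(stored_name, "critical%"))      # True
```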

Auditing – The Who

Now that we’ve covered when and where, the Windows Auditing subsystem can be used to tell you who via the Object Access Auditing setting. The important takeaway is that you need to have it in place, with the audit descriptors set on your files and folders, before you need them. Setting it up after the files have gone missing isn’t going to buy you anything. I can tell your eyes have started to glaze over on this post so I’m going to wrap it up here. 🙂

To set up Object Access Auditing you can follow this checklist and set your critical replicated folders to audit EVERYONE for DELETE and process that with inheritance on down. We really don’t usually care about files being changed and certainly not added, but deletions drive end users nuts.

An important thing to understand about Auditing in Windows 2000 and 2003 is that it’s bound by some legacy limitations in the Event Log system (no longer true in Windows Server 2008 or Vista). Basically, you want to keep the total size of all your event logs at around 300MB or they will become unstable. You’ll find that enabling Object Access Auditing is going to make your Security event logs wrap pretty often, so if it was only 256MB you wouldn’t have much time for forensics. You can get around this by using KB312571 to configure AutoBackupLogFiles and save off the logs when they wrap automagically. Then they can be backed up and deleted periodically with a scheduled task or whatever you like.
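To get a feel for how little time a wrapping log gives you, here is a back-of-the-envelope Python sketch. The event rate and average event size are purely hypothetical numbers; measure your own servers before drawing conclusions.

```python
def hours_until_wrap(log_size_mb: float, events_per_second: float,
                     avg_event_bytes: int) -> float:
    """Estimate how many hours a log of the given size lasts before wrapping."""
    total_bytes = log_size_mb * 1024 * 1024
    bytes_per_second = events_per_second * avg_event_bytes
    return total_bytes / bytes_per_second / 3600


# Hypothetical load: 50 audit events per second at roughly 500 bytes each.
estimate = hours_until_wrap(256, 50, 500)
print(f"A 256MB Security log would wrap in about {estimate:.1f} hours")
```

With those assumed numbers a 256MB log lasts roughly three hours, which is why saving the logs off as they wrap matters.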

Auditing is not free – it costs CPU time, disk I/O, and can increase memory usage within LSASS. Please be sure to test this for your environment. Really.


Yann

Hello,

Very good stuff! 🙂

I had an issue where the Policies and Scripts folders are morphed as this:

Policies.

Policies_NTFRS_XXXXXX.

Scripts.

Scripts_NTFRS_XXXXX.

These morphed folders appeared at the same time that an administrator did an authoritative restore of an OU.

We want to have proof. Is there a way to know the "where and when" of those morphed folders appearing, and if possible who did it?

Is there a tag, in the idtable or elsewhere, that corresponds to a restoration of those morphed folders?

The steps are pretty much identical to those above, except that you don’t care about the FLAGS being set to deleted. There’s nothing marking them as morphed but the name, and you have that piece.

An auth restore of an OU would not be able to cause this issue though; it would have no effect on FRS. But if someone was using GPMC incorrectly (KB929266), or if they set a D4 burflag as part of their steps without setting D2 downstream (KB315457), it would be very possible that the issue happened at roughly the same time.

10 years ago

Yann

Hello,

Thx for your input. I did indeed see the Policies and Scripts morphed folders in the idtable. They refer to a DC.

It seems that if you tick the "When restoring replicated data sets, mark the restored data as the primary data for all replica sets" option on the Advanced Restore Options dialog box of the ntbackup Restore tab, this can also generate name-morph conflicts, with one set of directory trees having the normal names and the other set having the morphed folders.

Cheers,

Yann

9 years ago

MatM

Hi Ned,

Do you know what flag 6 is when looking at these logs? Is there a table or something I could look at?

A 6 would be a name conflict on a previously advertised file. I haven’t really investigated that flag scenario, but I expect there would be debug log data showing that the file had won a conflict at some point.

– Ned

4 years ago
