Everything You Need to Know About Exchange Backups* – Part 2

* But Were Afraid to Ask

Part 2 of this series (Part 1 is here and Part 3 is here) breaks down the events that take place during the backup of a mounted and active replicated database in an Exchange 2010 Database Availability Group called, simply enough, “DAG”. In this example the backup server is asked to create a full backup of database DB1 on server ADA-MBX1, using non-persistent COW snapshots:


Event 9606 indicates that the VSS requestor has engaged the Exchange writer, and reports the instance GUID for the backup job that is starting. In this case the instance is 830705de-32d9-4059-94ea-b9e9aad38615. This instance GUID persists throughout each job, and changes with each subsequent one. You can therefore use it to track the sequence of events for each individual job. At this time the Exchange Writer provides metadata about the databases and logs present to the backup application.

Events 2005 and 9811 indicate an instance number assignment for ESE. Along with the writer instance GUID from event 9606, we can therefore also track a job’s progress using these ESE instance numbers, which increment by one with each job. At this stage the database is marked with “backup in progress” in the Information Store Service’s memory space.
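To illustrate tracking a job by its writer instance GUID, here is a minimal Python sketch. It assumes the Application log events have already been exported into simple (event ID, message) pairs. The `events_for_instance` helper, the second GUID, and the placeholder message text are all made up for illustration and are not real Exchange log output.

```python
def events_for_instance(events, instance_guid):
    """Return the event IDs, in order, whose message mentions the given GUID."""
    needle = instance_guid.lower()
    return [event_id for event_id, message in events if needle in message.lower()]


# Hypothetical, simplified records: (event ID, message text). Not real log output.
JOB_A = "830705de-32d9-4059-94ea-b9e9aad38615"   # instance GUID from this post
JOB_B = "00000000-0000-0000-0000-000000000001"   # made-up GUID for a later job

sample_events = [
    (9606, f"placeholder message for instance {JOB_A}"),
    (9608, f"placeholder message for instance {JOB_A}"),
    (9606, f"placeholder message for instance {JOB_B}"),
]

print(events_for_instance(sample_events, JOB_A))
```

Because the GUID changes with every job, filtering on it separates the events of overlapping or back-to-back jobs without relying on timestamps.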

Just after the backup application has determined which disks need snapshots created, based on the data locations provided by the Exchange Writer metadata, it goes ahead and requests those snapshots. As the snapshot requests arrive event 9608 gets generated, indicating the Exchange Writer’s acknowledgment of what’s about to happen. The writer must then halt disk writes to the database(s) and logs for the duration of the snapshot generation process, a step known as a “freeze”.

When event 2001 is generated the current transaction log is closed, and the freeze begins. Writes from STORE.exe to the disks are held in memory.

Once these events appear we know the snapshot(s) have been created, and writes are allowed to database data blocks again.

Once the snapshots are created the backup application can copy blocks of data from the VSS subsystem, getting blocks of data from shadow storage if they’ve been preserved due to a change, or from the actual disk volume if they haven’t. The Exchange Writer waits for the signal that the transfer of data is complete. This flow of data is represented by the purple arrows, which in this case indicate data getting copied out of the snapshots in storage, through the I/O of the Exchange server, and on to the backup server.
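The copy-on-write read path described above can be modeled in a few lines of Python. This is a toy sketch, not how VSS is implemented: the `CowSnapshot` class and its dictionaries standing in for the volume and shadow storage are assumptions made purely for illustration.

```python
class CowSnapshot:
    """Toy copy-on-write snapshot over a dict of block number -> bytes."""

    def __init__(self, volume):
        self.volume = volume   # live volume, keeps changing after the snapshot
        self.shadow = {}       # original contents of blocks changed since the snapshot

    def write(self, block, data):
        # Preserve the original block once, before its first overwrite.
        if block not in self.shadow:
            self.shadow[block] = self.volume.get(block)
        self.volume[block] = data

    def read_snapshot(self, block):
        # Preserved blocks come from shadow storage, untouched ones from the volume.
        if block in self.shadow:
            return self.shadow[block]
        return self.volume.get(block)


volume = {0: b"db-page-0", 1: b"db-page-1"}
snap = CowSnapshot(volume)
snap.write(1, b"changed-after-snapshot")

print(snap.read_snapshot(0))  # unchanged block, read from the volume
print(snap.read_snapshot(1))  # changed block, read from shadow storage
```

The backup application always sees the data as it was at snapshot time, even though the database keeps writing to the live volume while the copy is in progress.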

Once the backup application finishes copying data it signals VSS that it’s done. VSS in turn signals the Exchange Writer, which then initiates post-backup steps, signified by the above events. Event 225 appears to state that log truncation won’t occur, but that event is misleading. For a standalone database, ESE would go ahead and clear logs upon backup completion. When a DAG replicated database is involved, however, the other database copies must first be checked, in coordination with the Exchange Replication Service, to ensure log truncation can continue. Once that check is complete the logs eligible for truncation are deleted. The database header is marked with information about the backup, and the “backup in progress” bit is switched off in memory. In this case the snapshots used for the job are destroyed as part of the completion; in other types of backups, such as incremental, the persistence of the snapshot varies.
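To make the idea of the cross-copy check concrete, here is a deliberately simplified Python sketch. It assumes a single rule: a log generation can be truncated only once it has been backed up and every other copy reports having replayed it. The real check performed with the Exchange Replication Service considers more than this, and the function name and the server names ADA-MBX2 and ADA-MBX3 are hypothetical.

```python
def highest_truncatable_generation(backed_up_through, replayed_by_copy):
    """Return the highest log generation that is safe to delete.

    backed_up_through: highest generation covered by the completed backup.
    replayed_by_copy:  mapping of copy name -> highest generation replayed there.
    Assumed rule (a simplification): a log must be both backed up and
    replayed by every other copy before it can be truncated.
    """
    if not replayed_by_copy:
        # Standalone database: nothing else to wait for.
        return backed_up_through
    return min(backed_up_through, min(replayed_by_copy.values()))


# Hypothetical passive copies that are slightly behind the active copy.
print(highest_truncatable_generation(120, {"ADA-MBX2": 118, "ADA-MBX3": 115}))
```

Under this assumption the slowest copy sets the limit, which is one way to see why logs can linger after a successful backup when a copy is behind.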

In the next post in this series we’ll break down the backup of a passive DAG replicated database copy.

@Michael, the freeze duration is discussed briefly in Part 1, but it generally works like this. The Exchange Writer freezes I/O for a database and notifies VSS of the freeze. VSS then has 60 seconds to get a snapshot of the disk(s) where the data resides before a timeout is issued. If the snapshots aren’t ready in that time an abort is issued, and the Exchange Writer thaws the database. Usually the snapshots are prepared very quickly, so the database is not frozen for anywhere near the full time allowed.
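The freeze, timeout, and thaw sequence can be sketched in Python as follows. This is only a model under stated assumptions: `ToyWriter` and `run_snapshot` are invented names, the default timeout simply mirrors the figure quoted in this reply, and the elapsed time is checked after the snapshot callback returns, whereas a real abort would interrupt the work in flight.

```python
import time


class ToyWriter:
    """Stand-in for a writer that holds database I/O while frozen."""

    def __init__(self):
        self.frozen = False

    def freeze(self):
        self.frozen = True

    def thaw(self):
        self.frozen = False


def run_snapshot(writer, create_snapshot, timeout_seconds=60.0, clock=time.monotonic):
    """Freeze, create the snapshot, and always thaw; report an abort on timeout."""
    writer.freeze()
    started = clock()
    try:
        create_snapshot()
        within_limit = (clock() - started) <= timeout_seconds
    finally:
        writer.thaw()  # the database is thawed whether or not the snapshot succeeded
    return "completed" if within_limit else "aborted"


writer = ToyWriter()
print(run_snapshot(writer, lambda: None))  # an instant "snapshot" finishes in time
```

The `finally` block is the important part of the design: whatever happens during snapshot creation, the writer ends up thawed.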

As for your question about identifying specific mailbox databases for backup, it depends on the requestor application you’re using. Diskshadow can request snapshots, but it won’t pull data out of them once they’re created; something else needs to do that. That being said, to single out one database out of many you can use a "writer verify" statement with just the GUID of that database like this:

Your script also needs an "add volume" statement for each volume with database data you need snapshots of:

add volume <DatabaseDiskVolume> alias <DatabaseDiskVolumeAlias>
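If you generate these scripts for several databases, a small helper can build the two kinds of statements for you. The Python sketch below is an assumption-laden illustration: `selection_statements` is a made-up name, the all-zero GUID and the drive letters and aliases are placeholders, and the brace-wrapped GUID form of "writer verify" should be checked against your own Diskshadow version before use.

```python
def selection_statements(database_guid, volumes):
    """Build the 'writer verify' and 'add volume' lines for one database.

    volumes maps a volume (for example "D:") to the alias for its shadow copy.
    """
    # Assumed form: the GUID wrapped in braces after 'writer verify'.
    lines = [f"writer verify {{{database_guid}}}"]
    for volume, alias in volumes.items():
        lines.append(f"add volume {volume} alias {alias}")
    return lines


# Placeholder GUID, drive letters, and aliases; substitute your own values.
for line in selection_statements(
    "00000000-0000-0000-0000-000000000000",
    {"D:": "DbVol", "E:": "LogVol"},
):
    print(line)
```

The helper only emits the statements discussed in this reply; the remaining lines of the script still have to be added around them.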

Format the rest of the script accordingly and Diskshadow will only snapshot the disk(s) for the database you want. You can then robocopy out the data if you expose the snapshot(s). Here’s our doc on setting up the complete script; there are many 3rd party ones available as well: