Volsnap Event ID: 25 and Loss of Previous Backups

General discussion

I have a Windows Small Business Server 2008 machine being backed up using the Windows Server Backup utility. The backups are being made to two external USB attached 1TB drives. Each drive is rotated out weekly, meaning that one drive is attached one week, then the other drive is swapped in the following week, ad infinitum for the best part of a year. Two backups were being made each day to the attached drive, one at 12pm and one at 9pm.

About two weeks ago it appears that one of the two drives reached capacity and began shedding the oldest backups, as confirmed by the VOLSNAP Event ID: 33 entries in the System Event log. On the second day, however, ALL shadow copies on the attached backup drive (volume) were deleted, as confirmed by a single VOLSNAP Event ID: 25. (See below for both Event ID message contents.) Unfortunately, neither event was noticed until the drive was swapped out for the other drive, and somehow the problem was propagated to the second drive, resulting in the deletion of all shadow copies on the second drive, too.

The end result... BOTH drives lost ALL previous backups leaving only the most recent full backup and no clue as to why it happened or how to prevent it from happening again. Virtually no older backup file versions could be recovered.

I was under the impression that, by design, the WSB utility would only start deleting the oldest backups (on a full drive) in order to make room for the newer backups. Granted, had I been paying attention, I would have swapped out the drives before they actually got full, so that NO data was deleted.

If I'm to continue using the Windows Server Backup utility, I need to know what happened and how to absolutely prevent it from happening again. I never expected to lose BOTH backups at the same time, and due to budget constraints, two rotating backup drives seemed a logical solution.

Here is text from the two VOLSNAP Event ID messages found in the System Log:

Event ID: 25
Source: volsnap
Level: Error
Description: The shadow copies of volume \\?\Volume{93896cfc-d904-11dd-8cb5-001ec9ef572e} were deleted because the shadow copy storage could not grow in time. Consider reducing the IO load on the system or choose a shadow copy storage volume that is not being shadow copied.

NOTE: I'm not sure whether this has any bearing on the situation, but the server is configured to make two snapshots a day of two volumes on the server. When I looked at the Shadow Copy settings, I couldn't be sure whether VSS was somehow also making snapshots on the two external drives, even though I haven't configured it to do so. None show up on the two replacement drives I'm now using, so I'd say not.

Information on how to prevent this from happening again would be greatly appreciated. (I'll be making sure no backup drive gets even close to being full from now on, that's for sure!)

All replies

Hello Joseph, The EventId 25 from volsnap deleting all the snapshots is not an expected behaviour. I need to follow up with some other teams to find out why such an event happened. I have not seen this issue earlier.

Windows Backup explicitly reclaims space for the current backup by deleting older snapshots, or volsnap automatically deletes older versions in the case of Copy On Write (COW). That is exactly what happened with Event ID 33, and it is expected behaviour.

Can you please share some information with us:

1. What is the free/used space of the source volumes?
2. What is the free/used space of the target volumes?
3. What is the VSS shadow copy storage space on the target volume? Use the following command from an elevated command prompt, assuming D: is the target volume:
   vssadmin list shadowstorage /for=D:
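For anyone who wants to log the answer to item 3 over time, here is a minimal sketch that pulls the three size figures out of the command's text output. It assumes the English-language layout shown in SAMPLE (the sizes are made up and the volume GUID paths are trimmed); parse_shadowstorage and to_bytes are hypothetical helper names, and only a single storage association is handled.

```python
import re

# Assumed layout of `vssadmin list shadowstorage` output on an English system.
# The figures are invented for illustration; real output also shows GUID paths.
SAMPLE = """Shadow Copy Storage association
   For volume: (D:)
   Shadow Copy Storage volume: (D:)
   Used Shadow Copy Storage space: 702.5 GB (75%)
   Allocated Shadow Copy Storage space: 705 GB (75%)
   Maximum Shadow Copy Storage space: UNBOUNDED (100%)
"""

UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
SIZE_LINE = re.compile(
    r"^\s*(Used|Allocated|Maximum) Shadow Copy Storage space:"
    r"\s*(.+?)\s*(?:\(\d+%\))?\s*$"
)


def to_bytes(text):
    """Convert a size such as '702.5 GB' to bytes; None means no limit."""
    if text.upper() == "UNBOUNDED":
        return None
    number, unit = text.split()
    return int(float(number) * UNITS[unit.upper()])


def parse_shadowstorage(output):
    """Return {'used': ..., 'allocated': ..., 'maximum': ...} in bytes."""
    sizes = {}
    for line in output.splitlines():
        match = SIZE_LINE.match(line)
        if match:
            sizes[match.group(1).lower()] = to_bytes(match.group(2))
    return sizes


info = parse_shadowstorage(SAMPLE)
print(info)
if info["maximum"] is None:
    print("No limit is set on the shadow copy storage for this volume.")
```

On a real server the text would come from running the command in an elevated prompt and passing its output to parse_shadowstorage instead of SAMPLE.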

Thanks,

Bikash Agrawala [MSFT]

--------------------------------------------------------------------------------- This posting is provided "AS IS" with no warranties, and confers no rights

Hello Joseph, can you please share the information requested above?

Also can you please zip the folder %windir%\logs\WindowsBackup and send it to bikasha-nospam@microsoft.com (Use the highlighted part only) for analyzing the problem. Please keep the subject of the email same as this thread for easy tracking.

Thanks,

Bikash Agrawala [MSFT]


The two source volumes that are being backed up are the typical Boot Volume (C:) and our Data Volume (G:).

The Boot Volume free and used space is approximately 65GB and 154GB, respectively. The Data Volume free and used space is approximately 183GB and 464GB, respectively.

NOTE: Boot Volume is configured using a hardware RAID 1 configuration. The Data Volume is configured using a hardware RAID 5 configuration.

The two target drives are 931GB each and contained backups for both the Boot and Data Volumes. (They were swapped weekly, each drive being used on alternating weeks.) The last time I checked, they held just over 700GB of backed-up data, covering about 1 year of backups. I was expecting to swap them out for replacement drives BEFORE they became full and started deleting older backups.

Since the two target drives were used ONLY with the Windows Server Backup utility, and configured by the WSB utility, no other applications or services should have been using the drives. The WSB utility had removed drive letter assignments in the very beginning for these two target drives when they were added to the WSB utility's list of target disks.

NOTE: I'm protecting one of the two original backup drives. The one that I ran the above VSSADMIN command on has actually been attached (since the problem occurred) to a Windows XP machine while I tried to take a closer look at what went wrong. I don't know if that will have an effect on the above VSSADMIN results.

I'll send the requested backup log to the address provided.

Thank you, Bikash. I look forward to finding out what happened, but more importantly, I look forward to finding out how to prevent it from happening again.

I am following up offline with Joseph and the volsnap team. Since this is difficult to reproduce locally, the analysis may take some time. I will update the thread once I have some useful information to share.

Thanks,
Bikash Agrawala [MSFT]

I have the same problem on a two-node cluster (Windows Server 2003 R2 Enterprise) that hosts the file resources. Since 24 February 2010 I have had Event 25 about 20 times, and each time I lose the volume shadow copy history. So I am very interested in finding a solution.

I have the same setup as Joseph C, and I've just encountered the same problem.

I have SBS 2008 being backed up to one of two USB drives (931GB each), which are swapped weekly; a backup occurs every day at 2300. When last night's backup ran I got a volsnap event ID 25, and all previous backups were deleted from the drive. Like Joseph, I can see that the oldest backups had previously been shed (by seeing event ID 33 over the past couple of weeks). Unlike Joseph, I managed to catch this before swapping the backup drives, so at least I still have half of my backups intact (and I won't be connecting the full drive now).

I am backing up two drives (system & data) the free/used space for which are 173GB/124GB and 211GB/86GB respectively.

I realise that this thread has been untouched for some time, but has anyone come across a solution to this problem yet?

Backup programs on other servers deliver their backups to this server over network shares every night. Once that has finished, "HP Data Protector Express" (Version 4.00-sp1 Build 43064) makes a daily full backup to the tape library (LTO4).
The backup is about 500GB daily, and each day's tape still has more than 400GB of free space (800GB capacity per tape).

Last week, the daily full backup jobs in HP Data Protector Express ended twice with "error 16" and "Failure, Data not found".
(One day later, the same job ran without problems on another daily tape, and a week later the previously failed job ran on the same tape without any failure. Nobody knows why.)

The Event found in Event Viewer was the same:

Source: VolSnap

Category: none

EventID: 25

"The shadow copies of volume D: were deleted because the shadow copy storage could not grow in time. Consider reducing the IO load on the system or choose a shadow copy storage volume that is not being shadow copied."

The shadow copies of volume F: were deleted because the shadow copy storage could not grow in time. Consider reducing the IO load on the system or choose a shadow copy storage volume that is not being shadow copied.

VSS has always been disabled, but I did have a couple of manual snapshots stored. They are now gone.

I am getting the same error on a SQL Server which hosts our SharePoint DB. We don't use Windows shadow copy services directly, but I think Networker Legato uses them to back up the server. This has happened a couple of times in the past 2 weeks, and each time we have had to recycle the SQL Server to get it working again. We have tried installing the latest Windows updates, but no luck.

Message
The description for Event ID '-1073348583' in Source 'VolSnap' cannot be found. The local computer may not have the necessary registry information or message DLL files to display the message, or you may not have permission to access them. The following information is part of the event:'\Device\HarddiskVolumeShadowCopy33', 'D:'
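The negative number in that message looks like the raw 32-bit event identifier shown as a signed integer. A minimal sketch of decoding it, assuming the usual Windows message-identifier layout (2 severity bits, 2 flag bits, 12 facility bits, 16 code bits); decode_event_id is a hypothetical helper name:

```python
def decode_event_id(raw):
    """Split a raw (possibly negative) event identifier into its bit fields."""
    value = raw & 0xFFFFFFFF  # reinterpret the signed number as unsigned 32-bit
    return {
        "hex": f"0x{value:08X}",
        "severity": value >> 30,            # 3 corresponds to Error
        "facility": (value >> 16) & 0xFFF,
        "code": value & 0xFFFF,             # the ID that Event Viewer displays
    }


print(decode_event_id(-1073348583))
```

Under that assumption the low 16 bits come out as 25, so this would be the same volsnap Event ID 25 discussed in the rest of the thread, just displayed without its message text.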

Was there any solution to this issue or key information that was uncovered? I'm having the same issue with a Windows Storage Server 2008 R2 Standard.
I've had volsnap throw up the same error as the OP:

Event ID: 25
Source: volsnap
Level: Error
Description: The shadow copies of volume C:\Data Volumes\Data were deleted because the shadow copy storage could not grow in time. Consider reducing the IO load on the system or choose a shadow copy storage volume that is not being shadow copied.

3 times now we've lost 2 weeks of snapshots because of the purge this event triggers.

If anyone found information on solving this issue they could share I would greatly appreciate it!

We're currently using Backup Exec (but not using any of BE's AOFO technology) to transfer a backup of a SQL database from disk to tape. The backup files are held on a drive of around 14 TB, split into files of around 1 TB or less. The total size of the backup
files is around 12 TB.

Shadow copies are not enabled on the drive holding the backups, but it looks like the process of backing up the files causes the VSS to take a snapshot of the drive in case the files change during the backup.

In the past, we have tried designating a different drive as the storage volume for the shadow copies, but with 12 TB of data it's difficult to find a drive with sufficient space.

I'd like to know if it is somehow possible to disable the VSS snapshot process for this drive? The SQL backup files are static (the actual SQL backup to disk has finished by the time the backup to tape starts), so we know that the files will not change
during the backup. Therefore there is no need for the drive to be snapped before the backup starts. If we could stop this from happening then this would likely solve our problems.

I've run into the same issue with our backup server. We're using BackupExec in conjunction with HP's automated storage manager. Both leverage VSS. We've lost several weeks worth of snapshots several times now. In talking with both HP and Symantec they both
make reference to this error being an expected behavior with VOLSNAP. Their recommendation is to lower I/O load by moving snapshots to a separate disk. We're in the process of doing so, however if this is expected behavior it seems like poor design to have
VOLSNAP blow away all snapshots vs. abandoning the snap in progress during a period of high I/O. Seems like conflicting information as Bikash references

"The EventId 25 from volsnap deleting all the snapshots is not an expected behaviour."

We've noted that this most frequently occurs when we are making backups and leveraging the Shadow Copy components. Our backups are scheduled during off hours, when there should be little to no I/O other than the backups themselves. After this event occurs, we have been able to take a manual snapshot and complete a full backup during work hours with no issues, even with DFSR in full swing and two additional snapshots taken during that period. So the problem really seems to occur when VSS should be trying to remove the oldest snapshot.

I e-mailed Bikash to inquire about any updates (Thanks for the reply Bikash!). On June 1st he explained that he was no longer part of this project but said he'd notify the correct folks. Haven't heard from anyone yet. I'll post if I get a response.

I found that the drive which held our full backups was actually being modified while the backup to tape was running, therefore causing the VSS snapshot to grow in size while it was running.

By excluding all the files and folders except for those actually being backed up, I was able to get a good full backup off to tape, and the actual snapshot size at the end of the backup (as reported by vssadmin list shadowstorage) was minimal, confirming
that the snapshot was no longer growing.

The only problem with this kind of solution is that it effectively makes the exclusions invisible to anything which uses a VSS snapshot, so you may have to keep changing the exclusions depending on what you're backing up.

In any case, this solution is working for us for the time being - longer term we'll be moving everything else off that drive so that we don't need to worry about the snapshot growing significantly during the backup.

The server the backup was running from was on 2008 R2 (not SP1), and we had two 1.5TB USB drives rotated weekly. Only one drive got erased because we were lucky enough to catch the issue in time; we are keeping the second drive unplugged until the issue gets resolved.

So is there any solution out there? I really love the built-in backup, but this is a MAJOR problem. Please advise!

PS: For the record, the events I got before the drive got erased are:

Event ID 2013: The T: disk is at or near capacity. You may need to delete some files.

Event ID 25: The shadow copies of volume T: were deleted because the shadow copy storage could not grow in time. Consider reducing the IO load on the system or choose a shadow copy storage volume that is not being shadow copied.

What is T: used for? The reason shadow copies need to grow during a backup is that files are changing after the initial snapshot was taken. If files are being changed on T: while the backup is running, the snapshot will grow accordingly.

Sorry, to clarify - is T: the source or the destination for the backup?

One thing to look at while the backup is running is 'vssadmin list shadowstorage', which will show you the current shadow copy storage held on the drive and how big it is. If you see the snapshot on this drive growing consistently during the backup, you'll know that something on the drive is changing while the backup is running.
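A small sketch of what "growing consistently" could mean in practice, assuming you note down the used shadow copy storage (in MB) every few minutes while the backup runs. The readings below are invented, and growth_report is a hypothetical helper name:

```python
def growth_report(samples_mb):
    """Summarise used-shadow-storage readings in MB, oldest reading first."""
    deltas = [later - earlier for earlier, later in zip(samples_mb, samples_mb[1:])]
    return {
        "total_growth_mb": samples_mb[-1] - samples_mb[0],
        # True only if every reading is larger than the one before it
        "always_growing": bool(deltas) and all(d > 0 for d in deltas),
    }


# Invented readings taken at intervals during one backup run
print(growth_report([100, 180, 260, 400]))
print(growth_report([100, 100, 101]))
```

If always_growing comes back True across a whole backup window, that would suggest something is writing to the drive during the backup.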

Have you explicitly enabled Shadow Copies on T:? Check out the status of shadow copies for T: in Disk Management.

Looks to me like shadow copies are enabled on the drive and the backup itself is causing the growth (since the VSS service is trying to keep a copy of the files as they are replaced by the backup job).

Sorry, I'm not familiar with Windows Backup. A quick Google would suggest that it creates a VSS snapshot on both the source and destination.

Not sure if there's a way of stopping it doing that on the destination, but looking at the size of the shadow copy storage area, that's possibly the reason for your problems. Perhaps the shadow copy is not growing quickly enough due to I/O pressure.

I'm having this same issue as well. It's happening on a file server during nightly backups. We've been consolidating, so there are several drives on this server with varying capacity that all reside on a Fibre Channel SAN. For me, this error seems strongly correlated with drive size. It happens regularly on the biggest of the drives (1950GB), and once a week or so on the next largest drive (793GB). The server is running Win 2003 32-bit, latest SP and patch level. There doesn't seem to be any correlation with free space, as I have another drive (558GB) with a far lower percentage of free space than the 793GB one, holding roughly the same types of files, that never produces this error.

I completely agree that Microsoft needs to do something about this. We'll be rebuilding this server soon, and I'm going to have to consider going with a product that supports snapshots reliably - RHEL with btrfs or Solaris with zfs, probably.

Bump. I possibly found a solution: I adjusted the shadow copy setting on the external backup drive from NO LIMIT to a limit 10GB under the max capacity. Maybe doing this will avoid the problem; time will tell. I'll try to follow up once I've confirmed whether it works or not.

^ You need to set a limit on the shadow copy size by going to the drive properties and setting the size limit under Shadow Copies. Don't set the limit too close to the drive's max capacity; leave a few hundred MB, I'd say. I can't believe Microsoft didn't document this anywhere....
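The same limit can presumably also be set from an elevated prompt with vssadmin resize shadowstorage. A minimal sketch that works out the figure and prints the command to run; resize_command is a hypothetical helper, the 10GB headroom is just the value tried above, and nobody in this thread has confirmed that setting a limit prevents Event 25:

```python
def resize_command(volume, capacity_gb, headroom_gb=10):
    """Build a vssadmin command limiting shadow storage to capacity minus headroom."""
    if headroom_gb <= 0 or headroom_gb >= capacity_gb:
        raise ValueError("headroom must be positive and smaller than the capacity")
    limit_gb = capacity_gb - headroom_gb
    return (
        f"vssadmin resize shadowstorage /for={volume} /on={volume} "
        f"/maxsize={limit_gb}GB"
    )


# Example: a 931GB backup volume mounted as T:
print(resize_command("T:", 931))
```

Note that a disk dedicated to Windows Server Backup may have no drive letter, in which case the volume would have to be named some other way.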

Had over 100 copies in history. WSB has been, for months, happily managing the deletion of oldest backups to make room on the drive.

Last night received volsnap info event 33, followed by volsnap error event 25
The shadow copies of volume G: were deleted because the shadow copy storage could not grow in time. Consider reducing the IO load on the system or choose a shadow copy storage volume that is not being shadow copied.

Other than WSB, there was no other IO on the target.

I now have a single backup of my VMs. I can't really describe how bad a situation this has put us in . . .

I'm wondering if you are still around. You mentioned you would update this thread when some useful information became available. That was two years ago. Has Microsoft addressed this problem at all? If so, could you please update this thread for the many
users that have come across this issue. I'm encountering the same problem that many have had, and when I find threads such as these, with the possibility of finding an answer, only to find that we are being left to our own demise; it tends to make us lose
confidence in Microsoft.

Please let us know what's going on with this. There are still users/companies that have Windows Server 2003 running.

This question was originally asked over 2 years ago. Is there any answer that Microsoft would be willing to provide for those of us that are struggling to find the answers?

During normal Windows 2008 R2 file server operation (writing files to the NTFS volume), Event 25 appears sporadically and all snapshots are lost.
Only the "Microsoft Software Shadow Copy provider 1.0" is active. The system is used only as a file server for backups. It has a 40 TByte RAID volume with a sequential read/write performance of 950 GByte per second. The cluster size of the NTFS volume is 16k, and MinDiffAreaFileSize was set to the maximum recommended value of 3000 MByte.

Unfortunately, Microsoft has no intention of fixing this problem, because Microsoft does not want to enter into competition with vendors of professional storage systems. After some time Microsoft will tell customers that the product is out of mainstream support. This probably also applies to Windows Server 2012.

Thanks for posting this response. I wish Microsoft support could keep up on these threads.

I've been trying to resolve this issue, and I followed one set of fixes that told me to edit the registry value to accommodate a larger VSS size. Now I'm getting an Event ID 20 error telling me that the COM+ database may be corrupt. The other possibility is that the setup may not have installed properly. I wasn't here when it was installed, so I can't even verify its success.

Good luck to all that have been having these problems. I'm still trying to figure them out on my end when normal operations allow.

I have a similar problem on a Windows 7 laptop system. VSS has already corrupted my HDD a couple of times costing me untold hours (actually days) of work in restoring backups, recovering data and reinstalling applications. I already tried increasing
the VSS storage (from 10GB to 48GB) and increasing MinDiffAreaFileSize to the recommended 5% of restore point storage allocated. This seemed to help for a couple of days and then the problem recurred. I had high hopes of fixing the problem by reformatting
the main partition to use 16K, as suggested in
this KB article, but your experience above has dashed my hopes. In fact, I'm not even sure that the above-mentioned KB is relevant.

This appears to be a fairly common problem, going back for MORE THAN 6 YEARS!
This KB article refers to a
hotfix for Server 2003, but if the problem was already fixed in 2003 SP1, why is it still occurring in later OSs like Windows 7 and Server 2008? Has it been fixed in any of the newer OSs, e.g. Windows 8?

Some articles blame third-party VSS writers for this behaviour, and suggest:

vssadmin list writers

to identify them. On my system, this yields a list of 11, but no clue as to which are (not) from Microsoft.

How can I make this distinction?

What good is a backup service, which is not only unreliable, but even corrupts HD storage?

The easiest solution would seem to be to turn off VSS, but when I look at the services on my laptop, it isn't running anyway?! How can this be?

This KB article suggests using a different HDD (not physically possible with my laptop) or at least another volume for either the VSS storage or the pagefile. Can anyone suggest which of these options
would be better and why?

Is there some other workaround for this?

P.S. To answer my own question about whether to move VSS storage or pagefile, MS has made the decision for me, by crippling vssadmin under Windows 7. See
this article.

I have a Windows Server 2008 R2 SP1 on a Dell PE R210 server with two 2TB drives using RAID 1 on an SAS 6/iR adapter. The drives are about half full. It is used as a file server for a 50-person network and also has two Hyper-V virtual machines. One is a Barracuda Spam and Virus Filter and the other is another 2008 R2 server running one specific application.

The server is stable, except that from time to time I have trouble opening the WSB console. It will work fine for months, then all of a sudden take hours. Disk Management also runs into similar problems around the same time. I attributed this to possible problems with the USB backup drive or corruption in the WSB catalogs. One time, deleting and restoring the catalogs from the backup disk fixed the WSB console issues.

Recently I have been having problems opening WSB and Disk Management again. I rebooted the server and noticed that WSB opened quickly, but my backups were no longer accessible from Recover..., only the one it did previously. I tried the restore catalog function but that didn't work. I dug deeper and ran vssadmin list shadows, and there was only one entry. There were about 100 copies/backups previously. I noticed the event 25 volsnap and wound up here searching for an answer.

I expect to be able to go back to any point in time in the last few months and pull up a missing file. WSB sure gives the impression it should do that. Inexplicably, all the backup copies appear to be gone. I understand overwriting the oldest copy to make room, which I am familiar with. In fact I was getting ready to replace this drive with another as it was almost full. It had about 150GB left on a 2TB USB drive. It still says 150GB free, even though the shadow copies appear to be gone. The vhd from the backup is around 800GB. This leads me to believe that they might still be there, since it didn't seem to free up space, or at least not in proportion to what it should have if it deleted all the copies. I should be looking at a TB or so free if the shadow copies were truly deleted from the disk. Are they corrupted, so they appear missing, but still take up space? I don't know. Any help would be appreciated.

Definitely going to be moving away from WSB after this. I had my issues with it before, but was always able to make it work. This is just inexcusable and I'm very pissed as a loyal Microsoft customer that their product takes the liberty of just deleting files when things get a little hairy.

I think this problem is related to memory consumption. When you are using a slow drive (such as a USB one), Windows caches disk writes in system RAM. The root of the problem is that there is no default limit on the amount of memory Windows x64 can use as cache. So every time Windows writes a large amount of data to a relatively slow drive, you may see something like

Available memory is almost exhausted.

The system file cache consumes most of the physical RAM.

So when you write your backup data, and the VSS needs some more memory and asks system for more, it may get an error, and delete all the snapshots.

I was able to find a way to make error 25 and 27 happen at will. I created a dedicated dump file instead of using the pagefile.sys for system dump files. You create this dedicated dump file by adding entries to HKLM\SYSTEM\CurrentControlSet\Control\CrashControl.
I did this so that I can run without a pagefile on my SSD and still be able to acquire dump files in case of a system crash. I moved the dedicated dump file to d:\ and my boot time increased by 3 minutes and all volsnaps on d: were removed by an
error 25 followed by 2 volsnap error 27s. I moved the dedicated dump file to drive e:\ and the next reboot did the same to drive e:. An extra 3 minutes to boot and an error 25 followed by 2 error 27s. When I moved the dedicated dump file
to e: I reduced the amount of disk space for the file from system controlled (17GB with 16GB of RAM) to 7GB. The volsnaps were still deleted due to error 25/27. It appears that the volsnaps may be deleted due to large pagefile.sys or DedicatedDumpFile.sys
activity on the disk making the volsnap grow in size during boot. I have .5GB free on d: and 1TB free on e:. I would not expect vss to be running during boot, but apparently it is. I am far from being an expert on VSS. Let me know
if this helps anyone to at least duplicate the problem or possibly help resolve it. I am not going to try this on my c: drive.

The volsnap error 25 occurs immediately during boot. Then there is no activity until the volsnap error 27s about 2 1/2 minutes later. No activity in any other Windows Event logs either. Disk activity is constant during this 2 and 1/2 minutes
even if there is only 1 volsnap stored on the drive.

I am backing up the C: System and D: Data drives to an internal 3 TB SATA HD; approximately 1.4 TB was used on the backup drive. It had been working great for over 10 weeks. There were over 50 backups available the last time I checked. Today only 2 backups are there, since the volsnap error 25 occurred 2 nights ago.

On SBS2008, as well as on WS2012 Essentials, previously existing backups are completely erased from the external drive, and I found a volsnap Event ID 25 in the System log...

So, it seems that since 11 Feb 2010 (date of the 1st message in this huge thread), Microsoft has provided NO resolution for this crucial problem: deleting ALL existing backups on a drive due to unknown and unpredictable VSS behavior??? That's just incredible!!!

Is there anyone from the MS support team who still reads and updates this thread?

They were still visible in the Device Manager with env setting devmgr_show_nonpresent_devices=1 and the Show Hidden Devices option on. They were greyed out, so I had to delete these by hand.

Anyway, I think the description of the event is correct. Too much disk IO at startup is the cause of this problem. In my case it was HKLM\SYSTEM\CurrentControlSet\Control\CrashControl "DedicatedDumpFile"="C:\\CrashDumpFile.dmp"
This creates a dump file at boot time with the size of your RAM. Very disk IO intensive!
After I renamed the value DedicatedDumpFile to DedicatedDumpFile.OFF, VSS copies survived a reboot!

For redundancy and system state backup, I've configured Windows Backup to "backup to a shared network folder". It's just a mapped partition on an external drive. This forces Windows backup to only make full backups with no shadow copies
on the target. I keep 2 copies by scheduling this bat file to run before the backup. You can change the script if you want to keep more than 2 copies - example at end of post:

------------------------

Script to Keep 2 copies:

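REM Switch to the backup drive (W: is the mapped backup target)
W:
REM Delete the older copy, then keep the latest backup as Backup2
rd W:\Backup2 /s /q
Ren W:\WindowsImageBackup Backup2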

--------------------------

Notes:

Windows Server Native Backups
2 Copies Total
WindowsImageBackup is the most recent
Backup2 is the previous day
Rename Backup2 to WindowsImageBackup to restore the older backup
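For keeping more than 2 copies, the same rotation can be sketched in Python. This is only an illustration under the folder names used above (WindowsImageBackup, Backup2, Backup3, ...); rotate_backups is a hypothetical helper, and unlike the bat file it does nothing when there is no new WindowsImageBackup folder to rotate.

```python
import shutil
from pathlib import Path


def rotate_backups(root, keep=2, latest="WindowsImageBackup"):
    """Shift Backup2..Backup(keep-1) up by one, then rename the latest to Backup2.

    After the next backup runs there are `keep` copies in total:
    the new WindowsImageBackup plus Backup2 .. Backup<keep>.
    """
    if keep < 2:
        raise ValueError("keep must be at least 2")
    root = Path(root)
    if not (root / latest).exists():
        return  # nothing new to rotate, leave the older copies alone
    oldest = root / f"Backup{keep}"
    if oldest.exists():
        shutil.rmtree(oldest)  # drop the oldest copy to make room
    # Walk from the second-oldest copy down to Backup2, renaming each up by one
    for n in range(keep - 1, 1, -1):
        src = root / f"Backup{n}"
        if src.exists():
            src.rename(root / f"Backup{n + 1}")
    (root / latest).rename(root / "Backup2")
```

Scheduled to run before the backup, rotate_backups("W:\\", keep=3) would keep three copies; with keep=2 it has the same effect as the script above whenever a WindowsImageBackup folder exists.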