If the CBT information were reset automatically, that would (or should) force an active full backup, because no CBT information is available, right?
I assume that an active full is the only workaround (assuming you have corrupt incrementals), as it reads the disk from beginning to end...

I'd say it wouldn't create an active full. It would read the whole disk and compare the contents with the latest backup, so that only the increments are written. In the end, you would still get a .vib, just without the faster way of fetching the delta blocks via CBT.
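
To illustrate the idea, here is a minimal sketch in Python. It is a toy model, not Veeam's actual implementation: the tiny block size, the use of SHA-256, and the names `split_blocks`, `block_digests` and `full_scan_incremental` are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size, for illustration only


def split_blocks(disk: bytes) -> list[bytes]:
    """Split a disk image into fixed-size blocks."""
    return [disk[i:i + BLOCK_SIZE] for i in range(0, len(disk), BLOCK_SIZE)]


def block_digests(disk: bytes) -> list[str]:
    """Digest of every block, as a backup might keep alongside its data."""
    return [hashlib.sha256(b).hexdigest() for b in split_blocks(disk)]


def full_scan_incremental(disk: bytes, last_digests: list[str]) -> dict[int, bytes]:
    """Read the whole disk, but keep only blocks that differ from the last backup."""
    increment = {}
    for index, block in enumerate(split_blocks(disk)):
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(last_digests) or digest != last_digests[index]:
            increment[index] = block
    return increment


previous_disk = b"AAAABBBBCCCCDDDD"
current_disk = b"AAAAXXXXCCCCDDDD"  # only block 1 changed since the last backup

increment = full_scan_incremental(current_disk, block_digests(previous_disk))
print(increment)  # {1: b'XXXX'}
```

Every block is read, but only the changed one ends up in the increment, so the result is still an incremental restore point. The cost is the read of the whole disk, which is exactly what CBT normally avoids.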

I'm wondering how this "bug" affects replication... I mean, if you start a (powered-off) replica, the next replication pass would revert the snapshot to make sure the last replication pass is in place (even if you hadn't done anything). Now, does Veeam use CBT for the target VM? I assume that it doesn't and only uses CBT for the source VM. Can anyone confirm that?

From the digest we know that Veeam gets invalid CBT data, but does Veeam have logic to do a full read of the disk for the comparison when the CBT data is invalid? I mean, such logic is now needed on the Veeam side to ensure backups are OK (without using SureBackup).

And the VMware KB states that the CBT API might not even throw an error. You could run into this problem without noticing it.

Also, if I restore such a corrupt backup, how will this affect the restored VM? Will it boot and look fine while under the hood it isn't? Will SureBackup detect such a corrupt backup?

Well, if you reset CBT or disable CBT usage in the backup job settings, Veeam reads the whole disk to check "manually" which blocks have changed, so this functionality already exists. From my personal experience I can tell you that corruption caused by invalid CBT data may go unnoticed even if you run a SureBackup job. The VM might boot fine and just have corruption within some files that are not part of the OS or any other running process. Even letting Veeam check all the data blocks wouldn't help, because the blocks within the backup file are fine. The corruption comes from, e.g., not all changed blocks having been fetched and therefore written to the backup file.
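
A small sketch of why a health check on the backup file cannot catch this. It is a toy model in Python, not how Veeam stores data: the dictionary layout, CRC32 checksums, and the names `take_incremental`, `checksums` and `verify` are assumptions for illustration.

```python
import zlib


def take_incremental(previous: dict[int, bytes], disk: list[bytes],
                     reported_changed: set[int]) -> dict[int, bytes]:
    """Build a new restore point by copying only the blocks the tracker reports."""
    restore_point = dict(previous)
    for index in reported_changed:
        restore_point[index] = disk[index]
    return restore_point


def checksums(restore_point: dict[int, bytes]) -> dict[int, int]:
    """Checksums recorded when the blocks are written to the backup file."""
    return {i: zlib.crc32(b) for i, b in restore_point.items()}


def verify(restore_point: dict[int, bytes], sums: dict[int, int]) -> bool:
    """Storage-level check: does every stored block still match its checksum?"""
    return all(zlib.crc32(b) == sums[i] for i, b in restore_point.items())


full_backup = {0: b"os", 1: b"app", 2: b"data"}

# Blocks 1 and 2 change on the source, but the invalid CBT result lists only block 1.
disk = [b"os", b"app-v2", b"data-v2"]
restore_point = take_incremental(full_backup, disk, reported_changed={1})
sums = checksums(restore_point)

storage_ok = verify(restore_point, sums)
matches_source = [restore_point[i] for i in sorted(restore_point)] == disk
print(storage_ok, matches_source)  # True False
```

The stored blocks pass verification because they were written correctly. What is wrong is that block 2 in the backup is the old version, and nothing inside the backup file can reveal that.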

So all in all, to be sure that you're backing up correct data: reset CBT or disable its usage in the job settings.

Correction: If CBT is corrupt, then a CBT reset as described here https://kb.vmware.com/s/article/2139574 is the workaround. Within Veeam it will trigger a snap-and-scan backup, which reads 100% of the data and then works according to the selected backup method to store an active/synthetic full or an incremental. But you have to trigger an active full to reset the CBT within the backup chain, so that you do not depend on older restore points.

@Andreas: Why would you need to do an active full if CBT was corrupted? Your current restore points may contain corrupted data, but if you stop using CBT, the next (incremental) pass would be fine because Veeam would then back up incrementals as it should. If Veeam hasn't backed up some blocks because of the invalid CBT result, it would detect the changes while reading the whole disk. That means after that run, data would be consistent, even without an active full.
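
Here is a toy model of that argument in Python. It assumes the full-disk pass compares the source against the latest state in the backup chain; it compares raw blocks for brevity, where a real product would compare digests. The names `restore` and `full_scan_increment` are made up for the example.

```python
def restore(chain: list[dict[int, bytes]]) -> list[bytes]:
    """Rebuild a disk image by applying the full backup and each increment in order."""
    image: dict[int, bytes] = {}
    for point in chain:
        image.update(point)
    return [image[i] for i in sorted(image)]


def full_scan_increment(disk: list[bytes], chain: list[dict[int, bytes]]) -> dict[int, bytes]:
    """Compare every source block with the latest backed-up state; keep the differences."""
    latest = restore(chain)
    return {i: block for i, block in enumerate(disk) if block != latest[i]}


# Day 0: full backup.
day0 = [b"os", b"app", b"data"]
chain = [dict(enumerate(day0))]

# Day 1: blocks 1 and 2 change, but the invalid CBT result reports only block 1.
day1 = [b"os", b"app-v2", b"data-v2"]
chain.append({1: day1[1]})
day1_consistent = restore(chain) == day1

# Day 2: nothing new changes; CBT is not used, so the whole disk is read.
day2 = list(day1)
chain.append(full_scan_increment(day2, chain))
day2_consistent = restore(chain) == day2
day1_still_bad = restore(chain[:2]) != day1

print(day1_consistent, day2_consistent, day1_still_bad)  # False True True
```

Under these assumptions the newest restore point becomes consistent without an active full, because the full read picks up the block that was missed earlier. The restore point created from the invalid CBT result stays inconsistent, though, so anything restored from it is still suspect.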

Good day everyone,
concerned with this newly discovered issue, I started searching for the tool to reset CBT that I had been using some years ago. I have found it, but I am no longer able to make it run. It was called "CBT reset tool" (it had this name in ASCII art within the code), and it was nice because it allowed me to specify which VM I wanted to have CBT reset on.

Just to make sure I am reading and following correctly in my own head, let me ask a few questions.

The bug in question has to do with reverting snapshots while CBT is enabled, correct?

I manage backups for an MSP with several engineers that may or may not be aware of this (even if I tell them about it). So let's say one of our engineers takes a snapshot outside of Veeam and then reverts it.

Several of our environments won't really be able to handle turning CBT off on all of the backup jobs. It'd be too much strain on their network and/or disks.

So if I find out it happened (I am made aware of all snapshot activity by engineers per company policy), which option do I use to fix it?

Do I take an active full? If I read correctly this won't reset CBT, right?

Do I reset CBT and THEN take an active full?

Do I just have to disable CBT backups for this one VM now?

How about prevention? I can post an internal KB to our company for taking and reverting snapshots if necessary. I'm already posting one now that says simply not to do it and to rely on the backups themselves, but in case we have to (hopefully not), I'd like to have a plan.

To prevent this, could I just have my engineers disable the job, disable CBT at the VMware level, then take a snapshot? Veeam should turn CBT back on afterwards right? But at that point I would think it'd be reset.

Gostev wrote: This just in. We've been troubleshooting one backup corruption issue seen internally in one of our labs, where all signs pointed to a possible VMware changed block tracking (CBT) bug. Eventually, this was tracked down to a revert snapshot operation on the protected VM, following which CBT API started to return invalid data. So we've opened a support case with VMware Support, and after 2 months their conclusion was that this corruption is "by design" and is due to the fact that CBT API does not support reverting snapshot on a VM. They even published the official support KB article about this. I'm still trying to wrap my head around their response, but my first reaction is that it makes little sense? I would argue ESXi should then simply reset CBT on a VM following snapshot revert operation, or even just start returning an error – instead of providing invalid CBT information, as if nothing happened? So this week, we'll be escalating this issue through our VMware Alliance channel as the next step. Normally, I would wait until we get another opinion there, however I had to share what we know so far immediately - since this issue leads to backup corruption.

Bottom line: don't revert VM snapshots, and better yet - don't use VM snapshots in production environments at all. They impact performance, they overfill datastores - and now this. Instead, just use Quick Backup (or VeeamZIP) to create out-of-band restore points as needed - and do a full VM restore if you need to rollback. Do keep in mind that Veeam can use CBT for restores as well, which makes VM rollback blazing fast even for biggest VMs.

Sigh, the use of snapshots is daily practice in our organisation. Before installing updates or doing upgrades of installed software we take a snapshot and install the update. If all is working fine, we delete the snapshot. If there's an issue, we revert back to the snapshot and start over. Sometimes we restore the VM from backup if we forgot to take a snapshot. I think this use of snapshots is not uncommon, so I find it a bit strange that this CBT issue only comes up now. Shouldn't it have been detected way earlier? Also, this would mean that a VM that was restored from backup might have been corrupted, but it's possible it hasn't been noticed yet since no critical files were affected.

This really seems to be a huge bug in VMware and should get way more attention. It also means that when using forward incremental backups some of them might be useless since they might contain corrupt VMs...

Unfortunately, as with a number of previous VMware CBT issues, Active Fulls are also affected by this bug from the moment this CBT issue is triggered. In other words, using forever incremental does not introduce additional risks here compared to backup modes with periodic Active Fulls.

However, despite what VMware KB article says, we (Veeam) actually believe that the scope of the issue could be a bit smaller. According to our own testing, simply reverting VM snapshot does not break CBT - one additional action is also required. We kept stressing these findings with VMware Support, but they largely ignored our observations, and as you can see they also didn't include any other variables in the official KB article. So at this time, as a partner we must stick to their official conclusion.

We are getting another opinion from our friendly alliance folks at VMware though. It may take longer than usual due to VMworld U.S. happening right this moment, but I will surely keep everyone here posted on any significant updates.

Well, if I could - I would do that right in the previous post, and even in the digest, trust me. However, I decided I'd rather not, because we did stress these findings with VMware Support - so at this point, it may come across as me publicly arguing with the official VMware position documented in the KB article, which clearly explains that reverting snapshots is not supported by CBT in principle. And by doing that, I would potentially be endangering our joint customers by providing wrong info. From my previous experience with VMware, this kind of stuff can get very political real quick. In the end, no one wants to have their partner telling a different story... and I can relate to that myself, because we do have such issues with our own partners occasionally!

Instead, just use Quick Backup (or VeeamZIP) to create out-of-band restore points as needed - and do a full VM restore if you need to rollback. Do keep in mind that Veeam can use CBT for restores as well, which makes VM rollback blazing fast even for biggest VMs.

The only drawback of using full VM restore + CBT is that you can only use it once! I had a very unlucky situation 2 years ago when I had to investigate a very severe database issue. I did a quick rollback using Veeam's full VM restore and started the investigation. After a while I noticed that I was on the wrong path and wanted to revert again (using the same method), but was very surprised that Veeam didn't do the delta restore and instead wrote the whole disk (500 GB), which took a while because of a bottleneck. I didn't understand the situation, and after talking to a Veeam person I learned that CBT only works when the VM is powered on, which isn't the case when Veeam does a full VM restore. So that means if you need the restore once, then it's perfect; if you might have to restore several times, then it could be painful. In such a case, a snapshot would be better, but of course now it's a bad idea since we know about this possible corruption.

Just wanted to share this information in case somebody else gets into a similar situation.

The notes in the VMware KB suggest this issue only occurs if you enable CBT on a virtual machine which already has existing snapshots or am I missing something? Despite the KB saying that CBT does not support the revert snapshot operation.

Note: Ensure that there are no snapshots on the virtual machine before enabling change tracking. If you create snapshots before enabling CBT, the QueryChangedDiskAreas API might not return any error or the data returned by QueryChangedDiskAreas might be incorrect.

Just after a bit of information from Veeam if possible regarding the announcement of CBT issues with reverting snapshots. SureReplica reverts the snapshot once it has finished testing doesn't it? So therefore if we used a replica as a source for a backup, the CBT would be invalid wouldn't it? Just making sure we structure our backups around this unfortunate announcement.

The notes in the VMware KB suggest this issue only occurs if you enable CBT on a virtual machine which already has existing snapshots or am I missing something? Despite the KB saying that CBT does not support the revert snapshot operation.

Note: Ensure that there are no snapshots on the virtual machine before enabling change tracking. If you create snapshots before enabling CBT, the QueryChangedDiskAreas API might not return any error or the data returned by QueryChangedDiskAreas might be incorrect.

No, this is just a general, unrelated note that is separate from the discussed issue. But in any case, as you can see from the corresponding Veeam UI label, we do perform this check before enabling CBT on a VM. This has been in place since we first released CBT support in v4.

One of the Veeam support folks who has been around forever suddenly recalled that we had already been through this very topic with VMware 8 years ago! There was a CBT bug with reverting VM snapshots back then, which was fixed by VMware - and moreover, in exactly the way I suggested it should be fixed (by returning an error, which in turn forces Veeam to perform an incremental backup using the entire image scan approach, and establish the new reference point for CBT to use). Which in turn means that reverting snapshots IS actually correctly handled by CBT API, contrary to the most recent KB article? And perhaps there are other variables indeed?

Symptoms wrote: Reverting a snapshot for a virtual machine that has Changed Block Tracking (CBT) enabled to a snapshot older than its last incremental backup can cause inconsistencies in incremental backups of that virtual machine.

Resolution wrote: This issue is resolved in vSphere 4.1 and vSphere 4.0 Update 3. Rather than potentially providing incomplete data, a change ID obtained before the snapshot revert is now correctly considered as being invalid.
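
The behavior described in that resolution, and the fallback it forces on the backup side, can be sketched as a toy model in Python. This is not the vSphere API: the class `TrackedDisk`, the exception `ChangeIdInvalid`, the epoch counter and the function `blocks_to_read` are all hypothetical names used only to show the principle.

```python
class ChangeIdInvalid(Exception):
    """Raised when a change ID predates a snapshot revert (hypothetical error type)."""


class TrackedDisk:
    """Toy model of a disk with change tracking; not the vSphere API."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.epoch = 0        # bumped on every snapshot revert
        self.changed = set()  # blocks written since the last change ID was issued

    def write(self, index, data):
        self.blocks[index] = data
        self.changed.add(index)

    def revert(self, snapshot_blocks):
        self.blocks = list(snapshot_blocks)
        self.epoch += 1       # every change ID issued earlier becomes invalid
        self.changed.clear()

    def new_change_id(self):
        self.changed.clear()
        return self.epoch

    def query_changed(self, change_id):
        if change_id != self.epoch:
            raise ChangeIdInvalid("change ID was issued before a snapshot revert")
        return set(self.changed)


def blocks_to_read(disk, change_id):
    """Use tracked changes when the change ID is valid, otherwise scan everything."""
    try:
        return disk.query_changed(change_id), "cbt"
    except ChangeIdInvalid:
        return set(range(len(disk.blocks))), "full-scan"


disk = TrackedDisk([b"a", b"b", b"c"])
snapshot = list(disk.blocks)
change_id = disk.new_change_id()  # reference point taken by the last backup

disk.write(1, b"B")
normal = blocks_to_read(disk, change_id)

disk.revert(snapshot)
after_revert = blocks_to_read(disk, change_id)
print(normal, after_revert)  # ({1}, 'cbt') ({0, 1, 2}, 'full-scan')
```

The point of the design is that an explicit error is safe: the backup application can fall back to reading the whole disk and then establish a new reference point. Silently returning an incomplete list of changed blocks gives it no chance to do that.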

This whole thing sounds fishy to me. I would think that in most large environments, snapshot reverts are BAU activities. They teach you this way of working in the basic courses, with no warning that "reverting may corrupt your backups".

There's no info in the KB about the different snapshot formats, VMFSsparse or SEsparse. Previous bugs (features) sometimes related to only one or the other.

I think this is too big a deal to be a real thing after all these years...

The following note has been added in the last 2 days, without changing the "last updated" field?

"Note: Ensure that there are no snapshots on the virtual machine before enabling change tracking. If you create snapshots before enabling CBT, the QueryChangedDiskAreas API might not return any error or the data returned by QueryChangedDiskAreas might be incorrect."