Are there any plans to make Veeam Agents smarter with respect to clusters and replicated data?

At this stage (Veeam 9.5 U4a), a failover cluster with physical RDMs needs to be backed up with an agent, as vSphere cannot snapshot physical RDMs. The agent is loaded on each node, and Changed Block Tracking treats each node as though all of its data were local. If a failover event occurs and the RDMs are dismounted from the primary node and mounted on the secondary node, the agent on the secondary node sees the relocated RDMs as new data and tries to back them all up from scratch, blowing out the backup time and the repository size and duplicating all of the data within Veeam. Instead of an incremental backup taking a couple of hours, our backup blows out to 4 days and fills the repository with new data, requiring manual intervention to prevent catastrophic failure.

The agent should be able to recognise the relocated data as already existing in Veeam, then do incremental backups only, but it doesn't.

Ironically, Update 4b is Storage Replica aware and will only back up replicated data once, yet Failover Cluster data that exists only once ends up duplicated by Veeam.

I've seen various similar topics here regarding Microsoft Failover Clusters, AAGs and DAGs, but nothing that quite covers this.

When you say "that's not normal", are you implying that the Agent should only create one copy of data, regardless of which Failover Cluster node the data is hosted on? We were advised by our vendor that having to do a full backup as a baseline each time the failover cluster changed nodes was entirely normal, so we never logged a job.

The reference to Storage Replica is admittedly a little off-topic here: Storage Replica doesn't use RDMs, so it can be snapshotted and therefore backed up agentless, which means it won't use the Veeam Agent and isn't really a discussion for this thread. But the release notes state "support for Windows Server Storage Replica, including automatic exclusion of duplicate copies of data at backup time", so it seems that Veeam can identify data duplication in Storage Replica clusters, but not when using the Agent in Failover Clusters.

Sorry, yes: the failover reference referred to a Microsoft Failover Cluster failover, not the ESXi hosts. So if the MS Failover Cluster moves the disk resources from the Windows primary node to the Windows secondary node, the Agent sees that as new data and tries to back it up from scratch.

This is problematic for two reasons: 1. it takes 4 days to do a full backup of 64TB across the wire via the Agent, and 2. the backup repository has a maximum of 120TB, so we don't have the space for two separate full copies in the repository.
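For scale, here's a rough back-of-the-envelope check of those figures (a sketch only; it assumes decimal terabytes and ignores compression/dedupe, which the real job would apply):

```python
# Rough sanity check of the backup window and repository figures above.
# Assumes decimal units (1 TB = 10**12 bytes) and no compression/dedupe;
# real Veeam jobs will differ.

full_backup_tb = 64      # size of the cluster data set
window_days = 4          # observed duration of a full via the Agent
repo_capacity_tb = 120   # maximum repository volume size

bytes_total = full_backup_tb * 10**12
seconds = window_days * 24 * 3600
throughput_mbs = bytes_total / seconds / 10**6

print(f"Effective throughput: {throughput_mbs:.0f} MB/s")
# → Effective throughput: 185 MB/s

# A second full (one per cluster node) would need 2 x 64 = 128 TB:
print(f"Two fulls fit in repo: {2 * full_backup_tb <= repo_capacity_tb}")
# → Two fulls fit in repo: False
```

In other words, even at a sustained ~185 MB/s the full can't finish inside a nightly window, and two full copies simply don't fit in a 120TB volume, which is why each node change forces manual intervention.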

When you say "that's not normal", are you implying that the Agent should only create one copy of data, regardless of which Failover Cluster node the data is hosted on?

Yes, that's what I would expect from any backup solution. But that's a good point to add to my upcoming blog post. I did the following test: failover from node 1 to node 2. The incremental backup was 350 MByte; the full backup was 13 GByte. After a full shutdown of the cluster, the incremental was 3 GByte. There is no full backup if everything works as expected.

We were advised by our vendor that having to do a full backup as a baseline each time the failover cluster changed nodes was entirely normal, so we never logged a job.

So you would expect Veeam to keep only a single copy of all cluster data, and the Agent to keep doing incremental backups, regardless of which Microsoft server node the data is hosted on?

If this is the expected behaviour, why are our Failover Cluster backups not behaving as expected, instead attempting a full backup every time the Microsoft cluster fails over between nodes? Should I log a support call and report this as a fault?

Incidentally, the Vendor in question is a Veeam partner, as recommended to us by Veeam for the Veeam implementation.

There should be full details in that job, but we're seeing some really odd behaviour in these backups, like the backup repository reporting that it's backing up 158TB of the fileshare cluster, even though the cluster has only 54TB used out of a total capacity of 80TB. It then stops and does a full in the middle of the week for no apparent reason, but doesn't mark it as a full, just an incremental. We have a SAN limit of 120TB per volume, so we regularly run out of space when it tries to do a full backup on top of an existing full backup, because we just can't fit multiple full backups in the same repository. And every attempt blows our backup window out by 3 to 4 days.

Like I said: more details in the case, but let me know if you want any clarification.