Deduplication

Overview

With the deduplication feature, Agents don't re-download identical Job data that already exists in the Job folder. When identical data is detected, the Agent copies it locally, which frees network bandwidth for other transfers.

While an Agent is looking for deduplication data during a transfer, the dashboard may show a transfer speed of zero (because no data is being transferred over the network), even though the amount of received data continues to grow.

When does deduplication work?

Within the boundaries of the same file

Remote Agents check whether they already have pieces of the file they need to receive, and copy those pieces locally when they are present.

Remote Agents check whether some pieces of data have been shifted, by comparing the file they need to receive with the data they already have.

Within the boundaries of all files known to the Agent (including those in archives): the remote Agent will pull the file from another job or from the archive.
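
The same-file checks above can be pictured with a minimal sketch. It is not the Agent's actual implementation: the piece size, the use of SHA-256, and the names piece_hash and plan_transfer are assumptions made for illustration, and the brute-force scan over every byte offset only stands in for whatever shift detection the Agent really performs.

```python
import hashlib

PIECE_SIZE = 4  # hypothetical piece size, kept tiny for the demo


def piece_hash(data: bytes) -> str:
    """Hash one piece (SHA-256 is an assumption, not a documented choice)."""
    return hashlib.sha256(data).hexdigest()


def plan_transfer(local_data: bytes, remote_hashes: list[str],
                  piece_size: int = PIECE_SIZE) -> list[tuple[str, int]]:
    """Decide, piece by piece, whether to copy locally or download."""
    # Index every full-size window of the local data, not only aligned
    # pieces, so that a piece whose position has shifted can still be found.
    local_index: dict[str, int] = {}
    for offset in range(max(len(local_data) - piece_size + 1, 0)):
        window = local_data[offset:offset + piece_size]
        local_index.setdefault(piece_hash(window), offset)

    plan = []
    for index, wanted in enumerate(remote_hashes):
        if wanted in local_index:
            plan.append(("copy", local_index[wanted]))  # local byte offset
        else:
            plan.append(("download", index))  # remote piece index
    return plan


local = b"AAAABBBBCCCC"
remote = b"XXAAAABBBBCCCC"  # same content, shifted by two bytes
remote_hashes = [piece_hash(remote[i:i + PIECE_SIZE])
                 for i in range(0, len(remote), PIECE_SIZE)]
print(plan_transfer(local, remote_hashes))
```

In this example the two middle pieces are found locally at shifted offsets and are copied, while the first piece (which contains new bytes) and the short final piece are downloaded.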

When does deduplication not work?

If pieces of two different files match (regardless of whether the files belong to the same job or to different jobs)

If data within one file is shifted by a significant offset, or if many offsets are detected

If the file matches the hash of an archived file, but the archived file has been physically removed
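
The whole-file case, and the way it fails when an archived file has been removed, can be sketched as follows. This is only an illustration under stated assumptions: the known_files mapping (file hash to local path), the fetch function, and the download callback are hypothetical names, and SHA-256 is assumed as the hash.

```python
import hashlib
import os
import shutil


def file_hash(path: str) -> str:
    """Hash a whole file in chunks (SHA-256 is an assumption)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch(wanted_hash: str, destination: str,
          known_files: dict[str, str], download) -> str:
    """Copy a known local file with the same hash, else download."""
    source = known_files.get(wanted_hash)
    # A hash match alone is not enough: the matching file must still
    # exist on disk, otherwise there is nothing to copy from.
    if source is not None and os.path.isfile(source):
        shutil.copyfile(source, destination)
        return "copied"
    download(destination)  # hypothetical network transfer
    return "downloaded"
```

If the index still lists a hash for an archived file that has since been deleted, the os.path.isfile check fails and the sketch falls back to a normal download.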

Hashes matter

To determine whether pieces or files match, the Agent compares the hash values of the corresponding pieces or files.
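
As a rough illustration of matching by hash, the sketch below compares two versions of a file using only their per-piece hashes, never the raw bytes of the remote copy. The function names, the piece size, and the choice of SHA-256 are assumptions; the actual hash algorithm is not stated here.

```python
import hashlib


def piece_hashes(data: bytes, piece_size: int) -> list[str]:
    """Return one hash per fixed-size piece of the data."""
    return [hashlib.sha256(data[i:i + piece_size]).hexdigest()
            for i in range(0, len(data), piece_size)]


def matching_pieces(local: bytes, remote_hashes: list[str],
                    piece_size: int) -> list[int]:
    """Return the indices of pieces whose local and remote hashes are equal."""
    local_hashes = piece_hashes(local, piece_size)
    return [index
            for index, (mine, theirs) in enumerate(zip(local_hashes, remote_hashes))
            if mine == theirs]
```

For instance, with a piece size of 4, a local copy b"AAAABBBBCCCC" and the piece hashes of b"AAAAXXXXCCCC" match at pieces 0 and 2, so only piece 1 would need to be transferred.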