In which scenarios can duplicate emails occur in GFI Archiver?

Author:
Luis Fernandes
November 30, 2018 04:37

Answer

GFI Archiver normally stores only a single copy of an email, even if the same email is submitted for archiving again after it was archived initially. The methods used for Single Instance Storage (SIS) were improved with the release of version 2012 in comparison to older versions.

To illustrate, the list below outlines a regular scenario in which SIS prevents a duplicate from being created:

SenderZ sends an email to RecipientA and RecipientB

Microsoft Exchange creates a copy of this email in the journal mailbox

GFI Archiver downloads the email from the journal mailbox

GFI Archiver stores the email once into an Archive Store and assigns ownership to SenderZ, RecipientA and RecipientB (at this point, a hash value which was generated based on certain data of the email is also stored in the Archive Store - this is the SIS hash value)

Let's assume the administrator uses the Import Export Tool and imports the very same email from the Exchange mailbox of RecipientA into GFI Archiver

GFI Archiver (more precisely, the GFI Archiver Store service) will calculate the SIS hash for the copy of the mail which is currently being processed

GFI Archiver will query the Archive Store to see if an email with the same SIS hash already exists

GFI Archiver will find the SIS hash value, and therefore the email, in the Archive Store. It will not store the email a second time, which prevents a duplicate from being created
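The lookup described in the steps above can be sketched as follows. This is only an illustration: the article does not state which email fields or which hash algorithm GFI Archiver uses, so the field selection, SHA-256, and the names `sis_hash` and `ArchiveStore` are assumptions made for the example.

```python
import hashlib


def sis_hash(sender, recipients, subject, body):
    """Hypothetical SIS hash: SHA-256 over a few normalized email fields."""
    parts = [
        sender.lower(),
        ",".join(sorted(r.lower() for r in recipients)),
        subject,
        body,
    ]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()


class ArchiveStore:
    """Toy stand-in for an Archive Store: one entry per SIS hash."""

    def __init__(self):
        self.emails = {}  # SIS hash -> set of owners

    def archive(self, sender, recipients, subject, body, owners):
        """Store the email once; return True only if a new copy was written."""
        key = sis_hash(sender, recipients, subject, body)
        is_new = key not in self.emails
        # Whether or not the email already exists, record the owners.
        self.emails.setdefault(key, set()).update(owners)
        return is_new


store = ArchiveStore()
email = ("SenderZ", ["RecipientA", "RecipientB"], "Status", "Hello")

# First pass: the copy downloaded from the journal mailbox.
journal_result = store.archive(*email, owners={"SenderZ", "RecipientA", "RecipientB"})
# Second pass: the same email imported from RecipientA's mailbox.
import_result = store.archive(*email, owners={"RecipientA"})
```

In this sketch the second call finds the existing hash, so no second copy is written and the store still holds a single entry.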

The following scenarios describe situations that can still lead to duplicates when using current versions of the product:

a) If an email that was sent to a distribution list (DL) is processed a second time and the members of the DL have changed since the email was originally archived

Scenario outline:

DistributionListX contains 2 members: RecipientA and RecipientB

SenderZ sends an email to DistributionListX

Microsoft Exchange creates a copy of this email in the journal mailbox

GFI Archiver downloads the email from the journal mailbox

GFI Archiver stores the email once into an Archive Store and assigns ownership to SenderZ, RecipientA and RecipientB (at this point, a hash value which was generated based on certain data of the email is also stored in the Archive Store - this is the SIS hash value)

The membership of DistributionListX changes: RecipientC is added

Let's assume the administrator uses the Import Export Tool and imports the very same email from the Exchange mailbox of RecipientA into GFI Archiver

GFI Archiver (more precisely, the GFI Archiver Store service) will calculate the SIS hash for the copy of the mail which is currently being processed

GFI Archiver will query the Archive Store to see if an email with the same SIS hash already exists

GFI Archiver will not find the same SIS hash value, and therefore will not find the email, because the recipients (in this case the current members of DistributionListX) are part of the SIS hash calculation

GFI Archiver stores the email as a duplicate into an Archive Store and assigns ownership to SenderZ, RecipientA, RecipientB and RecipientC
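A minimal sketch of why the changed membership defeats the lookup, assuming (hypothetically) that the hash covers the sender, the expanded recipient list and the body. The real inputs and algorithm are not documented in this article; `sis_hash` and SHA-256 are illustrative choices only.

```python
import hashlib


def sis_hash(sender, recipients, body):
    """Hypothetical SIS hash that includes the expanded recipient list."""
    data = "|".join([sender, ",".join(sorted(recipients)), body])
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


# DistributionListX as expanded when the journal copy was archived.
dl_at_journal_time = ["RecipientA", "RecipientB"]
# DistributionListX as expanded when the email is imported later.
dl_at_import_time = ["RecipientA", "RecipientB", "RecipientC"]

original = sis_hash("SenderZ", dl_at_journal_time, "Hello")
reimported = sis_hash("SenderZ", dl_at_import_time, "Hello")

# The content is identical, but the recipient input differs, so the
# lookup by hash misses and the email is stored again.
hashes_match = original == reimported
```

Under these assumptions the two hashes differ although sender and body are unchanged, which is why the import is treated as a new email.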

b) If an email is processed a second time after it was edited, which causes the SIS hash value to differ (note that a user can easily edit an email in their mailbox using Outlook)

This scenario is very similar to a): the SIS hash of the edited email differs from the SIS hash of the original email, so no match is found.

c) If an email is processed a second time, but the Archive Store which contains the original email is configured with the option "Read-only access" or "Do not archive more emails in this Archive Store"

Scenario outline:

SenderZ sends an email to RecipientA and RecipientB

Microsoft Exchange creates a copy of this email in the journal mailbox

GFI Archiver downloads the email from the journal mailbox

GFI Archiver stores the email once into the Archive Store [2014 Jan] and assigns ownership to SenderZ, RecipientA and RecipientB (at this point, a hash value which was generated based on certain data of the email is also stored in the Archive Store - this is the SIS hash value)

Later in the year, the administrator enables the option "Do not archive more emails in this Archive Store" for the Archive Store [2014 Jan]

Let's assume the administrator uses the Import Export Tool and imports the very same email from a PST file choosing ImportUserC as the owner

GFI Archiver (more precisely, the GFI Archiver Store service) will calculate the SIS hash for the copy of the mail which is currently being processed

GFI Archiver will query the Archive Store to see if an email with the same SIS hash already exists

GFI Archiver will find the SIS hash value, and therefore the email, in the Archive Store [2014 Jan], but it cannot assign the additional owner ImportUserC because [2014 Jan] has the option "Do not archive more emails in this Archive Store" enabled

GFI Archiver will create a new Archive Store [2014 Jan 2], store the email in it and assign ownership to SenderZ, RecipientA, RecipientB and ImportUserC (effectively creating a duplicate across multiple Archive Stores)
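The store selection in this scenario can be modelled roughly as below. This is a simplified sketch, not GFI Archiver's actual logic: the `ArchiveStore` class, the `accepts_new_emails` flag, the `archive` function and the placeholder hash `"abc123"` are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveStore:
    name: str
    accepts_new_emails: bool = True  # False models "Do not archive more emails..."
    owners_by_hash: dict = field(default_factory=dict)


def archive(stores, sis_hash, owners, new_store_name):
    """Return the name of the store that ends up holding the ownership data."""
    # 1. An existing copy in a store that still accepts changes: add owners only.
    for store in stores:
        if store.accepts_new_emails and sis_hash in store.owners_by_hash:
            store.owners_by_hash[sis_hash].update(owners)
            return store.name
    # 2. Otherwise write a full copy into an open store, creating one if needed.
    target = next((s for s in stores if s.accepts_new_emails), None)
    if target is None:
        target = ArchiveStore(new_store_name)
        stores.append(target)
    target.owners_by_hash[sis_hash] = set(owners)
    return target.name


jan = ArchiveStore("2014 Jan")
stores = [jan]

first = archive(stores, "abc123", {"SenderZ", "RecipientA", "RecipientB"}, "2014 Jan 2")

# Later in the year the administrator closes the store.
jan.accepts_new_emails = False

second = archive(
    stores, "abc123",
    {"SenderZ", "RecipientA", "RecipientB", "ImportUserC"},
    "2014 Jan 2",
)

# Number of stores that now contain a copy of the same email.
copies = sum("abc123" in s.owners_by_hash for s in stores)
```

In this model the hash is found in [2014 Jan], but because that store is closed the new owner cannot be recorded there, so a second copy lands in [2014 Jan 2].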

Notes

The scenarios above are known limitations. At the moment, GFI Software does not plan to change the described behavior.