Five disaster-proofers take different tacks to mail server protection

An e-mail server can stop delivering e-mail for several reasons: a loss of Internet connectivity, a hardware failure, an operating system crash, an e-mail server software crash, or corruption of the database that stores the messages. The traditional backup-and-restore process can take hours to resurrect a server, and any mail that comes in while the server is down will be lost. Not surprisingly, many organizations demand CDP (continuous data protection) for e-mail.

The options start with Microsoft’s own Windows Server 2003 Clustering Services and extend to a range of third-party fail-over and high-availability solutions. Windows clustering allows Exchange Server 2003 to be set up in either an active/passive cluster or a cluster of multiple servers with one standby server. This is highly effective in ensuring uptime, but it is complex to set up, requires extra hardware and licenses, and does not protect against data loss or database corruption (see “Windows clustering a costly option for Exchange fail-over”).

The solutions reviewed here can cope with almost any Exchange-related mishap except a loss of Internet connectivity, and they do so more simply, at lower cost, and with more flexibility or protection than the native Exchange cluster. Two solutions, Neverfail for Exchange and SteelEye LifeKeeper, bring true fail-over to an entire Exchange server. Two others, Cemaphore Systems MailShadow and Quest Availability Manager, protect individual mailboxes on one or more Exchange servers. Finally, Lucid8 DigiVault backs up data stores so that they can be restored to a secondary Exchange server. For maximum protection, administrators might combine a fail-over system with the CDP that DigiVault provides. (Yet another alternative is a high-availability Exchange appliance from Azaleos or Teneros. These appliances are installed on your premises but managed and monitored off-site. See our review "High-availability Exchange made easy.")

Each product takes a different approach to protecting Exchange and offers different advantages. Key differentiators include whether the backup server requires an Exchange server license, whether a single backup server can protect more than one Exchange server, whether an agent must be installed on each Exchange server, and whether replication over WAN links is supported.

The test setup for each product consisted of a domain controller (Active Directory), two Exchange servers (the primary and secondary), and any additional servers as required by the individual product. I set up replication of the primary Exchange server to the secondary and then simulated failures by unplugging the network cable from the primary, stopping the Exchange Information Store service, and dismounting the drive the information store was running on, while monitoring incoming messages and simulating traffic using LoadSim. I observed the Outlook client experience when the primary server failed, as well as the time required to fail over to the secondary server.

Neverfail for Exchange

Neverfail is a true, automatic, active/passive fail-over solution. It uses primary and secondary Exchange servers linked via crossover cable to maintain a heartbeat connection and perform data synchronization. If the primary server experiences a hardware or software failure, the secondary server assumes its IP address and hostname and resumes operation. I tested Neverfail for Exchange 5.0. Neverfail Group offers a variety of application modules other than Exchange, including IBM Lotus Domino, Microsoft File Server, Oracle Database, SharePoint, and SQL Server.
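
The heartbeat mechanism can be illustrated with a minimal Python sketch. It is a generic example under stated assumptions, not Neverfail's code: the class name `PassiveNode`, the 10-second timeout, and the explicit timestamps passed in are all hypothetical. The standby node records each heartbeat from the primary and takes over only after the primary has been silent for longer than the timeout.

```python
from dataclasses import dataclass


@dataclass
class PassiveNode:
    """Hypothetical standby server in an active/passive pair.

    The timeout value and the takeover step are illustrative assumptions,
    not Neverfail's actual parameters or implementation.
    """
    timeout_s: float = 10.0
    last_heartbeat: float = 0.0
    active: bool = False

    def on_heartbeat(self, now: float) -> None:
        # Record when the primary last signaled over the heartbeat link.
        self.last_heartbeat = now

    def check(self, now: float) -> bool:
        # If the primary has been silent longer than the timeout, take over.
        # In a real product, this is where the IP address and hostname move.
        if not self.active and now - self.last_heartbeat > self.timeout_s:
            self.active = True
        return self.active
```

A real product must also guard against a broken crossover link being mistaken for a dead primary, since both servers could then try to claim the same IP address and hostname; this sketch ignores that case.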

Neverfail provides functionality comparable with that of Windows Clustering, and because it doesn’t require Windows Server 2003 Enterprise or Datacenter Edition and Exchange Enterprise Edition, the overall cost is comparable. Neverfail goes beyond Windows Clustering by offering easier setup, better management tools, and an intelligent analysis and monitoring tool that can find and resolve problems on the Exchange server before they cause failures. Further, unlike Windows Clustering, Neverfail doesn’t require the primary and secondary systems to have identical hardware.

With the Neverfail system, LAN users don’t need to restart Outlook. The interval between the primary server’s failure and the secondary server’s takeover is short, about two minutes in my testing. Users connecting via MAPI or the Outlook Web Access client may need to restart the client to connect to the backup server.

The Neverfail system requires an additional NIC in the primary server, and a backup server running the same server OS and the same version of Exchange. Neverfail runs on Windows 2000 Server or Windows Server 2003, and it supports Exchange 2000 and Exchange 2003.

Setup is simple and straightforward. Before the fail-over software is installed, the Neverfail SCOPE (Server Check Optimization Performance Evaluation) utility identifies any performance or configuration issues with the Exchange server and recommends solutions. It takes snapshots of server performance and performs trend analysis to identify areas that may become problems in the future. It also generates a system ID that Neverfail uses to create a license key. After the key has been received from Neverfail, the system clones the Exchange server to the backup system.

The installation copies all application files, registry settings, services, and data stores associated with Exchange, so the backup server is a perfect duplicate, including any software updates, service packs, and so on. The system monitors all the key services, as well as the main Exchange server process, so any problems -- even with associated software or performance degradation -- can trigger the fail-over.

Pricing begins at $7,600, which includes Heartbeat (the core engine), the Exchange module, and four SCOPE analysis cycles (the initial analysis of the server prior to installation, and three follow-up checks), as well as maintenance for one year. Pricing is per pair of servers, based on the server in the pair with the greater number of CPUs. A low-bandwidth module is available that enables compression and encryption over a WAN link, as well as asynchronous replication. This would normally be used for additional data backups rather than fail-over.

Neverfail is relatively expensive, especially if you have multiple Exchange servers. It is probably less expensive than using Microsoft Exchange clustering, and it’s much easier to set up. If you need 24/7 uptime for all e-mail users, Neverfail is a good way to go, although SteelEye’s LifeKeeper offers more functionality at a lower price.

SteelEye LifeKeeper

SteelEye LifeKeeper is a server fail-over product similar to Neverfail, but it offers additional flexibility, including scheduling of replication for off-peak hours (or with a 24-hour delay to ensure that store corruption isn’t passed on), compression for replication over WAN links, and one-to-many replication to create multiple copies of a single server.
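
LifeKeeper's option to replicate with a 24-hour delay can be sketched as a queue that holds each change until it has aged past the delay. The sketch below is hypothetical and is not SteelEye's implementation: the `Change` and `DelayedReplica` names, timestamps in seconds, and the `discard_pending` step (standing in for an administrator who spots corruption on the primary within the window) are assumptions for illustration.

```python
from collections import deque
from dataclasses import dataclass

DELAY_S = 24 * 60 * 60  # assumed 24-hour hold, matching the option described above


@dataclass
class Change:
    timestamp: float  # seconds; when the change was made on the primary
    payload: str


class DelayedReplica:
    """Hypothetical replica that applies changes only after delay_s seconds."""

    def __init__(self, delay_s: float = DELAY_S) -> None:
        self.delay_s = delay_s
        self.pending: deque[Change] = deque()
        self.applied: list[str] = []

    def receive(self, change: Change) -> None:
        # Changes arrive in order; queue them instead of applying at once.
        self.pending.append(change)

    def discard_pending(self) -> None:
        # If corruption is detected on the primary within the window,
        # drop queued changes so the replica keeps its last good state.
        self.pending.clear()

    def tick(self, now: float) -> None:
        # Apply every queued change that is at least delay_s old.
        while self.pending and now - self.pending[0].timestamp >= self.delay_s:
            self.applied.append(self.pending.popleft().payload)
```

The trade-off is that the replica always lags the primary by the full delay, so a fail-over to it would lose up to a day of changes unless they are recovered some other way.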

LifeKeeper can run on any version of Windows 2000 or Windows Server 2003. It supports Exchange 2000 and Exchange 2003, and it doesn’t require identical hardware for primary and secondary servers. LifeKeeper costs less than Windows Clustering, at $3,280 per pair of servers, and one standby Exchange server can protect multiple active Exchange servers, although capacity planning will be essential in case all the active Exchange servers fail at once. In addition, LifeKeeper supports shared storage between the primary and secondary servers, which can speed up the fail-over process.

For this review, I tested LifeKeeper 5.3. Setting up LifeKeeper is straightforward. You will need to create service accounts, as with the other solutions, but the documentation steps you through the process. Clients get an error message during fail-over, but clients on the LAN will only need to retry the operation -- restarting Outlook is not necessary. As with Neverfail, users connecting via MAPI or Outlook Web Access may need to restart the client to connect to the backup server.

LifeKeeper provides data compression and encryption over a WAN connection, and it can replicate to a local server for fail-over, as well as to a remote server for business continuity. The LifeKeeper GUI can administer all LifeKeeper clusters in an enterprise via a straightforward interface.

LifeKeeper offers features that Neverfail doesn’t, and at a lower price. LifeKeeper’s setup is a little more complex than Neverfail’s, but this is partly because of the additional features. One interesting extra is the ability to fail over from a physical to a virtual server, or vice versa, although most admins will not be comfortable running mail servers in a virtual environment just yet. Unless you already have an investment in other Neverfail clustering technologies, LifeKeeper is a better deal.

Cemaphore Systems MailShadow

MailShadow is not, strictly speaking, an Exchange fail-over product; more accurately, it’s an Exchange mailbox fail-over product. MailShadow uses the Exchange transaction log to mirror each transaction for designated e-mail accounts on one or more Exchange 2003 servers to a backup Exchange server. If a primary Exchange server fails, or its database is corrupted, the designated accounts can access the backup server instead. Because the replication is based on transactions, no corruption of the Exchange database is passed on to the backup. I tested Version 2.0.

In addition to the primary Exchange 2003 servers that host the mailboxes to be protected, MailShadow requires three physical systems: the Source MailShadow Gateway, the Recovery MailShadow Gateway, and the Recovery Exchange Server. In a corporate environment, the Source MailShadow Gateway would be hosted in the main Exchange datacenter, while the Recovery MailShadow Gateway and Recovery Exchange Server would be in a remote DR (disaster recovery) site. Only one gateway is needed at each end, and one Recovery server can support multiple Source servers. All of the servers should be in the same Windows domain.

In addition to setting up the three additional servers, you will need to set up a service account, give it the proper permissions and delegation rights for each Exchange server to be protected, and then add the account to a group created during the MailShadow install. These procedures are well documented in the manuals.

When e-mail accounts have been designated as protected, there is an initial interval required for creating the backup accounts with the messages already existing in the protected accounts. The time necessary for this process will depend on the amount of e-mail stored in the inbox. In my tests, replicating an inbox of about 200KB took just a couple of minutes. But with an inbox of 1.1GB, initial replication took several hours. If you have a lot of users with fat inboxes, you might want to start replication over a weekend.

Administration via the MMC (Microsoft Management Console) snap-in is easy and follows the usual MMC conventions. Administrators can control replication by storage group, by Exchange server, or by individual accounts.

After the initial synchronization, any further transactions -- receipt of new mail, deletions of messages, moves from one folder to another, and edits of messages -- are captured and replicated to the backup server, in chronological order. This is done asynchronously, but in the same sequence as on the primary server. No agent is required on the primary Exchange server because MailShadow uses the Exchange transaction engine APIs via MAPI to identify transactions to replicate.
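
Ordered, asynchronous replay can be sketched as a replica that buffers transactions arriving early and applies them strictly by sequence number. This is a hypothetical Python illustration, not MailShadow's mechanism (which, per the description above, reads transactions through Exchange's APIs via MAPI): the `Txn` record, its sequence numbers, and the four operation names are assumptions modeled on the transaction types listed above.

```python
from dataclasses import dataclass


@dataclass
class Txn:
    seq: int     # position in the primary's transaction order (assumed)
    op: str      # "receive", "delete", "move", or "edit"
    msg_id: str
    folder: str = ""
    body: str = ""


class ReplicaMailbox:
    """Hypothetical backup mailbox that applies transactions strictly in
    order, even when they arrive out of order over an asynchronous link."""

    def __init__(self) -> None:
        self.folders: dict[str, str] = {}  # msg_id -> folder
        self.bodies: dict[str, str] = {}   # msg_id -> message body
        self.next_seq = 1
        self.buffer: dict[int, Txn] = {}

    def deliver(self, txn: Txn) -> None:
        # Hold early arrivals until every earlier transaction has been applied.
        self.buffer[txn.seq] = txn
        while self.next_seq in self.buffer:
            self._apply(self.buffer.pop(self.next_seq))
            self.next_seq += 1

    def _apply(self, t: Txn) -> None:
        if t.op == "receive":
            self.folders[t.msg_id] = t.folder
            self.bodies[t.msg_id] = t.body
        elif t.op == "move":
            self.folders[t.msg_id] = t.folder
        elif t.op == "edit":
            self.bodies[t.msg_id] = t.body
        elif t.op == "delete":
            self.folders.pop(t.msg_id, None)
            self.bodies.pop(t.msg_id, None)
        else:
            raise ValueError(f"unknown operation: {t.op}")
```

Applying in sequence matters because operations do not commute: a "move" applied before the "receive" it refers to would leave the message in the wrong folder.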

MailShadow identifies duplicate attachments, sending each attachment only once across a WAN link to reduce traffic loads.
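
Sending each attachment only once is a form of content-based deduplication. A minimal sketch, assuming attachments are identified by a SHA-256 digest of their bytes (the hashing scheme and the `AttachmentSender` name are hypothetical, since the product's method isn't described here): the first copy crosses the link in full, and repeats travel as a 64-character reference.

```python
import hashlib


class AttachmentSender:
    """Hypothetical WAN sender that ships each distinct attachment once.

    Attachments are keyed by a SHA-256 digest of their bytes (an assumed
    scheme); repeats are sent as the digest instead of the full content.
    """

    def __init__(self) -> None:
        self.sent: set[str] = set()
        self.bytes_on_wire = 0

    def send(self, data: bytes) -> tuple[str, str]:
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.sent:
            # The far end already holds this attachment: send only the digest.
            self.bytes_on_wire += len(digest)
            return ("ref", digest)
        self.sent.add(digest)
        self.bytes_on_wire += len(data)
        return ("full", digest)
```

The receiving gateway would keep a digest-to-attachment store so that it can expand each reference back into the full attachment.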

Unlike Quest, Cemaphore has chosen a manual fail-over process to avoid spurious fail-overs that could cause conflicts between the primary and backup mailboxes. With MailShadow, when a mailbox becomes unavailable, the administrator must switch users over to the backup mailbox. This can be done on an individual basis or for all users on a given Exchange server. Users must restart Outlook to reconnect to the backup mailboxes.

When the primary Exchange server is brought back online, users can be switched back to the primary account. Any changes to the mailboxes that occur during the fail-over are incrementally updated on the primary server. If the primary Exchange server is completely wiped out, a full replication operation will take place. The switch-over process after the restore is manual as well.