Troubleshooting Network Attached Storage (NAS) Issues (123049)

Title

Troubleshooting Network Attached Storage (NAS) Issues

Description

Customers using a NAS device as the location to store a Replay 4 or AppAssure 5 repository have a higher chance of running into potential network and performance issues relating to the NAS. This article contains descriptions of some NAS symptoms, and describes troubleshooting suggestions.

Resolution

Standard backups of protected machines are taken based on the protection schedule established. For example, every hour, snapshots are taken of each agent machine and the Exchange mail server. The information flows through the switch or router to the Core, and the Core transmits the data to the repository on the NAS, where information is deduplicated across all agent machines before it is saved. This standard flow is depicted in the diagram above.

If the repository is located on Direct-Attached Storage (DAS), the two potential points of failure are the hardware and the software. If the repository is on network-attached storage, troubleshooting complexity increases. When issues arise, the network itself is another potential culprit, and hardware troubleshooting becomes more complex because another family of hardware devices (the NAS device itself) must be investigated.

All repository operations involve read or write operations, which have a more significant impact on network performance with a NAS than with a repository using DAS or a Storage Area Network (SAN). As such, putting the repository on a NAS places a substantial load on the NAS, with the network acting as a bottleneck for the many required read and write operations. Total throughput is lower than can be achieved using DAS or SAN. Plan for this accordingly, and treat it as another avenue of investigation when issues are evident.

Dedicated NAS

Considering the heavy load of read/write operations, the NAS device used as a repository in a Replay 4 or AppAssure 5 environment should be dedicated to this purpose alone. If you experience NAS performance or connection issues, consider moving any other operations, such as file sharing, to other devices.

Dell recommends system memory of at least 8 gigabytes.

Intermittent Failures

If the NAS issues you experience are intermittent, check whether the mount failures occur at times of particularly high I/O activity. For example, multiple processes could be running at once, such as agent backups during a VM export, or a nightly job running while rollups are also being performed. Intermittent failures caused by too much I/O traffic are difficult to diagnose and may appear as unrelated issues, such as replication failures or Exchange log truncations, when they are actually symptoms of an overtaxed NAS with too many I/O operations being attempted simultaneously. Changing the order of some I/O operations or otherwise reducing I/O is appropriate in these cases. Refer to the steps in Troubleshooting NAS Issues. After trying other steps, consider reducing the transfer rate to allow the NAS to catch up, as described below.

NAS Quality

The quality of the NAS is a major factor for ongoing success in the enterprise. When choosing a NAS, consider that up-front price may not be as important as the capability of the device. Lower-end NAS devices may have a much higher total cost of ownership when considering downtime, future upgrades, or lackluster performance. If using a NAS, Dell recommends enterprise-grade network attached storage for best performance. The higher-end the NAS hardware you use, the less likely you are to encounter NAS hardware issues (provided that network load and environmental factors are reasonable). Consider a device with features such as redundant Gigabit Ethernet connections or 10 Gbit Ethernet connections, and consider whether you need access to a Fibre Channel storage area network (SAN). Consider whether the NAS appliance allows you to upgrade capacity. Perform research before purchasing a NAS if possible; consider searching the internet using a phrase such as Guide to Network Storage.

For a NAS device to be supportable, the data saved to the repository must remain in the exact state in which the Core stored it. For this reason, for AppAssure 5, Dell does not support NAS devices that have their own built-in deduplication features if those features are enabled.

Sufficient input/output (I/O) transfer speed will yield the best results for backing up to the repository. Dell recommends hard drives of at least 7200 RPM with good access speeds. Dell recommends transfer speeds of at least 30 MB per second, with an absolute minimum of 10 MB per second. If transfer speeds appear to be below 10 MB per second, the most likely causes are (a) insufficient hardware, (b) hardware that would be sufficient but is too heavily tasked (multi-purposed, or poor at handling multiple operations), or (c) a saturated network that is acting as a bottleneck for the transfer.
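
As an illustration of the thresholds above, the following minimal Python sketch compares a measured transfer speed against the 30 MB per second recommendation and the 10 MB per second minimum. The function name and the returned labels are assumptions made for this example; they are not part of Replay 4 or AppAssure 5.

```python
RECOMMENDED_MB_PER_S = 30.0  # recommended transfer speed stated in this article
MINIMUM_MB_PER_S = 10.0      # minimum transfer speed stated in this article


def classify_transfer_speed(mb_per_second: float) -> str:
    """Compare a measured transfer speed (MB/s) with the article's thresholds.

    The labels returned here are illustrative only.
    """
    if mb_per_second < 0:
        raise ValueError("transfer speed cannot be negative")
    if mb_per_second < MINIMUM_MB_PER_S:
        # Investigate hardware, competing workloads, or network saturation.
        return "below minimum"
    if mb_per_second < RECOMMENDED_MB_PER_S:
        return "acceptable"
    return "recommended"
```

A measured speed of 8 MB per second would be classified as "below minimum", which corresponds to the three likely causes listed above.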

AppAssure 5 administrators should be aware that NAS devices are susceptible to the same environmental stresses as other systems on the network. Factors that affect network performance include number of concurrent users, network load, number of operations, frequency of backups, and other issues familiar to network administrators. You may consider optimizing your retention policy.

About Storage in the Cloud

A repository is used to store the snapshots that are captured from your protected workstations and servers. The repository can reside on different storage technologies such as Direct Attached Storage (DAS), Storage Area Network (SAN), or Network Attached Storage (NAS). However, the primary repository should never be stored on NAS devices that tier to the cloud. These devices tend to have performance limitations when used as primary storage.

Cloud storage can be used for a replicated core in the cloud. The source core is typically located within an enterprise, stored on a DAS, SAN, or NAS. The replicated core (target) can be stored on the cloud (for example, hosting a replicated core using a service such as eFolder or Amazon Cloud).

Recommended Configurations

The following recommendations suggest optimal configurations depending on the size of the environment:

Small. This configuration is intended to back up one to 10 agents with a change rate of 1 GB to 2 GB per agent per interval. The snapshot interval is assumed to be 15 minutes.

Storage: 1.2 times the data capacity of the protected volumes on all agents. Single DAS, SAN, or NAS storage for backup repository storage. Sustained DAS, SAN, or NAS performance should meet or exceed 150 MB per second. Due to storage bandwidth limitations, NAS storage should only be used for very small configurations.

NIC: Two 1 Gb NICs in teamed mode.

Medium. This configuration is intended to back up 10 to 20 agents with a change rate of 2 GB to 4 GB per agent per snapshot interval. The snapshot interval is assumed to be 15 minutes.

CPUs: Six physical cores, 12 logical (virtual) cores. Intel Xeon E5650 or better. An alternate configuration is two Intel E5620 processors or better.

Memory: 16 GB.

Storage: 1.2 times the data capacity of protected volumes on all agents. Single DAS array or SAN repository for backup repository storage. Dell EqualLogic SAN storage is recommended. Sustained DAS or SAN array performance should meet or exceed 250 MB per second.

NIC: Two 1 Gb NICs in teamed mode.
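
The sizing guidance above can be worked through with a short calculation. The Python sketch below applies the 1.2 times capacity rule and estimates the average rate needed to absorb every agent's changes within one snapshot interval. The function names are hypothetical, and the estimate assumes 1 GB = 1024 MB and raw change data with no allowance for deduplication or compression.

```python
def repository_capacity_gb(protected_volume_gb):
    """Return 1.2 times the total data capacity of the protected volumes (GB)."""
    return 1.2 * sum(protected_volume_gb)


def average_ingest_mb_per_s(agents, change_gb_per_interval, interval_minutes=15):
    """Average MB/s needed to store all agents' changes within one interval.

    Assumes 1 GB = 1024 MB and that every agent changes at the stated rate.
    """
    total_mb = agents * change_gb_per_interval * 1024
    return total_mb / (interval_minutes * 60)


# Upper bound of the Small configuration: 10 agents, 2 GB each per 15 minutes.
small = average_ingest_mb_per_s(10, 2)    # about 22.8 MB/s
# Upper bound of the Medium configuration: 20 agents, 4 GB each per 15 minutes.
medium = average_ingest_mb_per_s(20, 4)   # about 91.0 MB/s
```

Under these assumptions, both estimates fall below the sustained performance targets of 150 MB per second and 250 MB per second given above, which leaves headroom for other repository activity.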

Troubleshooting NAS Issues

When experiencing NAS issues, follow the procedures below:

To ensure it is a NAS issue, verify whether the NAS is accessed over a network path. (Some NAS devices can be used as direct-attached storage.) This helps qualify the NAS as being a potential cause.

Typical symptoms when transferring data over the network to the NAS include slow transfers to the repository (a performance issue) and dropped or dismounted connections between the Core and the repository.

The next step is to determine what type of NAS device is running as the repository, to ensure the device is robust enough for its assigned task.

Ensure the NAS device does not use its own deduplication or otherwise change data stored to the NAS through the application.

Next, consider event logs. Using the operating system’s Event Viewer for the affected machines, access the system and hardware logs, and review any warnings and errors.

Check Windows Logs > System

Check Applications and Services Logs > Hardware

When reviewing the logs, verify whether the operating system has had an issue maintaining a connection to the NAS. Specifically, look for NAS connection alerts or network drop-related alerts.

Typically, when the repository is dismounted, it is due to the connection to the NAS being compromised. When this happens, reboot the NAS device. The Core should recognize the NAS and will perform a repository check. When the repository check is complete, the NAS will be available to the Core.

Access the NAS management console or GUI, if one was included with your NAS, to determine if it is reporting connection or disk errors. If there is no UI, log in to the NAS to see if it will connect.

If there are no logs, events or a console that you can log into to find this information, then Dell suggests rebooting the NAS. This is likely to clear any cache and buffers, and allows the network connection to resume and core connectivity to continue.

Then check the core log (AppRecovery.log) for errors that may point to connection issues.

For best performance, NIC drivers on the agent must be the latest from the chipset manufacturer. Check this and update if necessary.

Output buffers, if supported, may need to be increased to higher values (e.g. 2048). Settings may differ from environment to environment.

If your NAS device is experiencing performance issues, a potential solution is to reduce the transfer queue depth in the Core Console. The Maximum Transfer Queue Depth setting specifies the number of commands that can be sent concurrently during a transfer; lowering it may allow the NAS to better handle the amount of network I/O. This setting may need some trial and error to be successful. One rule of thumb is to reduce the existing setting by about 25 percent, for example from 64 to 48. Your settings may differ based on the network traffic and the capability of your NAS device. To change transfer settings, follow the directions in Modifying Transfer Settings below.
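
For the core log review step in the procedure above, a script can help narrow a large log down to likely connection problems. The Python sketch below is one possible approach. The search terms are hypothetical, because this article does not list the exact messages that AppRecovery.log records, so adjust them to what you actually see in your log.

```python
import re

# Hypothetical search terms; real messages in AppRecovery.log may differ.
CONNECTION_PATTERN = re.compile(
    r"network path|connection|timed out|dismount", re.IGNORECASE
)


def find_connection_errors(lines):
    """Return (line_number, text) pairs for error lines that mention connectivity."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if "ERROR" in line.upper() and CONNECTION_PATTERN.search(line):
            hits.append((number, line.rstrip()))
    return hits
```

The function accepts any iterable of lines, so an open file object can be passed directly, for example `find_connection_errors(open(path, errors="replace"))`, where `path` is the location of AppRecovery.log on your Core server.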

Modifying Transfer Settings

In AppAssure 5, you can modify the settings to manage the data transfer processes for a protected machine.

Navigate to the AppAssure 5 Core Console and then click the Machines tab.

From the Machines tab, click the hyperlink for the machine you want to modify.

Enter the Transfer Settings options you want to change and then click OK to confirm your settings.
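
When choosing a new value for Maximum Transfer Queue Depth, the 25 percent rule of thumb from the troubleshooting steps can be expressed as a small calculation. The Python sketch below is illustrative only. The function name is hypothetical, and the result is a starting point for trial and error rather than a prescribed value.

```python
def reduced_queue_depth(current: int, reduction: float = 0.25) -> int:
    """Return the queue depth after applying a fractional reduction.

    The 0.25 default reflects the rule of thumb in this article.
    """
    if current < 1:
        raise ValueError("queue depth must be at least 1")
    if not 0 <= reduction < 1:
        raise ValueError("reduction must be in the range [0, 1)")
    # Never return less than 1, since a queue depth of 0 would allow no transfers.
    return max(1, round(current * (1 - reduction)))


first_try = reduced_queue_depth(64)          # 48
second_try = reduced_queue_depth(first_try)  # 36
```

If the first reduction does not relieve the NAS, applying the same reduction again gives the next value to try.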

General NAS Troubleshooting Tips

Many NAS issues are difficult to diagnose, often seeming to indicate other errors. Generally, NAS issues present as connection issues between the Core and the NAS device. Investigating such issues can then reveal environmental factors (a slow network, saturated network utilization, and so on) that help pinpoint the problem. Because errors differ dramatically with each NAS, try to test and isolate results.

Ensure that the NAS is dedicated exclusively to holding the repository. Because saving Replay 4 or AppAssure 5 snapshots to the Core is I/O-intensive, and because a NAS requires this traffic to flow through the network, a key step in resolving dropped connections when using a NAS is ascertaining that the NAS is strictly dedicated to supporting the repository.

Reboot the NAS, and if connection problems recur, consider evaluating whether the NAS device is robust enough for your environment.

To test if the issue is the NAS device, you can create a second repository on a network share on a different server. If you see the primary repository go offline again, but the new repository does not, this is an indicator that the issue is the NAS device.

You should also investigate the network as a potential cause of dropped connections or slow performance.

You may consider using a SAN instead of a NAS as your storage approach for the repository. Note also that many NAS devices can be used as direct-attached storage if network problems persist.

Frequently Asked Questions

Q: Why does my NAS device sometimes drop its connection?

A1: Flooding the NAS with read/write requests can cause it to drop from the network. Ensure the only function of the NAS is to serve as the primary repository and ensure that it is not tiered to the cloud.

A2: Another cause is defective hardware. The NAS may be suffering from a degrading disk or a faulty network card which causes these drops. This can be difficult to detect when everything else looks to be in operation, and small writes or transfers that users may be performing are not affected. If large transfers fail and small ones do not, look to the hardware as a potential cause.

A3: If the amount of I/O traffic is overwhelming your NAS device, you can try updating the transfer setting queue depth to slow the transfer rates, which may allow the NAS to catch up.

Q: How can I verify if my NAS device itself is the cause for slow performance or dropped mounts/repository connections?

A: Check the system and hardware logs to verify the cause of dropped connections.

One test you can run is to perform a single backup job, and then perform multiple simultaneous jobs. Some devices appear to be functioning perfectly fine with a single read/write task, but as soon as you add a second task, the I/O rates plummet.

You can also create a second repository on another network share on a different server. If you see the primary repository go offline again, but the new repository does not, that points to the NAS device as the cause. In that case, consider upgrading the NAS or using SAN or DAS instead of NAS for your repository.
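
Outside of the backup product, a rough way to see whether a device degrades under simultaneous writes is to time one writer against several. The Python sketch below is a stand-in for that comparison, not an AppAssure test. The function names and file names are hypothetical, and results are influenced by operating system caching and by the network path to the share.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1024 * 1024  # write in 1 MiB blocks


def _write_file(path, size_mb):
    """Write size_mb MiB of random data to path and flush it to storage."""
    block = os.urandom(CHUNK)
    with open(path, "wb") as handle:
        for _ in range(size_mb):
            handle.write(block)
        handle.flush()
        os.fsync(handle.fileno())


def write_throughput(directory, writers, size_mb=64):
    """Return aggregate MiB/s when `writers` files are written simultaneously."""
    paths = [os.path.join(directory, f"nas_test_{i}.bin") for i in range(writers)]
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=writers) as pool:
        # list() waits for every writer and re-raises any write error.
        list(pool.map(_write_file, paths, [size_mb] * writers))
    elapsed = time.perf_counter() - started
    for path in paths:
        os.remove(path)  # clean up the temporary test files
    return (writers * size_mb) / elapsed
```

Calling `write_throughput(share, 1)` and then `write_throughput(share, 4)`, where `share` is a test folder on the NAS, gives two aggregate figures to compare. A sharp drop in the second figure is consistent with the behavior described above, where I/O rates plummet as soon as a second task is added.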

Q: If I am still experiencing problems, what is the most efficient way to get support for my NAS issue?

A: Contact your Dell Support representative by email or the web. Include the following information:

Describe the problem you are experiencing.

Gather information your engineer will need to begin resolving the issue. Begin by running the AAInfo utility as follows:

On the Core server, open a command prompt.

Navigate to C:\Program Files\AppRecovery\Core\coreservice\AAInfo

Enter AAInfo.UI.exe and press Enter to launch the utility.

Add additional information such as screen shots by using the menu option to Add Custom Files.

Send the information you gathered as well as the following information about your NAS to the support engineer:

Indicate the NAS make and model.

If the issue is a dropped connection, indicate if the dropped connection has since come back online.

If you made any of the changes described in this article, take note of them, and report these changes to a Dell Support engineer.