How can I test Storwize V7000 Node Canister failure?

I have received this question several times, so it's clearly something people are interested in.

The Storwize V7000 has two controllers known as node canisters. It's an active/active storage controller, in that both node canisters are processing I/O at any time and any volume can be happily accessed via either node canister.

The question then gets asked: what happens if a node canister fails, and can I test this? The answer to the question of failure is that the second node canister will handle all the I/O on its own. Your host multipathing driver will switch to the remaining paths and life will go on. We know this works because a firmware upgrade takes one node canister offline at a time, so if you have already done a firmware update, then you have already tested node canister failover. But what if you want to test this discretely? There are four ways:

Walk up to the machine and physically pull out a node canister. This is a bit extreme and is NOT recommended.

Power off a node canister using the CLI (using the satask stopnode command). This will work for the purposes of testing node failure, but the only way to power the node canister back on is to pull it out and reinsert it. This is again a bit extreme and is not recommended. This is also different from an SVC, since each SVC has its own power on/off button.

Use the CLI to remove one node from the I/O group (using the svctask rmnode command). This works on an SVC because the nodes are physically separate. On a Storwize V7000 the nodes live in the same enclosure and a candidate node will immediately be added back to the cluster, so as a test this is not that helpful.

Place one node into service state and leave it there while you check all your hosts. This is my recommended method.

First up, this test assumes there is NOTHING else wrong with your Storwize V7000. We are not testing multiple failures here. You need to confirm that the Recommended Actions panel, as shown below, contains no items. If there are errors listed, fix them first.
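The pre-check above can be written down as a small go/no-go gate. This is only a sketch to make the rule explicit: the function name, the list of outstanding actions and the node-state dictionary are all hypothetical inputs that you would fill in by hand from what the GUI shows. It does not talk to the Storwize V7000 at all.

```python
def ready_for_failover_test(outstanding_actions, node_states):
    """Decide whether it is safe to start a single-node failure test.

    outstanding_actions: hypothetical list of items copied from the
        Recommended Actions panel (empty list means the panel is clean).
    node_states: hypothetical dict such as {"node1": "active", "node2": "active"}.

    Returns (ok, reasons), where reasons explains every blocking condition.
    """
    reasons = []

    # Rule 1: nothing else may be wrong, so the panel must be empty.
    if outstanding_actions:
        reasons.append(
            f"{len(outstanding_actions)} recommended action(s) still outstanding"
        )

    # Rule 2: we are testing ONE failure, so both nodes must start out active.
    not_active = sorted(n for n, s in node_states.items() if s != "active")
    if not_active:
        reasons.append("node(s) not active: " + ", ".join(not_active))

    return (not reasons, reasons)
```

For example, `ready_for_failover_test([], {"node1": "active", "node2": "active"})` gives the go-ahead, while any listed action or any node already in service blocks the test and tells you why.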

Once we are certain our Storwize V7000 is clean and ready for test, we need to connect via the Service Assistant Web GUI. If you have not set up access to the service assistant, please read this blog post first.

So what's the process?

First, log on to the service assistant on node 1 and place node 2 into service state. I chose node 2 because normally node 1 is the configuration node (the node that owns the cluster IP address). You need to confirm you're connected to node 1 (check at top right), select node 2 (from the Change Node menu), then choose Enter Service State from the drop-down and hit GO.

You will get this message confirming you are placing node 2 into service state. If it looks correct, select OK.

The GUI will pause on this screen for a short period. Wait for the OK button to un-grey.

You will eventually get to this with Node 1 Active and Node 2 in Service.

Node 2 is now offline. Go and confirm that everything is working as desired on your hosts (half your paths will be offline but your hosts should still be able to access the Storwize V7000 via the other node canister).
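To make the host check concrete, here is a minimal sketch of the reasoning: every volume should still have at least one path through the surviving node canister. The path list is a hypothetical, hand-built inventory of (volume, node, port) tuples, not the output of any real multipathing tool, and the function names are my own.

```python
from collections import defaultdict


def paths_after_node_offline(paths, offline_node):
    """Return, per volume, the paths that survive one node going offline.

    paths: iterable of (volume, node, port) tuples (hypothetical inventory).
    """
    remaining = defaultdict(list)
    for volume, node, port in paths:
        remaining[volume]  # make sure the volume is listed even with no survivors
        if node != offline_node:
            remaining[volume].append((node, port))
    return dict(remaining)


def unreachable_volumes(paths, offline_node):
    """List volumes that would lose ALL paths if offline_node went away."""
    remaining = paths_after_node_offline(paths, offline_node)
    return sorted(vol for vol, left in remaining.items() if not left)


# Hypothetical example: vol0 is pathed through both nodes, vol1 only through node2.
example_paths = [
    ("vol0", "node1", "p1"),
    ("vol0", "node1", "p2"),
    ("vol0", "node2", "p1"),
    ("vol0", "node2", "p2"),
    ("vol1", "node2", "p1"),
]
```

With node 2 in service state, `vol0` keeps two of its four paths (half offline, as expected), while `vol1` shows up in `unreachable_volumes(example_paths, "node2")`. That is exactly the kind of host you want to find during a planned test rather than during a real failure.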

When your host checking is complete, you can use the same drop-down to Exit Service State on node 2 and select GO.

You will get a pop up window to confirm your selection. If the window looks correct, select OK.

You will get the following panel. You will need to wait for the OK button to become available (to un-grey).