Configuring EMC Isilon SmartConnect – Part II: SmartConnect Advanced

In Part I, I covered SmartConnect Basic, an included feature of OneFS that hands out Isilon node IP addresses based on a Round Robin connection policy.

What do you do if you want a more dynamic, more resilient, and better load balanced solution? You implement SmartConnect Advanced. Remember that SmartConnect Advanced requires an additional license.

So why is SmartConnect Advanced so much better than SmartConnect Basic? In a previous post, I mentioned:

SmartConnect Basic has the following features:

Static IP allocation to nodes

Connection Policy Algorithms

    Round Robin (which node is next)

No Rebalancing

No IP Failover

SmartConnect Advanced has the following features:

Dynamic IP allocation to nodes

Connection Policy Algorithms

    Round Robin (which node is next)

    Connection Count (which is next given connections each node has)

    Network Throughput (which is next given the amount of throughput each node has)

    CPU Usage (which is next given the amount of CPU usage each node has)

Automatic or Manual Rebalancing

IP Failover Policy

    Round Robin (which node is next)

    Connection Count (which is next given connections each node has)

    Network Throughput (which is next given the amount of throughput each node has)

    CPU Usage (which is next given the amount of CPU usage each node has)

As can be seen by the feature set, SmartConnect Advanced allows for Dynamic IP allocation, more Connection Policies, Rebalancing, and Failover Policies. Quite a few more features.

Dynamic IP allocation to nodes
Dynamic IP allocation to nodes adds some capabilities that may not be readily apparent. In Part I, the front end network (subnet0) had a range of 172.16.1.11-13. With SmartConnect Basic, each node had a single address. If the range had been 172.16.1.11-172.16.1.16, each node would still have only 1 IP address.

With SmartConnect Advanced, all of the addresses are added to the pool. In my 3 node cluster, all 6 addresses are evenly distributed when using SmartConnect Advanced. I only have to change my IP allocation method from Static to Dynamic.

With IP addresses being statically allocated (SmartConnect Basic), I can only use 1 IP for each node in the cluster. With SmartConnect Advanced, I can use multiple IP addresses per node. As I add more nodes to the cluster, the IP addresses are automatically distributed across the nodes. In short, I configure my range up front, and as I grow the cluster, I don’t have to make configuration changes to my clients.
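To make the difference concrete, here is a minimal Python sketch of the two allocation methods using the range from my lab. This is only an illustration, not how OneFS is implemented: the function names, the node names, and the "deal the addresses out like cards" ordering are my own assumptions.

```python
import ipaddress

def expand_range(first, last):
    """Return every IPv4 address from first to last, inclusive, as strings."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    return [str(ipaddress.IPv4Address(n)) for n in range(start, end + 1)]

def allocate_static(ips, nodes):
    """Static: each node gets exactly one IP; leftover IPs stay unbound."""
    return {node: [ip] for node, ip in zip(nodes, ips)}

def allocate_dynamic(ips, nodes):
    """Dynamic: every IP in the range is spread evenly across the nodes.
    The dealing order here is an assumption made for illustration."""
    allocation = {node: [] for node in nodes}
    for i, ip in enumerate(ips):
        allocation[nodes[i % len(nodes)]].append(ip)
    return allocation

pool = expand_range("172.16.1.11", "172.16.1.16")
three_nodes = ["node1", "node2", "node3"]  # hypothetical node names

static_map = allocate_static(pool, three_nodes)    # 1 IP per node, 3 IPs unused
dynamic_map = allocate_dynamic(pool, three_nodes)  # 2 IPs per node

# Growing the cluster to 6 nodes: same range, no client-side changes.
six_nodes = three_nodes + ["node4", "node5", "node6"]
grown_map = allocate_dynamic(pool, six_nodes)      # 1 IP per node
```

The point of the sketch is the last line: the range never changes, only how the addresses are spread across whatever nodes exist.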

Connection Policy Algorithms
SmartConnect Basic only has a Round Robin connection policy. SmartConnect applies no intelligence in deciding which IP to give out, other than picking whichever IP is next in what I would call a “hunt group.” Yeah, an old PBX analogy, but pretty accurate here. With Round Robin, we go to the first IP, then the second, then the third, and so on. Not much intelligence there…

The additional connection policies in SmartConnect Advanced add a lot of flexibility in determining which node is going to answer next for the cluster.

Round Robin: The cluster issues the IP address for the next node in the “hunt group.”

Connection Count: The cluster issues the IP address for the node that has the fewest connections.

Network Throughput: The cluster issues the IP address for the node that has the least network throughput.

CPU Usage: The cluster issues the IP address for the node that has the lowest CPU utilization.

Looking at these connection policy algorithms, we can see that there are, based on our use case/workload/etc, options that can add some intelligence to the balancing of connections to the cluster.
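The four policies boil down to "pick the next node in order" or "pick the node with the lowest value of some metric." A rough Python sketch of that idea follows. The class names, policy strings, and the sample numbers are all made up for illustration; how OneFS actually gathers and weighs these metrics is not something I am describing here.

```python
from dataclasses import dataclass

@dataclass
class NodeStats:
    name: str
    ip: str
    connections: int
    throughput_mbps: float
    cpu_percent: float

class ZoneSelector:
    """Picks which node's IP to hand out next, according to a connection policy."""

    # Each metric-based policy maps to the value it tries to minimize.
    METRICS = {
        "connection_count": lambda n: n.connections,
        "network_throughput": lambda n: n.throughput_mbps,
        "cpu_usage": lambda n: n.cpu_percent,
    }

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._next = 0  # position in the "hunt group" for Round Robin

    def pick(self, policy):
        if policy == "round_robin":
            node = self.nodes[self._next % len(self.nodes)]
            self._next += 1
            return node
        # All other policies: the node with the lowest metric wins.
        return min(self.nodes, key=self.METRICS[policy])

# Hypothetical snapshot of a 3 node cluster.
nodes = [
    NodeStats("node1", "172.16.1.11", connections=12, throughput_mbps=300.0, cpu_percent=20.0),
    NodeStats("node2", "172.16.1.12", connections=4, throughput_mbps=900.0, cpu_percent=35.0),
    NodeStats("node3", "172.16.1.13", connections=9, throughput_mbps=150.0, cpu_percent=10.0),
]
selector = ZoneSelector(nodes)
```

With this snapshot, Connection Count would hand out node2's address, while Network Throughput and CPU Usage would both hand out node3's. Which policy fits best depends on the workload.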

After changing my Connection policy to Connection Count, and running my PowerShell script mentioned in Part I, I can see that I am no longer using Round Robin for the resolution of cluster.isilon.jasemccarty.com.

In the graphic above, not only do I have additional IP addresses, I also have a different connection policy being demonstrated. Despite only having 3 nodes in my cluster, my cluster answers for 6 addresses. This is because SmartConnect Advanced is balancing all of the IP addresses across all of the nodes. Also notice that the answers are not simply moving from IP to IP as they would under a Round Robin connection policy.
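If you would rather not use PowerShell, the same kind of test can be sketched in Python. This is not the script from Part I, just a rough equivalent. The function name and the injectable `resolve` parameter are my own choices, and keep in mind that a caching resolver on the client may hide the rotation.

```python
import socket
from collections import Counter

def sample_resolutions(name, attempts=12, resolve=socket.gethostbyname):
    """Resolve `name` repeatedly and return (answers in order, count per IP).

    `resolve` defaults to the system resolver; it can be swapped for a
    stand-in function when testing without a cluster.
    """
    order = [resolve(name) for _ in range(attempts)]
    return order, Counter(order)

# Example usage against a live SmartConnect zone (requires working DNS delegation):
# order, counts = sample_resolutions("cluster.isilon.jasemccarty.com")
# print(order)
# print(counts)
```

Under Round Robin the answers should cycle through the addresses in a fixed sequence. Under the other policies the sequence depends on the state of each node at the time of the lookup.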

But I only have 3 nodes? SmartConnect Advanced will spread the load out for me, automatically. And later, when I decide to add 3 more nodes… Wait for it… I don’t have to change my vSphere configuration. I only have to perform a rebalance operation (I’ll go into that next) and each datastore will be on its own node.

Rebalancing
Another cool feature with SmartConnect Advanced is Rebalancing. This can be done either manually or automatically. By default the policy is Automatic Failback.

With the Rebalance policy set to Automatic Failback, a rebalance operation is triggered only when there is a change to:

the cluster membership

the cluster’s external network configuration

a member network interface

When set to Manual Failback, the policy does not redistribute IP addresses until a rebalance command is issued via the command line or Administrative Web Interface. I have even seen situations where a cron job has been used to rebalance IPs on the cluster during scheduled times of less activity, including maintenance windows, etc.
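Conceptually, a rebalance just evens out how many IPs each node holds. The Python sketch below shows one simple way to do that, moving one IP at a time from the node with the most to the node with the fewest. It is an assumption-laden illustration (the function name, node names, and the move-one-at-a-time strategy are mine), not the algorithm OneFS uses.

```python
def rebalance(allocation):
    """Return a new allocation where IP counts per node differ by at most one.

    `allocation` maps node name -> list of IPs currently bound to that node.
    The input is not modified.
    """
    result = {node: list(ips) for node, ips in allocation.items()}
    while True:
        most = max(result, key=lambda n: len(result[n]))
        least = min(result, key=lambda n: len(result[n]))
        if len(result[most]) - len(result[least]) <= 1:
            return result
        # Move a single IP from the fullest node to the emptiest node.
        result[least].append(result[most].pop())

# 3 original nodes holding 2 IPs each, plus 3 newly added nodes with none.
before = {
    "node1": ["172.16.1.11", "172.16.1.14"],
    "node2": ["172.16.1.12", "172.16.1.15"],
    "node3": ["172.16.1.13", "172.16.1.16"],
    "node4": [],
    "node5": [],
    "node6": [],
}
after = rebalance(before)  # every node ends up with exactly one IP
```

With Automatic Failback, think of something like this running whenever one of the changes listed above occurs. With Manual Failback, it only runs when you ask for it.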

IP Failover Policy
The other feature that SmartConnect Advanced adds is IP Failover Policy. This has also been referred to as NFS failover. It determines how to redistribute the IP addresses to the nodes in the SmartConnect pool in the event that one of the other nodes becomes unavailable. For this to work, the IP allocation method has to be set to Dynamic. All four connection policies are available to the failover policy.

An example of the IP failover policy would be:

The nfs3 datastore is mapped to 172.16.1.13

The IP 172.16.1.13 is assigned to Node 3

Node 3 is no longer available

The IP 172.16.1.13 is reassigned to Node 1 based on the IP failover policy

Think of a situation where multiple vSphere hosts are accessing nfs3 on IP 172.16.1.13 and that node needs to be brought offline for maintenance, a rolling upgrade, or other administrative action. It would require significant planning, and possibly downtime, if that IP address could not be moved to another node. SmartConnect Advanced can accommodate times when nodes are no longer available.
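The failover example above can be sketched in a few lines of Python. Here I assume a Connection Count failover policy and invent some connection numbers. The function name and the idea of counting each moved IP as one extra connection are simplifications of mine, purely for illustration.

```python
def fail_over(allocation, failed, connections):
    """Reassign the failed node's IPs to surviving nodes, fewest connections first.

    allocation:  node name -> list of IPs bound to that node
    failed:      name of the node that is no longer available
    connections: node name -> current connection count (hypothetical numbers)
    """
    survivors = {n: list(ips) for n, ips in allocation.items() if n != failed}
    load = {n: connections[n] for n in survivors}
    for ip in allocation[failed]:
        target = min(survivors, key=lambda n: load[n])
        survivors[target].append(ip)
        load[target] += 1  # crude assumption: each moved IP adds one connection
    return survivors

allocation = {
    "node1": ["172.16.1.11"],
    "node2": ["172.16.1.12"],
    "node3": ["172.16.1.13"],  # the nfs3 datastore is mounted via this IP
}
connections = {"node1": 2, "node2": 5, "node3": 4}

after_failure = fail_over(allocation, "node3", connections)
# node1 has the fewest connections, so it picks up 172.16.1.13
```

The vSphere hosts keep talking to 172.16.1.13 the whole time. Only the node answering for that address changes.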

Additional SmartConnect Advanced additions
In addition to the above features, SmartConnect Advanced adds the ability to have multiple SmartConnect zones. Yes, multiple SmartConnect Zones. SmartConnect Basic only allows for a single zone, but SmartConnect Advanced provides the ability to have multiple zones. Imagine having a cluster with a combination of X, S, or NL series Isilon nodes… We would create a General Use SmartConnect zone with the NL series nodes assigned, as well as a performance zone with the X or S series nodes.

The ability to have multiple zones gives the flexibility of pointing clients that require the most performance to the fastest nodes in a cluster, while pointing clients that require general use to general purpose nodes in a cluster. When using SmartConnect Basic all clients access the same SmartConnect zone.

I didn’t mention it in Part I, but it is entirely possible, with either version of SmartConnect, to have more nodes in a zone than IP addresses available. Node membership in a SmartConnect zone is configurable, but does not require all nodes to be in the zone. The ability to choose which nodes are in a particular zone coupled with the ability to present multiple zones (SmartConnect Advanced), provides a huge amount of flexibility.

So in real-world implementations, where you have smartconnect basic, you’d only have one zone and all NFS and SMB clients would use the same reference. It would still make sense to set aside more than one IP per node just in case future expansion happens and your IPs are contiguous? And in smartconnect adv, you can have a zone for NFS and a zone for SMB with their own set of IPs? Possibly different nodes addressing different protocols?

For the networking, would you do aggregates or just put all interfaces by themselves? If there is one IP per node, which of the 2 or 4 network interfaces is actually doing the work?

With SmartConnect Basic, yes, there is only one SmartConnect Zone available. Not all IPs have to be configured up front; they can be added later, as nodes are added.

With SmartConnect Advanced, you can have multiple zones, but to my knowledge, NFS and SMB will be presented on both sets of IPs.

From a networking perspective, it would really depend on things like protocols being used, level of redundancy, etc. With NFS and the failover process, it isn’t necessarily required to use aggregation at the interface level. With SMB/CIFS, because the protocol is stateful, aggregation at the interface level would provide redundancy at the data link layer, rather than the application layer.

Just a quick clarifying note for you all, even though this is an old thread. Whether you point clients at Static or Dynamic SmartConnect zones (if you have both available) depends on the client protocol. Connectionless protocols such as NFSv2/v3 can use dynamic SmartConnect zones, and this works very well: the client is frequently completely unaware that the node it was connected to ever went offline at all. Static SmartConnect zones must be used for protocols that are connection-oriented, such as SMB, NFSv4, and FTP, where session state really matters. Again, this only makes a difference when we talk about failover behavior. If you test it with a static SmartConnect zone, what you’ll see, assuming you hit F5 in an Explorer window, is that it’ll hang for about 10 seconds and then just keep working. You should not see disconnected network drives or anything like that. The trick is that Explorer realizes that the IP is down, does a new nslookup, then connects to a different node. This all happens within a standard CIFS/SMB timeout window, so again most users do not notice the change.

Could someone please clarify for me. As far as DNS goes, do I need a NS record for the name of my cluster? For smartconnect IP, I will need A records? And DNS delegation is for the smartconnect zones names as well? Are the smartconnect zone names connected to the smartconnect IP? New to Isilon…

The only DNS name required (FQDN or Short name – provided it resolves) is for the SmartConnect Zone you are trying to provide.
1) The SmartConnect Service IP is used for the DNS Delegation. The delegation can point to either an IP or a DNS name
2) The SmartConnect Zone Name is handed out by the Isilon cluster, NOT by the DNS server.

In short the process goes like this:
1) A Client says “Where is the SmartConnect Zone?”

2) The DNS server says “Hey don’t ask me, talk to this other guy.”
(This is automatically redirected to the SmartConnect Service IP [IP or Name] because of the DNS Delegation.)

When SmartConnect Advanced is being used, and IP allocation is set to Dynamic, if that Node (Step 3) goes down, it first moves its IP to another Node. At that point, the other Node says “I’ll be handling requests for the other guy” and the Client doesn’t care.

Client says to DNS, “What’s the SmartConnect Zone IP?”
The DNS server says “I don’t know, let me ask the SSIP.”
The SSIP answers the request from the DNS server with an IP, based on the Zone’s policy.
The DNS server tells the original requester what IP to use.
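The four steps above can be modeled with a toy Python simulation. Nothing here talks to a real DNS server: the class names are invented, the SmartConnect Service IP is stood in for by a plain object, and Round Robin is assumed as the Zone's policy to keep the sketch short.

```python
class SmartConnectServiceIP:
    """Stand-in for the SSIP: answers for one zone based on the zone's policy."""

    def __init__(self, zone_name, ips):
        self.zone_name = zone_name
        self.ips = list(ips)
        self._next = 0

    def answer(self, name):
        if name != self.zone_name:
            raise LookupError(f"{name} is not served by this SSIP")
        ip = self.ips[self._next % len(self.ips)]  # Round Robin assumed
        self._next += 1
        return ip

class SiteDNS:
    """Stand-in for the site DNS server holding delegations to SSIPs."""

    def __init__(self):
        self.delegations = {}

    def delegate(self, zone_name, ssip):
        self.delegations[zone_name] = ssip

    def resolve(self, name):
        ssip = self.delegations.get(name)
        if ssip is None:
            raise LookupError(f"no delegation for {name}")
        # "I don't know, let me ask the SSIP," then relay the answer to the client.
        return ssip.answer(name)

zone = "cluster.isilon.jasemccarty.com"
ssip = SmartConnectServiceIP(zone, ["172.16.1.11", "172.16.1.12", "172.16.1.13"])
dns = SiteDNS()
dns.delegate(zone, ssip)

answers = [dns.resolve(zone) for _ in range(4)]
```

The thing to notice is that the site DNS server never stores the node IPs itself. It only knows where to forward the question, and the cluster decides the answer each time.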

If I recall correctly, you should be able to make the change without affecting the current IP allocation. It wouldn’t be until the next operation that invokes a change/redistribution. I believe that static is only going to give you a single IP per node. If you don’t have enough nodes for all the IPs, you might have some IPs not being bound.

Disclaimer

Any views or opinions expressed here are strictly my own. While I am a blogger who works for VMware, I am solely responsible for all content published here. This is a personal blog, not a VMware blog. Content published here is not read, reviewed, or approved in advance by VMware and does not necessarily represent or reflect the views or opinions of VMware or any of its divisions, subsidiaries, or business partners.

Any of my code, configuration references, or suggestions, should be researched and verified in a lab environment before attempting in a production environment.

Agreement to use any of my code or recommendations, removes me from any liability as such.