Sentrium

Ubiquiti USG (UniFi Security Gateway) is a router and firewall appliance that is closely related to the EdgeMax product line, even though it is marketed as part of the UniFi product family and aimed at a different market segment.

It is meant to be managed by a UniFi controller, which overwrites the settings, so configuring it in the traditional way will not make the changes permanent. Since its hardware is closely related to the EdgeMax routers and its software is derived from EdgeOS, using only the limited configuration options available from the UniFi controller GUI would leave its capabilities seriously underutilized.

Fortunately, the controller allows customizing the config with JSON files, and the USG OS includes a tool for exporting the config to JSON. When a customer wanted to configure QoS for VoIP traffic in a way that is not supported by the controller, we had to look deeper into it, and found that the process is not nearly as easy as we hoped it would be.

The workflow is the following:

1. Add the required configuration directly on the USG (with set commands, as in the usual EdgeOS)

2. Export it to JSON with mca-ctrl -t dump-cfg

3. Extract the relevant sections from that JSON and put them on the controller

The last part is the offender here. Unlike the "show" command or the cli-shell-api tool, the tool for exporting the config to JSON (mca-ctrl) does not allow exporting only a part of the configuration. Moreover, the controller is capable of merging configuration files, but not of loading a partial configuration, so the JSON dict representing the configuration must include all hierarchy levels (for example, if you add just one firewall rule, the config must still contain the "firewall", "name", etc. levels for it to work).
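For instance, to push a single extra firewall rule, the JSON placed on the controller still has to spell out the whole enclosing hierarchy. A hypothetical fragment (rule set name, rule number and values are made up for illustration):

```json
{
  "firewall": {
    "name": {
      "WAN_IN": {
        "rule": {
          "20": {
            "action": "accept",
            "description": "Allow VoIP",
            "protocol": "udp",
            "destination": { "port": "5060" }
          }
        }
      }
    }
  }
}
```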

The official guideline suggests picking the relevant parts out of the JSON config by hand. Obviously, this is a tedious and error-prone process, so I started looking for a way to automate it.

Luckily, the USG OS has Python installed. It is Python 2.7, while I would prefer to see Python 3 there, but it still has a JSON parser and formatter in its standard library, so it was good enough for the job.

So, I wrote a script that takes a list of configuration paths as arguments and automatically exports them from the JSON generated by mca-ctrl into a single JSON object that should be ready for deployment on the controller.
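In outline, the extraction boils down to walking each requested path through the dump and grafting the subtree into a fresh dict while recreating the enclosing levels. A minimal sketch of that idea (the actual script in the repo may differ; the sample dict here stands in for a real mca-ctrl dump):

```python
import json

def extract_paths(full_cfg, paths):
    """Copy only the given space-separated config paths out of a full
    config dict, keeping all enclosing levels so the controller can
    merge the result."""
    result = {}
    for path in paths:
        node = full_cfg
        target = result
        keys = path.split()
        for i, key in enumerate(keys):
            node = node[key]  # walk down the full config
            if i == len(keys) - 1:
                target[key] = node  # graft the whole subtree at the leaf
            else:
                target = target.setdefault(key, {})
    return result

# A fake dump standing in for the output of `mca-ctrl -t dump-cfg`
full = {"firewall": {"name": {"WAN_IN": {"rule": {"20": {"action": "accept"}}}},
                     "all-ping": "enable"},
        "system": {"host-name": "usg"}}
print(json.dumps(extract_paths(full, ["firewall name WAN_IN"])))
```

Note how the unrelated "all-ping" and "system" subtrees are left out, while the "firewall"/"name" levels above the requested rule set are preserved.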

The script can be trivially extended to take JSON from a file or stdin in addition to running mca-ctrl on its own; if anyone wants that, let me know and I'll add it. Also let me know if you run into any problems with the script.

Installation

You can get the script from my github repo. Just copy it to your USG (with scp or otherwise), chmod +x it, and it's ready to run. I hope it saves you quite a bit of time if you also need to customize your USG configuration.


For a small company it is quite common to have two to four servers, two switches (which often support Multi-chassis EtherChannel) and a low-end storage system. It is quite important for such companies to fully utilize their infrastructure, and thus all available technologies, and this article describes one aspect of how to do this with ONTAP systems. Usually there is no need to dig too deep into LACP technology, but for those who want to, welcome to this post.

It is important to tune and optimize not just one part of your infrastructure but the whole stack to achieve the best performance. For instance, if you optimize only the network, then the storage system might become a bottleneck in your environment, and vice versa.

The majority of modern servers have on-board 1 Gbps or even 10 Gbps Ethernet ports.

Some older ONTAP storage systems like the FAS255X, as well as more modern FAS26XX systems, have 10 Gbps on-board ports. In this article I will focus on an example with a FAS26XX system with 4x 10 Gbps ports on each node, two servers with 2x 10 Gbps ports, and a Cisco switch with 10 Gbps ports and support for Multi-chassis EtherChannel. But this article applies to any small configuration.

Scope

So, we would like to fully utilize the network bandwidth of the storage system and the servers and prevent any bottlenecks. One way to do this is to use the iSCSI or FCP protocols, which have built-in load balancing and redundancy; in this article, therefore, we will look at protocols which do not have such an ability, like CIFS and NFS. Why would users be interested in NAS protocols which don't have built-in load balancing and redundancy? Because NAS protocols have file granularity and file visibility from the ONTAP perspective, and in many cases this gives more agility than SAN protocols, while the network "features" NAS protocols lack can easily be compensated for with functionality built into nearly any network switch. Of course, these technologies do not work magically, and each approach has its nuances and considerations.

In a lot of cases users would like to use both SAN and NAS on top of a single pair of Ethernet ports with ONTAP systems, and for this reason the first thing you should consider is NAS protocols with load balancing and redundancy, and only then adapt the SAN connection to it. NAS protocols with SAN on top of Ethernet ports are often the case for customers with smaller ONTAP systems, where the number of Ethernet ports is limited.

Also, in this article I will avoid technologies like vVols over SAN, pNFS, dNFS and SMB multichannel. I would like to write about vVols in a separate, dedicated article: it is not related to NAS or SAN protocols directly, but it can be part of a solution which on one hand provides file granularity and on the other hand can use NFS or iSCSI, where iSCSI can natively load-balance traffic across all available network paths. pNFS, unfortunately, is currently supported only with RedHat/CentOS systems in enterprise environments, is not widespread, and does not provide native load balancing because NFS trunking is currently only a draft, while SMB multichannel is currently not supported in ONTAP 9.3 itself.

In this situation we have a few configurations left.

One is to use solely NAS protocols with Ethernet port aggregation

Another is to use NAS protocols with Ethernet port aggregation and SAN on top of the aggregated ports, which can be divided into two subgroups:

either you use iSCSI as the SAN protocol,

or you use FCoE as the SAN protocol.

The native FC protocol requires dedicated ports and cannot work over Ethernet ports.

Even though FCoE on top of aggregated Ethernet ports with NAS is a possible network configuration with ONTAP systems, I will not discuss it in this article, because FCoE is supported only with expensive converged network switches like the Nexus 5000 or 7000 and is thus outside the scope of interest of small companies. FC and FCoE provide quite comparable performance, load balancing and redundancy with ONTAP systems (with other vendors it can be different), so there is no reason to pay more.

NAS protocols with Ethernet port aggregation

Both variants, NAS protocols with Ethernet port aggregation alone and NAS protocols with Ethernet port aggregation plus iSCSI on top of the aggregated ports, have quite similar network configurations and topologies. And this is the configuration I will discuss in this article.

Theoretical part

Unfortunately, Ethernet load balancing does not work in as sophisticated a way as SAN protocols do; it works in a quite simple way. Personally, I would even call it load distribution instead of load balancing, because Ethernet pays no attention to the "balancing" part and does not actually try to evenly distribute load across links; it just distributes load, hoping that there will be plenty of network nodes generating read and write threads, so that simply by probability theory the load will end up more or less evenly distributed. The fewer nodes in the network, the fewer network threads, and the lower the probability that the load will be spread evenly across the network links, and vice versa.

The simplest algorithm for Ethernet load balancing sequentially picks one of the network links for each new thread, one by one. Another algorithm uses a hash sum of the network addresses of the sender and recipient to pick one network link in the aggregate. The network address can be an IP address, a MAC address or something else. This small nuance plays a very important role in this article and in your infrastructure: if two pairs of source and destination addresses produce the same hash sum, the algorithm will use the same link in the aggregate for both. In other words, it is important to understand how the load-balancing algorithm works, to ensure that the combinations of network addresses are such that you not only get redundant network connectivity but also utilize all the network links. This becomes especially important for small companies with few participants in their network.
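A common IP-hash policy is (source XOR destination) modulo the number of links. Here is a sketch of that generic rule, not any vendor's exact implementation, showing how two address pairs can collide onto the same link (all addresses are invented for illustration):

```python
from ipaddress import IPv4Address

def ip_hash_link(src, dst, n_links):
    """Pick an aggregate member the way a simple IP-hash policy does:
    XOR the two addresses and take the result modulo the link count."""
    return (int(IPv4Address(src)) ^ int(IPv4Address(dst))) % n_links

# Two host/storage address pairs whose hashes collide: both land on the
# same member of a 2-link aggregate, leaving the other link idle.
print(ip_hash_link("192.168.0.10", "192.168.0.20", 2))
print(ip_hash_link("192.168.0.11", "192.168.0.21", 2))
# A different pairing hashes to the other link:
print(ip_hash_link("192.168.0.10", "192.168.0.21", 2))
```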

It is quite common that 4 servers cannot fully utilize 10 Gbps links, but during peak utilization it is important to evenly distribute network threads between the links.

Typical network topology and
configuration for small companies

In my example we have 2 servers, 2 switches and one storage system with two storage nodes running ONTAP 8.3 or higher, with the following configuration (keep it in mind throughout):

From each storage node two links go out: one to the first switch, the other to the second switch

Switches are configured with technologies like vPC (or similar), or the switches are stacked

Switches are configured with Multi-chassis EtherChannel/PortChannel technology, so the two links from a server connected to the two switches are aggregated into a single EtherChannel/PortChannel. Likewise, the links from a storage node connected to the two switches are aggregated into a single EtherChannel/PortChannel

Each volume is mounted on each server as a file share, so each server will be able to use all 4 volumes

The minimum number of volumes for even traffic distribution is pretty much determined by the biggest number of links from either a storage system or a server; in this example we have 4 ports on each storage node, which means we need 4 volumes in total. If you have only 2 network links from each server and two from a storage system node, I still suggest keeping at least 4 volumes, which is good not only for network load balancing but also for storage node CPU load balancing. In the case of a FlexGroup it is enough to have only one such group, but keep in mind that it is currently not optimized for high-metadata workloads like virtual machines and databases.

One IP address for each storage node with two or four links per node, in configurations with two or more hosts each with two or four links and one IP address per host, is almost always enough to provide even network distribution. But with one IP address per storage node and one IP address per host, even distribution is achieved only in the perfect scenario where each host accesses each IP address evenly, which in practice is hard to achieve, quite hard to predict, and can change over time. So, to increase the probability of a more even network load distribution, we need to divide the traffic into more threads, and the only way to do this with LACP is to increase the number of IP addresses. Thus, in small configurations with two to four hosts and two storage nodes, giving each node 2x IP addresses instead of one helps to increase the probability of a more even network traffic distribution across all the links.
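Using the same simplified XOR-hash rule, you can check how a second IP per storage node lets a single host's mounts spread across a 2-link aggregate (the addresses are again invented for illustration):

```python
from ipaddress import IPv4Address

def ip_hash_link(src, dst, n_links):
    # simplified IP-hash policy: XOR of the addresses modulo the link count
    return (int(IPv4Address(src)) ^ int(IPv4Address(dst))) % n_links

host = "192.168.0.10"
# One IP per storage node: every mount from this host hashes to one link.
single = {ip_hash_link(host, "192.168.0.20", 2)}
# Two IPs per node: the mounts can now land on different links.
double = {ip_hash_link(host, dst, 2) for dst in ("192.168.0.20", "192.168.0.21")}
print(single, double)
```

With one target IP the host uses a single member of the aggregate; with two target IPs its traffic covers both.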

Unfortunately, conventional NAS protocols do not allow hosts to recognize a file share mounted via different IP addresses as a single entity. So, for example, if we mount an NFS file share to VMware ESXi via two different IP addresses, the hypervisor will see them as two different datastores, and if the user wants to re-balance network links, a VM has to be migrated to the datastore with the other IP; but moving that VM involves Storage vMotion, even though it is the same network file share (volume).

Network Design

Here is a recommended and well-known network design often used with NAS protocols.

Image #1

But simply cabling and configuring the switches with LACP doesn't guarantee that network traffic will be balanced across all the links in the most efficient way; it depends, and even if it is, this can change after a while. To get the maximum from both the network and the storage system, we need to tune them a bit, and to do so we need to understand how LACP and the storage system work. For more network designs, including wrong designs, see the slides here.

Image #2

LACP protocol & algorithm

In the ONTAP world, the nodes of a storage system work, for NAS protocols, nearly as if they were separate from each other, so you can think of them as separate servers; this architecture is called share-nothing. The only difference is that if one storage node dies, the second takes over its disks, workloads and IP addresses, so hosts can continue to work with their data as if nothing happened; this is called takeover in a High Availability pair. Also, with ONTAP you can move IPs and volumes online between storage nodes, but let's not focus on this. Since storage nodes behave as independent servers, the LACP protocol can aggregate several Ethernet ports only within a single node; it does not allow you to aggregate ports from multiple storage nodes. With switches, on the other hand, we can configure Multi-chassis EtherChannel so that LACP aggregates ports from several switches.

Now, the LACP algorithm selects a link only for the next hop, one step at a time, so the full path from sender to recipient is neither established nor handled by the initiator, as it is done in SAN. Communication between the same two network nodes can be sent through one path while the response comes back through another. The LACP algorithm uses a hash sum of the source and destination addresses to select a path. The only way to ensure your traffic goes over the expected paths with LACP is to enable load balancing by IP or MAC address hash sum and then calculate the hash sum results, or test them on your equipment. With the right combination of source and destination addresses you can ensure the LACP algorithm will select your preferred path.

The LACP algorithm can be implemented in different ways on the server, the switch and the storage system; that's why traffic from server to storage and from storage to server can take different paths.

There are a few additional important circumstances which influence your storage partitioning and your source & destination IP address selection. There are applications which can share volumes, like VMware vSphere, where each ESXi host can work with multiple volumes; and there are configurations where volumes are not shared by your applications.

One volume & one IP per node

Since we have two ONTAP nodes in a share-nothing architecture, and we want to fully utilize the storage system, we need to create volumes on each node and thus at least one IP on each node on top of an aggregated Ethernet interface. Each aggregated interface consists of two Ethernet ports. In the next network designs some objects (such as network links and servers) are not displayed, in order to focus on particular aspects; note that all the next network designs are based on the very first image, "LACP network design".

Design #3A

Let's see the same example from the storage perspective. Let me remind you that in the next network designs some objects (such as network links and servers) are not displayed, in order to focus on particular aspects; all of them are based on the very first image, "LACP network design".

Design #3B

Two volumes & one IP per node

But some configurations do not share volumes between the applications running on your servers. So, to utilize all the network links, we need to create two volumes on each storage node: one used only by host1, the second used only by host2. The volumes and connections to the second node are not displayed to keep the image simple; in reality they exist and are symmetrical to those of the first storage node.

Design #4A

Let's see the same configuration from the storage perspective. As in the previous images, the symmetrical part of the connections is not displayed, to simplify the image: in this case the symmetrical connections to the blue buckets on each storage node are not displayed, but in a real configuration they exist.

Design #4B

Two volumes & two IPs per node

Now, if we increase the number of IPs, we can mount each volume via two different IP addresses. In such a scenario each mount will be perceived by the hosts as a separate volume, even though it is physically the same volume with the same data set. In this situation it often makes sense to also increase the number of volumes, so that each volume is mounted via its own IP. Thus, we achieve a more even network load distribution across all the links, whether for shared or non-shared application configurations.

Design #5A

In the non-shared volume configuration, each volume is used by only one host. Designs 5A & 5B are quite similar and differ from one another only in how the volumes are mounted on the hosts.

Design #5B

Four volumes & two IPs per node

Now, if we add more volumes and IP addresses to our configuration with two applications which do not share volumes, we can achieve even better network load balancing across the links with the right combination of network share mounts. The same design can be used with an application which shares volumes, similarly to the design in image 5.

Which design is better?

Whether your applications use shared volumes or not, I would recommend:

Design #3 for environments where you have multiple independent applications, so that with multiple apps you will have at least 4 or more volumes in total on each storage node.

Or Design #6 if you are running only one application, like VMware vSphere, and are not planning to add new applications and volumes. Use a minimum of 4 volumes per node, whether you have shared or non-shared volumes.

How to ensure network traffic takes the expected path?

This is the more complex, geeky stuff. In the real world you can run into situations where your switch decides to put your traffic through an additional hop, or where the hash sums of two or more source-destination address pairs overlap. To ensure your network traffic takes the expected path, you need to calculate the hash sums. Usually, in big enough environments where you have many volumes, file shares and IP addresses, you do not care about this, because the more IPs you have, the higher the probability that your traffic will be distributed over your links, simply by probability theory. But if you do care and you have a small environment, you can brute-force the IP addresses for your servers and storage.
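Such a brute-force search is easy to automate. A sketch using the simplified XOR hash (real switches may hash differently, so verify the result on your own equipment; the 192.168.0.x range and last-octet bounds are arbitrary):

```python
from itertools import product
from ipaddress import IPv4Address

def ip_hash_link(src, dst, n_links):
    # simplified IP-hash policy: XOR of the addresses modulo the link count
    return (int(IPv4Address(src)) ^ int(IPv4Address(dst))) % n_links

def find_even_octets(n_links=2, hosts=2, storage_ips=2):
    """Search last octets for host and storage addresses such that every
    host, on its own, spreads its storage connections over all links."""
    base = "192.168.0.%d"
    for octets in product(range(10, 30), repeat=hosts + storage_ips):
        if len(set(octets)) != len(octets):
            continue  # all addresses must be unique
        h, s = octets[:hosts], octets[hosts:]
        if all(len({ip_hash_link(base % a, base % b, n_links) for b in s}) == n_links
               for a in h):
            return octets  # first octet combination with no collisions
    return None

print(find_even_octets())
```

With the defaults this returns (10, 11, 12, 13): hosts .10/.11 and storage IPs .12/.13 give each host one connection on each of the two links.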

Configuring Switches

This is where 90% of human errors are made. People often forget to add the word "active", or add it in the wrong place, etc.

Example of Switch configuration

Cisco Catalyst 3850 in stack with 1Gb/s ports

Note that "mode active" means "multimode_lacp" in ONTAP, so each interface (not the Port-channel) must have the configuration "channel-group X mode active". Note that the "flowcontrol receive on" configuration depends on the port speed: if the storage sends flow control, then the "other side" must receive it. Note that it is recommended to use RSTP (in our case with VLANs it is Rapid-PVST+) and to configure the switch ports connected to storage and servers with spanning-tree portfast.

system mtu 9198

!

spanning-tree mode rapid-pvst

!

interface Port-channel1

description N1A-1G-e0a-e0b

switchport trunk native vlan 1

switchport trunk allowed vlan 53

switchport mode trunk

flowcontrol receive on

spanning-tree guard loop

!

interface Port-channel2

description N1B-1G-e0a-e0b

switchport trunk native vlan 1

switchport trunk allowed vlan 53

switchport mode trunk

flowcontrol receive on

spanning-tree guard loop

!

interface GigabitEthernet1/0/1

description NetApp-A-e0a

switchport trunk native vlan 1

switchport trunk allowed vlan 53

switchport mode trunk

flowcontrol receive on

cdp enable

channel-group 1 mode active

spanning-tree guard loop

spanning-tree portfast trunk feature

!

interface GigabitEthernet2/0/1

description NetApp-A-e0b

switchport trunk native vlan 1

switchport trunk allowed vlan 53

switchport mode trunk

flowcontrol receive on

cdp enable

channel-group 1 mode active

spanning-tree guard loop

spanning-tree portfast trunk feature

!

interface GigabitEthernet1/0/2

description NetApp-B-e0a

switchport trunk native vlan 1

switchport trunk allowed vlan 53

switchport mode trunk

flowcontrol receive on

cdp enable

channel-group 2 mode active

spanning-tree guard loop

spanning-tree portfast trunk feature

!

interface GigabitEthernet2/0/2

description NetApp-B-e0b

switchport trunk native vlan 1

switchport trunk allowed vlan 53

switchport mode trunk

flowcontrol receive on

cdp enable

channel-group 2 mode active

spanning-tree guard loop

spanning-tree portfast trunk feature

Cisco Catalyst 6509 in stack with 1Gb/s ports

Note that "mode active" means "multimode_lacp" in ONTAP, so each interface (not the Port-channel) must have the configuration "channel-group X mode active". Note that the "flowcontrol receive on" configuration depends on the port speed: if the storage sends flow control, then the "other side" must receive it. Note that it is recommended to use RSTP (in our case with VLANs it is Rapid-PVST+) and to configure the switch ports connected to storage and servers with spanning-tree portfast.

system mtu 9198

!

spanning-tree mode rapid-pvst

!

interface Port-channel11

description NetApp-A-e0a-e0b

switchport trunk native vlan 1

switchport trunk allowed vlan 53

switchport mode trunk

flowcontrol receive on

spanning-tree guard loop

spanning-tree portfast trunk feature

!

interface Port-channel12

description NetApp-B-e0a-e0b

switchport trunk native vlan 1

switchport trunk allowed vlan 53

switchport mode trunk

flowcontrol receive on

spanning-tree guard loop

spanning-tree portfast trunk feature

!

interface GigabitEthernet1/0/1

description NetApp-A-e0a

switchport trunk encapsulation dot1q

switchport trunk native vlan 1

switchport trunk allowed vlan 53

switchport mode trunk

flowcontrol receive on

cdp enable

channel-group 11 mode active

spanning-tree guard loop

spanning-tree portfast trunk feature

!

interface GigabitEthernet2/0/1

description NetApp-A-e0b

switchport trunk encapsulation dot1q

switchport trunk native vlan 1

switchport trunk allowed vlan 53

switchport mode trunk

flowcontrol receive on

cdp enable

channel-group 11 mode active

spanning-tree guard loop

spanning-tree portfast trunk feature

!

interface GigabitEthernet1/0/2

description NetApp-B-e0a

switchport trunk encapsulation dot1q

switchport trunk native vlan 1

switchport trunk allowed vlan 53

switchport mode trunk

flowcontrol receive on

cdp enable

channel-group 12 mode active

spanning-tree guard loop

spanning-tree portfast trunk feature

!

interface GigabitEthernet2/0/2

description NetApp-B-e0b

switchport trunk encapsulation dot1q

switchport trunk native vlan 1

switchport trunk allowed vlan 53

switchport mode trunk

flowcontrol receive on

cdp enable

channel-group 12 mode active

spanning-tree guard loop

spanning-tree portfast trunk feature

Cisco Small Business SG500 in stack with 10Gb/s ports

Note that "mode active" means "multimode_lacp" in ONTAP, so each interface (not the Port-channel) must have the configuration "channel-group X mode active". Note that the "flowcontrol off" configuration depends on the port speed: if the storage is not using flow control (flowcontrol none), then flow control must also be disabled on the "other side". Note that it is recommended to use RSTP and to configure the switch ports connected to storage and servers with spanning-tree portfast.

interface Port-channel1

description N1A-10G-e1a-e1b

spanning-tree ddportfast

switchport trunk allowed vlan add 53

macro description host

!next command is internal.

macro auto smartport dynamic_type host

flowcontrol off

!

interface Port-channel2

description N1B-10G-e1a-e1b

spanning-tree ddportfast

switchport trunk allowed vlan add 53

macro description host

!next command is internal.

macro auto smartport dynamic_type host

flowcontrol off

!

port jumbo-frame

!

interface tengigabitethernet1/1/1

description NetApp-A-e1a

channel-group 1 mode active

flowcontrol off

!

interface tengigabitethernet2/1/1

description NetApp-A-e1b

channel-group 1 mode active

flowcontrol off

!

interface tengigabitethernet1/1/2

description NetApp-B-e1a

channel-group 2 mode active

flowcontrol off

!

interface tengigabitethernet2/1/2

description NetApp-B-e1b

channel-group 2 mode active

flowcontrol off

HP 6120XG switch in blade chassis HP c7000 and 10Gb/s ports

Note that "trunk 17-18 Trk1 LACP" means "multimode_lacp" in ONTAP. Note that the "flowcontrol off" configuration is not present here, which means it is set to "auto" by default, so if a network node connected to the switch has flow control disabled, the switch will not use it either. Flow control depends on the port speed: if the storage is not using flow control (flowcontrol none), then flow control must also be disabled on the "other side". Note that it is recommended to use RSTP and to configure the switch ports connected to storage and servers with spanning-tree portfast.

We can clearly see that one of the links is not utilized. Why does this happen? Because sometimes the algorithm which calculates the hash sum of a source and destination pair generates the same value for two different pairs of source and destination addresses.

SuperFastHash in ONTAP

Instead of the ordinary algorithm widely used by hosts and switches ((source_address XOR destination_address) % number_of_links), ONTAP, starting with 7.3.2, uses an algorithm called SuperFastHash, which gives a more dynamic, more balanced load distribution for a big number of clients; each TCP session is still associated with only one physical port.

Looking to the future

NAS protocols have their disadvantages because they do not have built-in multipathing and load balancing; they rely on LACP. But they are evolving, and bit by bit they are copying abilities from other protocols.

For example, the SMB v3 protocol with the Continuous Availability feature can survive online IP movement between ports and nodes without disruption; this is available in ONTAP and thus can be used with MS SQL & Hyper-V. Also, the SMB v3 protocol supports multichannel, which provides built-in link aggregation and load balancing without relying on LACP; this is currently not supported in ONTAP.

NFS was not a session protocol from the beginning, so when an IP moves to another storage node, the application survives. NFS keeps evolving, and version 4.1 got a feature called pNFS, which provides the ability to switch between nodes and ports automatically and transparently when data has been moved, following the data similarly to SAN ALUA; this is also available in ONTAP. Version 4.1 of NFS also includes the session trunking feature; similarly to the SMB v3 multichannel feature, it allows aggregating links without relying on LACP, but it is currently not supported in ONTAP. NetApp drives the NFS v4 protocol with the IETF, SNIA and the open-source community to get it accepted as soon as possible.

Conclusion

Though NAS protocols have disadvantages, mainly because of the underlying Ethernet and, more precisely, LACP, it is possible to tune LACP to utilize your network and storage most efficiently. In big environments there is usually no need for tuning, but in small environments load balancing might become a bottleneck, especially if you are using 1 Gb/s ports. Though it is rare to fully utilize the network performance of 10 Gb/s ports in small environments, it is better to do the tuning at the very beginning than later, on a production environment. NAS protocols are file-granular, and since the storage system runs the underlying FS, it can work with files and provide more abilities for thin provisioning, cloning, self-service operations and backup, in many ways more agile than SAN. NAS protocols are evolving and absorbing abilities from other protocols (in particular, SAN protocols like FC & iSCSI) to fully diminish their disadvantages, and they already provide additional capabilities to environments which can use the new versions of SMB and NFS.

Troubleshooting

90% of all the problems are in the network configuration on the switch side, and the other 10% on the host side: human error. The problem is usually with the MTU configuration, LACP or flow control.