Troubleshoot NSX Firewall Rules for Your vRA 7 Blueprints

In Part 2 of this series, we automated 3 different NSX Micro-Segmentation scenarios with vRealize Automation 7 (vRA). In this article, I’ll show you how to do some basic Distributed Firewall troubleshooting, so you can figure out exactly which firewall rules your vRA blueprints will need.

Two Approaches for NSX Firewall Troubleshooting

First off, I should point out that there are multiple approaches to figuring out which rules in the Distributed Firewall (DFW) are causing traffic to be blocked or allowed.

One approach is to enable Logging within each firewall rule, and then monitor the DFW log on each ESXi host where the source and destination VMs reside. You can enable Logging by clicking into the rule’s Action field (where you choose to Allow or Block traffic for the rule), and then selecting the “Log” radio button.

The DFW log is located on the ESXi host, in /var/log/dfwpktlogs.log. It contains the PASS or DROP status for each stream of traffic to and from each VM that is running on that host (assuming Logging has been enabled in the corresponding firewall rules). You can SSH to the host to search the file with vi or watch it with tail, or you can scp the file over to your desktop.
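If you copy the log to your desktop, a few lines of Python can pull out just the entries you care about. The following is a minimal sketch, not an official tool: the function name filter_dfw_log is my own, and the only things it assumes about the log format are that each entry is a single line, that the action (PASS or DROP) appears as its own whitespace-separated word, and that addresses are written as dotted IPv4.

```python
import re
from typing import Iterable, Iterator, Optional

# Matches a dotted IPv4 address that is not part of a longer number or address.
IPV4 = re.compile(r"(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?![\d.])")


def filter_dfw_log(lines: Iterable[str],
                   action: str = "DROP",
                   ip: Optional[str] = None) -> Iterator[str]:
    """Yield log lines with the given action, optionally limited to one IP.

    Assumption: the action is a standalone word on the line. The exact
    position of the other fields is deliberately not relied upon.
    """
    for line in lines:
        if action not in line.split():
            continue
        # Compare whole addresses so 192.168.110.20 does not match ...204.
        if ip is not None and ip not in IPV4.findall(line):
            continue
        yield line.rstrip("\n")
```

To use it against a copied log, open the file and pass the file object in, for example filter_dfw_log(open("dfwpktlogs.log"), "DROP", "192.168.110.204"), then print each line it yields.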

Since this approach requires enabling Logging in each rule, and then watching/collecting the DFW log on each host, it can be a little tedious. I typically only use this approach when I’m troubleshooting a complex or production-scale issue, where I need to be able to see what’s going on at the vNICs of multiple VMs.

Another approach for troubleshooting firewall rules is to use NSX’s Flow Monitoring tool. Flow Monitoring is a nice GUI that lets you choose a VM/vNIC to monitor, and then shows you all of the VM’s active and blocked flows in real time. So there’s no need to enable any logging, or hunt down any log files! There are a couple of caveats to keep in mind, though: 1) you’re limited to monitoring one VM/vNIC at a time, and 2) I’ve heard from some of my colleagues that it can be a bit laggy when monitoring a production VM with heavy traffic volume and lots of streams. When developing firewall rules for a new vRA blueprint, however, these caveats don’t really apply. This is because we’re typically working in a controlled lab or test/dev environment where traffic volume is low, and we’re only troubleshooting basic communication between the handful of VMs within the blueprint.

Flow Monitoring Example

When developing firewall rules for a vRA blueprint, I always use Flow Monitoring – I think it’s the simplest and most effective tool for the job. To illustrate how to use it, we’ll look at an example situation that I ran into while writing Part 2 of this blog series: when I tried out my first vRA blueprint with corresponding firewall rules, my request seemed to hang.

From my vRA Requests tab, I drilled into my request and clicked the Execution Information button in the top-right corner, which allowed me to see the status of all the NSX and vCenter components in my request. From there, I could see that my NSX Security Groups had provisioned OK, but the request was stuck with 2 VMs “In progress.”

For some additional insight, I logged into vRA with an administrative account, where I was able to see that these 2 VMs were stuck in the MachineProvisioned lifecycle state, below.

Since MachineProvisioned comes after the lifecycle state in which vRA has communicated to vCenter and successfully cloned the template, I could reasonably assume that vRA-to-vCenter communication was working OK.

Next, to see what was going on from the firewall perspective, I brought up the vSphere Web Client, clicked Networking & Security, and went to the Flow Monitoring tool, where I added the vNIC of one of the 2 VMs that were stuck:

In Flow Monitoring, I could see that the stuck VM (192.168.110.204) was trying to do a DNS lookup (UDP 53) on the DNS Server (192.168.110.10). To fix this, I added a firewall rule to allow DNS traffic to my DNS servers. Then I went back to Flow Monitoring, where I could see that DNS traffic was now being allowed (i.e. “active”), but HTTPS was being blocked from my VM to the vRA IaaS server:

Since I know that VMs need to communicate with various vRA components during their provisioning, and all VM-to-vRA traffic is HTTPS, I went ahead and created a rule to allow HTTPS traffic to all the vRA servers in my environment. Then I took another look at Flow Monitoring, which let me know that everything was now working OK:

As you can see, no traffic was being blocked (red), and the desired traffic was flowing correctly (DNS lookups and HTTPS traffic to my vRA IaaS and vRA Appliance).

As you may recall from Part 2, one of the first things we did was to include a group of firewall rules called Core Services, which applied to the whole environment. Core Services included both of these rules, which we’ve just seen are required for vRA Provisioning (rules 1-2, below).

Summary

To sum up, when you’re developing new vRA Blueprints that include NSX’s Micro-Segmentation features, you can use Flow Monitoring and the above process to identify any blocked traffic, and create the corresponding firewall rules to allow it.
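If you like to keep notes while you work through this process, the bookkeeping can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the Flow record and the candidate_rules function are hypothetical names, the records are typed in by hand from what you see in Flow Monitoring (this is not an export format that NSX produces), and the IaaS server address in the example is a placeholder. The idea is simply to group the blocked flows by protocol and port, so each group becomes one candidate allow rule.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Flow:
    """One flow as noted by hand from Flow Monitoring (hypothetical record)."""
    source: str
    destination: str
    protocol: str   # e.g. "UDP" or "TCP"
    port: int       # destination port
    state: str      # "active" or "blocked"


def candidate_rules(flows: Iterable[Flow]) -> Dict[Tuple[str, int], List[str]]:
    """Group blocked flows by (protocol, port) and list their destinations.

    Each key suggests one allow rule; the value lists the destination
    servers that rule would need to cover.
    """
    grouped: Dict[Tuple[str, int], set] = {}
    for flow in flows:
        if flow.state != "blocked":
            continue
        grouped.setdefault((flow.protocol, flow.port), set()).add(flow.destination)
    return {key: sorted(dests) for key, dests in sorted(grouped.items())}


# Example notes from the walkthrough. 192.168.110.70 is a placeholder
# address for the vRA IaaS server, not a value from my environment.
observed = [
    Flow("192.168.110.204", "192.168.110.10", "UDP", 53, "blocked"),
    Flow("192.168.110.204", "192.168.110.70", "TCP", 443, "blocked"),
]

for (protocol, port), destinations in candidate_rules(observed).items():
    print(f"Allow {protocol}/{port} to {', '.join(destinations)}")
```

In practice the blocked flows tend to show up one at a time, as they did for me (DNS first, then HTTPS once DNS was allowed), so you would add to the list and rerun after each pass through Flow Monitoring.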