VMware NSX Detailed Design Guide for Secured Production and DMZ Use Case

VMware provides great validated designs for SDDC deployments. However, most organizations look for tailored designs that follow the standards and best practices laid out by VMware. We see a great need for VMware NSX designs for SMBs and medium-sized organizations that are willing to adopt SDDC. This post addresses a common design: hosting production and DMZ workloads separately while making use of all the SDDC features that NSX offers.

Due to lack of time, people don’t like to read the hundreds of pages of design guides, which I would still really recommend reading and which are available here. This post gives a complete view of an SDDC and its requirements, with detailed physical and connectivity designs. Please note that, to keep things simple, I am talking about one site only in this design.

This design can be used as a low-level design for an SDDC to save you time and effort.

Network Virtualization Architecture

This is the high-level logical network design, with one cluster for shared production and NSX components and another cluster for the DMZ. Don’t be scared by looking at this. Have a look at all the design diagrams and decisions to get the complete view.

NSX control plane: The control plane handles network virtualization control messages. Control messages are used to set up networking attributes on NSX logical switch instances, and to configure and manage distributed routing and distributed firewall components on each ESXi host. Carry control plane communication over secure physical networks (VLANs) that are isolated from the transport networks used for the data plane.

NSX management plane: The network virtualization orchestration occurs in the management plane. In this layer, cloud management platforms such as vRealize Automation can request, consume, and destroy networking resources for virtual workloads. The cloud management platform directs requests to vCenter Server to create and manage virtual machines, and to NSX Manager to consume networking resources.

NSX for vSphere Requirements

Below are the components and their compute requirements.

| Server Component | Quantity | Location | CPU | RAM (GB unless noted) | Storage (GB unless noted) |
|---|---|---|---|---|---|
| Platform Services Controllers | 2 | Production-Mgmt Cluster | 4 | 12 | 290 |
| vCenter Server with Update Manager | 1 | Production-Mgmt Cluster | 4 | 16 | 290 |
| NSX Manager | 1 | Production-Mgmt Cluster | 4 | 16 | 60 |
| Controllers | 3 | Production-Mgmt Cluster | 4 | 4 | 20 |
| EDGE Gateway for Production | 4 | Production-Mgmt Cluster | 2 | 2 | 512 MB |
| Production DLR Control VM (A/S) | 2 | Production-Mgmt Cluster | 1 | 512 MB | 512 MB |
| EDGE Gateway for DMZ | 2 | DMZ Cluster | 2 | 2 | 512 MB |
| DMZ DLR Control VM (A/S) | 2 | DMZ Cluster | 1 | 512 MB | 512 MB |
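To size the clusters, it helps to total the per-instance figures from the table above. The following is a minimal sketch, assuming the unlabelled RAM and storage values are in GB and that the CPU, RAM, and storage figures are per instance (both are my assumptions, not stated in the table); the shortened component names are only labels.

```python
from collections import defaultdict

# (component, quantity, cluster, vCPU, RAM GB, storage GB) per instance.
# Values are copied from the requirements table; 512 MB is written as 0.5 GB.
# Assumption: unlabelled RAM/storage values are GB and apply per instance.
COMPONENTS = [
    ("Platform Services Controller", 2, "Production-Mgmt", 4, 12, 290),
    ("vCenter Server with Update Manager", 1, "Production-Mgmt", 4, 16, 290),
    ("NSX Manager", 1, "Production-Mgmt", 4, 16, 60),
    ("NSX Controller", 3, "Production-Mgmt", 4, 4, 20),
    ("Edge gateway (Production)", 4, "Production-Mgmt", 2, 2, 0.5),
    ("DLR control VM (Production)", 2, "Production-Mgmt", 1, 0.5, 0.5),
    ("Edge gateway (DMZ)", 2, "DMZ", 2, 2, 0.5),
    ("DLR control VM (DMZ)", 2, "DMZ", 1, 0.5, 0.5),
]


def totals_per_cluster(components):
    """Sum quantity * per-instance resources for each cluster."""
    totals = defaultdict(lambda: {"vcpu": 0, "ram_gb": 0.0, "storage_gb": 0.0})
    for _name, qty, cluster, vcpu, ram_gb, storage_gb in components:
        totals[cluster]["vcpu"] += qty * vcpu
        totals[cluster]["ram_gb"] += qty * ram_gb
        totals[cluster]["storage_gb"] += qty * storage_gb
    return dict(totals)


if __name__ == "__main__":
    for cluster, t in totals_per_cluster(COMPONENTS).items():
        print(cluster, t)
```

Under those assumptions the infrastructure VMs need 38 vCPU, 77 GB RAM, and 993 GB of storage in the Production-Mgmt cluster, and 6 vCPU, 5 GB RAM, and 2 GB of storage in the DMZ cluster, before any tenant workload is added.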

IP Subnet Requirements

The VLANs below, for management and VTEPs, will be created on the physical L3 device in the data center.

The upstream Layer 3 devices terminate each VLAN and provide default gateway functionality.

NSX doesn’t need anything fancy at the network level; basic L2 or L3 functionality from any hardware vendor will do.

Configure jumbo frames with an MTU of 9000 on all switch ports, although 1600 is enough for NSX.

The management vDS uplinks for both the Production and DMZ clusters can be connected to the same TOR switches, but use separate VLANs as shown in the requirements. Only the edge uplinks need to be separate for Production and DMZ, as that is what decides the packet flow.

vCenter Design & Cluster Design

It is recommended to have one vCenter Single Sign-On domain with two PSCs load balanced by NSX or an external load balancer; the vCenter Server then uses the load-balanced VIP of the PSCs.

vCenter Design Considerations:

For this design, one vCenter Server license is enough, but separate vCenter Servers for the management and NSX workload clusters are recommended if you have separate clusters.

This shared Production cluster also runs the required NSX services to enable north-south routing between the SDDC tenant virtual machines and the external network, and east-west routing inside the SDDC.

The Production cluster also hosts the compute workloads for the SDDC tenants.

The DMZ cluster will host the DMZ workloads along with the DMZ edges and the DLR control VM.

VXLAN VTEP Design

The VXLAN network is used for Layer 2 logical switching across hosts, spanning multiple underlying Layer 3 domains. You configure VXLAN on a per-cluster basis, where you map each cluster that is to participate in NSX to a vSphere distributed switch (VDS). When you map a cluster to a distributed switch, each host in that cluster is enabled for logical switches. The settings chosen here will be used in creating the VMkernel interface.

If you need logical routing and switching, all clusters that have NSX VIBs installed on the hosts should also have VXLAN transport parameters configured. If you plan to deploy distributed firewall only, you do not need to configure VXLAN transport parameters.

When you configure VXLAN networking, you must provide a vSphere Distributed Switch, a VLAN ID, an MTU size, an IP addressing mechanism (DHCP or IP pool), and a NIC teaming policy.

The MTU for each switch must be set to 1550 or higher. By default, it is set to 1600. If the vSphere distributed switch MTU size is larger than the VXLAN MTU, the vSphere Distributed Switch MTU will not be adjusted down. If it is set to a lower value, it will be adjusted to match the VXLAN MTU.
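The 1550 minimum follows from the VXLAN encapsulation overhead added to a standard 1500-byte guest MTU. Here is a minimal sketch of that arithmetic, assuming an IPv4 underlay; the function names are mine and purely illustrative.

```python
# Bytes added around the guest's IP packet when it is carried over VXLAN/IPv4.
INNER_ETHERNET = 14   # the guest's own Ethernet header travels inside the tunnel
VXLAN_HEADER = 8
OUTER_UDP = 8
OUTER_IPV4 = 20       # assumption: IPv4 underlay
INNER_VLAN_TAG = 4    # only present if the guest frame carries an 802.1Q tag

VXLAN_OVERHEAD = INNER_ETHERNET + VXLAN_HEADER + OUTER_UDP + OUTER_IPV4  # 50


def required_transport_mtu(guest_mtu=1500, inner_vlan_tag=False):
    """Smallest transport (VTEP/underlay) MTU that avoids fragmentation."""
    extra = INNER_VLAN_TAG if inner_vlan_tag else 0
    return guest_mtu + VXLAN_OVERHEAD + extra


def transport_mtu_ok(transport_mtu, guest_mtu=1500, inner_vlan_tag=False):
    """True if the configured transport MTU can carry full-size guest packets."""
    return transport_mtu >= required_transport_mtu(guest_mtu, inner_vlan_tag)


if __name__ == "__main__":
    print(required_transport_mtu())   # 1550
    print(transport_mtu_ok(1600))     # True: the default leaves headroom
    print(transport_mtu_ok(1500))     # False: an untouched physical switch
```

The default of 1600 simply leaves some headroom above the 1550 minimum, and a 9000-byte physical MTU covers it comfortably.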

Use at least two VTEPs per server to balance the VTEP load: some VMs’ traffic will go through one VTEP and other VMs’ traffic through the other.

Separate VLANs will be used for the Production VTEP IP pool and the DMZ VTEP IP pool.

The unicast replication model is sufficient for small and medium deployments. For large-scale deployments with multiple pods, hybrid is recommended.

No IGMP or other multicast configuration is needed on the physical network for the unicast replication model.
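To see why unicast mode stops scaling at some point, it helps to count the copies a source host has to send for one broadcast frame. This is a rough sketch of my understanding of unicast replication: one copy to every other VTEP in the source’s own VTEP subnet, plus one copy to a single proxy VTEP in each remote subnet. The IP addresses and the /24 prefix length are made-up examples.

```python
import ipaddress


def unicast_copies(source_vtep, member_vteps, prefix_len=24):
    """Count copies the source VTEP sends for one BUM frame in unicast mode.

    member_vteps: VTEP IPs that have VMs on the logical switch.
    Assumption: all VTEP subnets use the same prefix length.
    """
    src_net = ipaddress.ip_interface(f"{source_vtep}/{prefix_len}").network
    local_copies = 0
    remote_subnets = set()
    for ip in member_vteps:
        if ip == source_vtep:
            continue
        net = ipaddress.ip_interface(f"{ip}/{prefix_len}").network
        if net == src_net:
            local_copies += 1          # one copy per VTEP in the same subnet
        else:
            remote_subnets.add(net)    # one copy per remote subnet (proxy VTEP)
    return local_copies + len(remote_subnets)


if __name__ == "__main__":
    members = ["10.0.10.11", "10.0.10.12", "10.0.10.13",
               "10.0.20.11", "10.0.20.12"]
    print(unicast_copies("10.0.10.11", members))  # 3
```

With a single VTEP subnet per cluster, as in this design, the count grows linearly with the number of hosts, which is fine for small and medium clusters.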

Production Cluster VTEP Design

As shown above, each host will have two VTEPs configured. They are created automatically based on the policy selected while configuring the VTEPs.

DMZ Cluster VTEP Design

As shown above, each host will have two VTEPs configured. They are created automatically based on the policy selected while configuring the VTEPs.

Transport Zone Design

A transport zone is used to define the scope of a VXLAN overlay network and can span one or more clusters within one vCenter Server domain. One or more transport zones can be configured in an NSX for vSphere solution. A transport zone is not meant to delineate a security boundary.

Option 01: Two Transport Zones

Two transport zones will be used, one for the Production workload and another for the DMZ workload.

The Production transport zone will contain the Production shared cluster.

The DMZ transport zone will contain the DMZ cluster.

Option 02: One Transport Zone (Recommended)

One transport zone will be used for both the Production and the DMZ workloads. This helps if you are planning for DR or a secondary site, as only one universal transport zone is supported; when you move to a secondary site, you can have one universal transport zone and two universal DLRs, one for Production and one for DMZ.

Logical Switch Design

NSX logical switches create logically abstracted segments to which tenant virtual machines can connect. A single logical switch is mapped to a unique VXLAN segment ID and is distributed across the ESXi hypervisors within a transport zone. This logical switch configuration provides support for line-rate switching in the hypervisor without creating constraints of VLAN sprawl or spanning tree issues.

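| Logical Switch Names | DLR | Transport Zone |
|---|---|---|
| WEB Tier Logical Switch | Production DLR | Production Transport Zone |
| APP Tier Logical Switch | Production DLR | Production Transport Zone |
| DB Tier Logical Switch | Production DLR | Production Transport Zone |
| Services Tier Logical Switch | Production DLR | Production Transport Zone |
| Transit Logical Switch | Production DLR | Production Transport Zone |
| DMZ WEB Logical Switch | DMZ DLR | DMZ Transport Zone |
| DMZ Services Logical Switch | DMZ DLR | DMZ Transport Zone |
| DMZ Transit Logical Switch | DMZ DLR | DMZ Transport Zone |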

Distributed Switch Design

vSphere Distributed Switch supports several NIC teaming options. Load-based NIC teaming supports optimal use of available bandwidth and redundancy in case of a link failure. Use two 10-GbE connections for each server in combination with a pair of top of rack switches. 802.1Q network trunks can support a small number of VLANs, for example management, storage, VXLAN, vSphere Replication, and vSphere vMotion traffic.

Configure the MTU size to at least 9000 bytes (jumbo frames) on the physical switch ports and distributed switch port groups that support the following traffic types.

vSAN

vMotion

VXLAN

vSphere Replication

NFS

Two types of QoS configuration are supported in the physical switching infrastructure: Layer 2 QoS, also called class of service (CoS), and Layer 3 QoS, also called DSCP marking.

A vSphere Distributed Switch supports both CoS and DSCP marking. Users can mark the traffic based on the traffic type or packet classification. When the virtual machines are connected to the VXLAN-based logical switches or networks, the QoS values from the internal packet headers are copied to the VXLAN-encapsulated header. This enables the external physical network to prioritize the traffic based on the tags on the external header.

Physical Production vDS Design

The Production cluster will have 3 vDSs. Detailed port group information is given below.

vDS-DMZ-EDGE: used for edge uplinks carrying north-south traffic. (If you don’t have extra 10 GbE NICs, you can also use the prod vDS for the edge port groups, but there will be a performance impact.)

In most cases DMZ traffic will be light, so you can also use 1 GbE ports for the edge uplinks, but again this depends on the workloads deployed in the DMZ.

Port Group Design Decisions:

vDS-MGMT-DMZ

| Port Group Name | LB Policy | Uplinks | MTU |
|---|---|---|---|
| ESXi Mgmt | Route based on physical NIC load | vmnic0, vmnic1 | 1500 (default) |
| Management | Route based on physical NIC load | vmnic0, vmnic1 | 1500 (default) |
| vMotion | Route based on physical NIC load | vmnic0, vmnic1 | 9000 |
| VTEP | Route based on SRC-ID | vmnic0, vmnic1 | 9000 |

vDS-vSAN

| Port Group Name | LB Policy | Uplinks | MTU |
|---|---|---|---|
| PROD-vSAN | Route based on physical NIC load | vmnic2, vmnic3 | 9000 |
| DMZ-vSAN | Route based on physical NIC load | vmnic2, vmnic3 | 9000 |

vDS-DMZ-EDGE

The number of port groups in the DMZ depends on the next-hop L3 device. If it is a firewall, which is the case most of the time, one port group is enough, as firewalls typically work as an active/passive pair. If you have a separate L3 device rather than a firewall for the DMZ, you will have two uplinks as in Production.

| Port Group Name | LB Policy | Uplinks | Remarks |
|---|---|---|---|
| ESG-Uplink-1-vlan-xx | Route based on originating virtual port | vmnic4 | 1500 (default) |

Control Plane and Routing Design

The control plane decouples NSX for vSphere from the physical network and handles the broadcast, unknown unicast, and multicast (BUM) traffic within the logical switches. The control plane is on top of the transport zone and is inherited by all logical switches that are created within it.

DLRs are limited to 1,000 logical interfaces. If that limit is reached, you must deploy a new DLR.
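As a quick planning aid, the number of DLRs needed for a given number of logical interfaces is a simple ceiling division. A minimal sketch, using the 1,000-LIF limit quoted above; the function name is mine.

```python
import math

MAX_LIFS_PER_DLR = 1000  # limit quoted in this post


def dlrs_needed(planned_lifs, limit=MAX_LIFS_PER_DLR):
    """Minimum number of DLR instances for the planned logical interfaces."""
    if planned_lifs < 0:
        raise ValueError("planned_lifs must not be negative")
    return math.ceil(planned_lifs / limit)


if __name__ == "__main__":
    print(dlrs_needed(5))      # 1: the five Production logical switches
    print(dlrs_needed(2500))   # 3
```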

Designated Instance: The designated instance is responsible for resolving ARP on a VLAN LIF. There is one designated instance per VLAN LIF. The selection of an ESXi host as a designated instance is performed automatically by the NSX Controller cluster and that information is pushed to all other ESXi hosts. Any ARP requests sent by the distributed logical router on the same subnet are handled by the same ESXi host. In case of an ESXi host failure, the controller selects a new ESXi host as the designated instance and makes that information available to the other ESXi hosts.

User World Agent: User World Agent (UWA) is a TCP and SSL client that enables communication between the ESXi hosts and NSX Controller nodes, and the retrieval of information from NSX Manager through interaction with the message bus agent.

Edge Services Gateway: While the DLR provides VM-to-VM or east-west routing, the NSX Edge services gateway provides north-south connectivity, by peering with upstream top of rack switches, thereby enabling tenants to access public networks.

Some Important Design Considerations for EDGE and DLR

ESGs that provide ECMP services require the firewall to be disabled.

Deploy a minimum of two NSX Edge services gateways (ESGs) in an ECMP configuration for north-south routing.

Create one or more static routes on the ECMP-enabled edges for the subnets behind the UDLR and DLR, with a higher administrative distance than the dynamically learned routes.

Hint: If any new subnets are added behind the UDLR or DLR, the routes must be updated on the ECMP edges.

Graceful Restart maintains the forwarding table, which in turn keeps forwarding packets to a down neighbor even after the BGP/OSPF timers have expired, causing loss of traffic.

FIX: Disable Graceful Restart on all ECMP edges.

Note: Graceful Restart should be enabled on the DLR control VM, as it helps maintain the data path even when the control VM is down. Please note that the DLR control VM is not in the data path, but the edge sits in the data path.

If the active Logical Router control virtual machine and an ECMP edge reside on the same host and that host fails, a dead path in the routing table appears until the standby Logical Router control virtual machine starts its routing process and updates the routing tables.

FIX: To avoid this situation, create anti-affinity rules and make sure you have enough hosts to tolerate failures of the active/passive control VMs.
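The anti-affinity requirement is easy to verify against an inventory export. Below is a small sketch that flags placements where a DLR control VM shares a host with an ECMP edge or with its own peer. The VM and host names are hypothetical, and the placement dictionary is assumed to come from whatever inventory tooling you use; nothing here calls vCenter.

```python
from itertools import combinations, product


def anti_affinity_violations(placement, control_vms, ecmp_edges):
    """Return (vm_a, vm_b, host) for every pair that must be kept apart
    but currently runs on the same ESXi host.

    placement: dict mapping VM name -> host name (hypothetical inventory data).
    """
    must_separate = list(combinations(control_vms, 2))       # active vs standby
    must_separate += list(product(control_vms, ecmp_edges))  # control VM vs edge
    return [
        (a, b, placement[a])
        for a, b in must_separate
        if placement[a] == placement[b]
    ]


if __name__ == "__main__":
    placement = {
        "dlr-ctrl-0": "esx01", "dlr-ctrl-1": "esx02",
        "esg-1": "esx01", "esg-2": "esx03",
        "esg-3": "esx04", "esg-4": "esx02",
    }
    edges = ["esg-1", "esg-2", "esg-3", "esg-4"]
    for a, b, host in anti_affinity_violations(
            placement, ["dlr-ctrl-0", "dlr-ctrl-1"], edges):
        print(f"{a} and {b} share {host}")
```

The sketch checks both control VMs rather than only the active one, since the standby becomes active after a failover.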

Production Routing Design

Below are the design details.

The DLR will act as the gateway for the Production web, app, and DB tier VXLANs.

The DLR will peer with the edge gateways over OSPF, using normal area ID 10.

On the DLR, IP 2 will be used as the packet forwarding address and IP 3 as the protocol address for route peering with the edges.

All 4 edges will be configured with ECMP so that they all pass traffic to the upstream router and the downstream DLR.

Two SVIs will be configured on the TOR / nearest L3 device; in my case both switches are active, with vPC and HSRP configured across them.

Each edge gateway will have two uplinks, one towards each SVI, each in its own VLAN.

A static route will be created on the edges for the subnets hosted on the DLR, with a higher administrative distance. This keeps traffic flowing if there are any issues with the control VM.
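The static route with a higher administrative distance works as a backup: while the OSPF-learned route exists it wins, and the static route only takes over when the dynamic route disappears. Here is a minimal sketch of that selection logic. The distances (110 for OSPF, 240 for the static route), the prefix, and the next hop are illustrative assumptions, not values from this design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str
    source: str          # "ospf" or "static"
    admin_distance: int  # lower value wins


def best_route(routes, prefix):
    """Pick the route for a prefix with the lowest administrative distance."""
    candidates = [r for r in routes if r.prefix == prefix]
    return min(candidates, key=lambda r: r.admin_distance) if candidates else None


if __name__ == "__main__":
    # Hypothetical web-tier subnet behind the DLR, reachable via the DLR
    # forwarding address on the transit network.
    ospf = Route("172.16.10.0/24", "192.168.100.2", "ospf", 110)
    static = Route("172.16.10.0/24", "192.168.100.2", "static", 240)

    table = [ospf, static]
    print(best_route(table, "172.16.10.0/24").source)  # ospf

    table.remove(ospf)  # control VM down: the OSPF adjacency is lost
    print(best_route(table, "172.16.10.0/24").source)  # static
```

Because the static route points at the DLR forwarding address, which lives in the hypervisor kernel, traffic keeps flowing even while the control VM is unavailable.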

DMZ Routing Design

Below are the design details.

The DLR will act as the gateway for the DMZ web and services tier VXLANs.

The DLR will peer with the edge gateways over OSPF, using normal area ID 20. (Note that all areas in OSPF should connect to area 0.)

On the DLR, IP 2 will be used as the packet forwarding address and IP 3 as the protocol address for route peering with the edges.

Both edges will be configured with ECMP so that they both pass traffic to the upstream firewall and the downstream DLR.

As firewalls can act as an active/passive pair, only one virtual IP will be configured, so only one VLAN will be used.

Each edge gateway will have one uplink connecting to the firewall.
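Both routing designs rely on ECMP to spread flows across the edges. Conceptually, the router hashes something that identifies the flow and uses the result to pick one of the equal-cost next hops, so packets of one flow stay on one edge. The sketch below illustrates the idea only: the SHA-256 hash over source and destination IP is my stand-in and not the algorithm NSX actually uses, and the edge names are made up.

```python
import hashlib


def pick_next_hop(src_ip, dst_ip, next_hops):
    """Map a flow (src, dst) onto one of the equal-cost next hops.

    Illustrative only: real implementations use their own hash function.
    """
    if not next_hops:
        raise ValueError("no next hops available")
    key = f"{src_ip}->{dst_ip}".encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


if __name__ == "__main__":
    prod_edges = ["esg-1", "esg-2", "esg-3", "esg-4"]  # Production: 4 edges
    dmz_edges = ["dmz-esg-1", "dmz-esg-2"]             # DMZ: 2 edges
    for src in ("172.16.10.11", "172.16.10.12", "172.16.20.21"):
        print(src, "->", pick_next_hop(src, "203.0.113.5", prod_edges))
    print(pick_next_hop("172.16.50.11", "203.0.113.5", dmz_edges))
```

Since return traffic may be hashed to a different edge than the outbound flow, stateful services cannot see both directions, which is why the firewall has to be disabled on ECMP edges.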

Edge Uplink Design

Below are the design details:

Each edge will have two uplinks, one from each port group.

Each uplink port group will have only one physical uplink configured. No passive uplinks.

Each uplink port group will be tagged with a separate VLAN.

Note: The DMZ will have a similar setup but with only one port group.

Packet Walk Through

Note that, as Production and DMZ are in different transport zones, a packet has to exit the DMZ and be routed over the physical network to reach the production VMs.

Step 5: The DMZ firewall forwards it to the data center core, then to the TOR switch.

Step 6: The L3 device peering with the edge forwards it to the edge, which forwards it to the DLR.

Step 7: The DLR, acting as the gateway for the production VM, forwards the packet to the VM.

Step 8: The internal VM receives the packet from the DMZ server.

Micro Segmentation Design

The NSX Distributed Firewall is used to protect all management applications attached to application virtual networks. To secure the SDDC, only other solutions in the SDDC and approved administration IPs can directly communicate with individual components.

NSX micro-segmentation helps manage all the firewall policies from a single pane.
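The allow-list idea described above (only other SDDC solutions and approved administration IPs may talk to a component, everything else is dropped) can be sketched in a few lines. This only illustrates the policy logic and is not how the distributed firewall is configured; the networks and addresses are made-up examples.

```python
import ipaddress

# Hypothetical allow-list: an SDDC management subnet and one admin jump host.
ALLOWED_SOURCES = [
    ipaddress.ip_network("10.10.10.0/24"),
    ipaddress.ip_network("192.168.50.10/32"),
]


def is_allowed(src_ip, allowed=ALLOWED_SOURCES):
    """Default deny: permit only sources inside an approved network."""
    addr = ipaddress.ip_address(src_ip)
    return any(addr in net for net in allowed)


if __name__ == "__main__":
    for ip in ("10.10.10.25", "192.168.50.10", "172.16.50.11"):
        print(ip, "allow" if is_allowed(ip) else "deny")
```

In NSX itself you would normally express the sources as security groups or IP sets rather than raw subnets, so that the policy follows the VMs.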