About Converged and Hyper-Converged Infrastructure

Recently, I came across an issue with the PSCs not joining the domain (they disconnected from the domain automatically) after upgrading the vCenter components (PSC01, PSC02, and the vCenter Windows server) from 6.0 Update 2 build 3634791 to 6.0 Update 2a build 4632154, or to 6.0 Update 3b build 5326079. This happened because the Windows domain controller was Server 2012 R2 and SMB 2 was the communication protocol to the domain controller; we had to enable SMB 2 on the PSCs for them to communicate with the domain after the upgrade.
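On appliance-based PSCs, domain communication goes through the Likewise stack, so enabling SMB 2 is a registry change there. A sketch of the commands (the registry path and value name are from my notes and may differ by build, so verify against the relevant VMware KB before running them):

```
psc01:~ # /opt/likewise/bin/lwregshell set_value '[HKEY_THIS_MACHINE\Services\lwio\Parameters\Drivers\rdr]' Smb2Enabled 1
psc01:~ # /opt/likewise/bin/lwsm restart lwio
```

After the lwio service restart, the PSC should be able to talk to the 2012 R2 domain controller over SMB 2 again.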

The above website clearly explains how to use the SUSE Linux Rescue CD to create a new root password and update it in the /etc/shadow file on the PSC itself; after a reboot, you will be able to get into the PSC with the new password.

Recently I came across an issue where SRM 6.1 skipped a few steps during a Recovery Plan failover from the Recovery site back to the Protected site. I had to dig into the SRM settings to find out why, and I found that I hadn't configured the custom IP network rules on the Recovery site, so the recovery plan skipped customizing the IP addresses on the recovered VMs back in the Protected site.

Here is the message as shown:

Explanation —

I have Two sites

Protected Site — NC

Recovery Site — Dallas

I failed over from NC to Dallas fine because I put in the network IP rules at site NC under SRM –> Sites –> NC –> Manage –> Network Mappings, with settings as shown:

As shown above, I created the network IP customization rule in Site_NC but forgot to do it in Site_Dallas. That is why, when the failback from Dallas to NC was initiated, it skipped the IP customization of the VMs during the recovery process.

NOTE: Make sure that you configure the Network IP rules on both the Protected and Recovery sites so that the IP customization is applied on the VMs at both the sites.

My colleagues have been facing a particular error recently when working on Converged Infrastructure (Vblock, VxBlock, etc.): when trying to check the FI cluster state after SSHing into the FI cluster IP address, it returns the error "Peer Client db version is lower than local, self version: 3, peer version: 1". The screenshot is shown below:

There are a few resolutions that have worked so far for this kind of error on the FI cluster.

Resolutions

Reboot the peer FI (in the above case, FI B, the subordinate) so that the database on FI B syncs up with FI A. Once FI B comes back up, it will be in sync with FI A and the cluster state will be HA READY.

SSH to the cluster IP, run connect local-mgmt A (whichever is the primary), then run cluster lead b (or whichever is the subordinate). This fails over the UCS management service from the primary to the subordinate, and it is a less impactful method than the first one.

Restart the pmon service on the peer FI (A or B); this can also fix the issue.
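The second (less impactful) option above looks roughly like this from the cluster IP; the prompts and the choice of B as the subordinate are just from this example, and cluster lead will ask for confirmation before it moves the lead:

```
UCS-A# connect local-mgmt
UCS-A(local-mgmt)# cluster lead b
UCS-A(local-mgmt)# show cluster state
```

Once the election settles, show cluster state should report HA READY with the lead on the other FI.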

Looks like VMware finally got around to making a public announcement about the release of vSphere 6.5, and here are some of its main new features.

vCenter Server Appliance

vCenter Server Appliance now has an integrated Update Manager

vCenter Server Appliance now has native High Availability

vCenter Server Appliance has better appliance management

vCenter Server Appliance now has native Backup/Restore

HTML5-based vSphere web client

Security

VM-level disk encryption capability designed to protect against unauthorized access to data. (This is done using the vSphere storage policy framework)

Encrypted vMotion capability

vSphere 6.5 adds secure boot support to the hypervisor to protect both the hypervisor and the guest operating system

Enhanced audit-quality logging provides more information about user actions (who did what, when, and where) if you need to investigate your environment

Host Resource management

Enhanced host profiles (the updated graphical editor in the vSphere Web Client now has an easy-to-use search function, plus a new ability to mark individual configuration elements as favorites for quick access; administrators can also create a hierarchy of host profiles by copying settings from one profile to one or many others)

Auto Deploy (Easier to manage in vSphere 6.5 with the introduction of a full-featured graphical interface. Administrators no longer need to use PowerCLI to create and manage deploy rules or custom ESXi images)

Proactive HA (Proactive HA detects degraded hardware conditions on a host and allows you to evacuate the VMs before the issue causes an outage. Working in conjunction with participating hardware vendors, vCenter plugs into the vendor's hardware monitoring solution to receive the health status of monitored components such as fans, memory, and power supplies; vSphere can then be configured to respond to the failure accordingly)

vSphere HA Orchestrated Restart (vSphere 6.5 now allows creating dependency chains using VM-to-VM rules. These dependency rules are enforced when vSphere HA is used to restart VMs from failed hosts. This is great for multi-tier applications that do not recover successfully unless they are restarted in a particular order; a common example of this is a database, app, and web server)

Additional Restart priority levels in HA (vSphere 6.5 adds two additional restart priority levels named Highest and Lowest providing five total. This provides even greater control when planning the recovery of virtual machines managed by vSphere HA)

Simplified vSphere HA admission control (the first major change is that the administrator simply defines the number of host failures to tolerate (FTT). Once that is configured, vSphere HA automatically calculates a percentage of resources to set aside by applying the "Percentage of Cluster Resources" admission control policy, and recalculates that percentage as hosts are added to or removed from the cluster. Additionally, the vSphere Web Client will issue a warning if vSphere HA detects that a host failure would cause a reduction in VM performance based on actual resource consumption, not only on the configured reservations)
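The recalculation described above is simple to picture. A minimal sketch of the math as I understand it (my arithmetic, not VMware's actual code): the reserved percentage is just FTT divided by the current host count, refreshed whenever hosts join or leave the cluster.

```shell
#!/bin/sh
# Sketch of the percentage the new admission control derives:
# reserve FTT/N of the cluster's resources, recalculated whenever
# the host count N changes.
reserved_pct() {
  ftt=$1
  hosts=$2
  # integer percentage of cluster resources set aside for failover
  echo $(( ftt * 100 / hosts ))
}

reserved_pct 1 4   # 4 hosts tolerating 1 failure -> reserves 25%
reserved_pct 1 5   # add a 5th host, it recalculates -> reserves 20%
```

This is why the setting no longer needs manual updates as the cluster grows or shrinks.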

Fault Tolerance (FT) (vSphere 6.5 FT integrates more tightly with DRS, which helps make better placement decisions by ranking hosts based on available network bandwidth and by recommending which datastore to place the secondary VMDK files on; FT networks can now be configured to use multiple NICs to increase the overall bandwidth available for FT logging traffic)

Network-Aware DRS (DRS now considers network utilization: it observes the Tx and Rx rates of the connected physical uplinks and avoids placing VMs on hosts whose uplinks are more than 80% utilized. DRS will not reactively balance hosts solely on network utilization; rather, it uses network utilization as an additional check to determine whether the currently selected host is suitable for the VM)

I have struggled to understand major VMware NSX functionality like the NSX Edge Services Gateway and the Edge Distributed Logical Router and their options, so I am writing this series to explain in detail, in layman's terms, what I have learnt in the past few days about these functions, rather than the marketing jargon I see on the internet about VMware NSX.

Part 1 — NSX Edge Services Gateway

It is very easy to create logical switches under the NSX section of the vCenter Server Web Client.

The hard part comes when you have multiple networks (with subnets) that you want to attach to the Edge Services Gateway so that they can communicate with each other and with the external uplink on the gateway.

Here is how I did it:

First, I created two logical switches:

App-LG

Web-LG

I have an L2 physical network switch in my home lab, so I can't do L3 routing on it. In this instance I am going to show you how to create an Edge Services Gateway with the proper interfaces so that multiple subnets can communicate with each other from a VM connected to one of these logical switches.

First, we deploy an Edge services gateway using the default options and the interfaces as shown below:

Here are the interfaces which I have configured on the gateway

In the above picture, I created one vNIC as an uplink (I named it External) and gave that interface the IP address 192.168.0.79/24; 192.168.0.0/24 is my home LAN subnet.

Then I created two internal interfaces (I named one of them Internal) with interface IP addresses 172.168.10.2/24 and 172.168.11.2/24, where 172.168.10.2 and 172.168.11.2 act as the default gateways for the VMs attached to the logical switches App-LG and Web-LG, which are connected to these two internal interfaces.

Also, I configured the default gateway in the Edge Services Gateway configuration during deployment, as shown:

Now that we have configured the L2 logical networks on the Edge Services Gateway with the interfaces, let us go to the VMs and see how communication flows through the logical networks.

We have a test VM called Win7 connected to App-LG (whose ESG interface IP address is 172.168.10.2), hence the default gateway of this VM is 172.168.10.2.

Here we test communication by pinging all the interface IP addresses, both internal and external.

In the above picture, you can see that we are able to ping all three interfaces (192.168.0.79, 172.168.10.2, and 172.168.11.2), even though the VM's gateway is 172.168.10.2 since it is connected to the App-LG logical switch.
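Why does the VM reach 172.168.11.2 at all? A toy illustration (plain subnet arithmetic, not NSX code): with /24 masks, App-LG and Web-LG are different subnets, so traffic between them must hop through the ESG, which owns an interface in each subnet.

```shell
#!/bin/sh
# Compare the first three octets (the /24 network prefix) of two IPs.
# Different prefixes mean the ESG has to route between the networks;
# its interfaces 172.168.10.2 and 172.168.11.2 make that possible.
same_subnet24() {
  [ "${1%.*}" = "${2%.*}" ] && echo yes || echo no
}

same_subnet24 172.168.10.50 172.168.10.2    # VM to its own gateway -> yes
same_subnet24 172.168.10.50 172.168.11.25   # App-LG VM to a Web-LG VM -> no (routed by the ESG)
```

The VM addresses used here (172.168.10.50, 172.168.11.25) are hypothetical examples on the lab's two subnets.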

Also, note that we weren't able to ping my home default gateway 192.168.0.1, since there is no routing to 192.168.0.1 configured on the Edge Services Gateway. We will cover this in the routing and NSX Distributed Logical Router part next.

With that, I am concluding this part. I wanted to show you how logical networks can be used with VMs and how their networks can route between different subnets using the Edge Services Gateway; this covers the East-West traffic between VMs.

As of 8/18/2016, Nutanix (as far as I know) started selling its platform (Acropolis Hypervisor and Prism software) on Cisco UCS C-Series servers. Nutanix now supports the following UCS servers:

C220-M4S

C240-M4L

C240-M4SX

Looks like the Nutanix Software bundle comes in Acropolis Pro and Ultimate Editions.

To harden your ESXi 6.0 hosts, disable the MOB service so that an attacker can't use a web browser to access the Managed Object Browser of the ESXi host (e.g. https://esxi01.lab.com/mob). This setting removes one of the attack vectors on the ESXi hosts in the environment.

To do this, SSH into the ESXi host where you want to disable the MOB service and run the following command:

esxi01# vim-cmd proxysvc/remove_service "/mob" "httpsWithRedirect"

To verify that the MOB service has been removed from the ESXi host, use the following command:

esxi01# vim-cmd proxysvc/service_list

The above command lists all the proxy services on the ESXi host. Look for the "/mob" service; if you don't see it, it has been removed. If it is still there, run the first command again and reboot the ESXi host to disable the MOB service.

Recently, I had to shut down multiple ports on a Cisco MDS 9396 switch for maintenance. I had to look up the commands, as I hadn't done this before; most customers either shut down the switch completely or shut down only a required port, not multiple ports at once.

Here are the commands to shut down all the ports on the switch at once:

MDS1# conf t

MDS1(config)# int fc1/1-40

MDS1(config-if)# shutdown

If you have to shut down ports that are not in sequence, here is the command:
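NX-OS accepts a comma-separated list of port ranges on the interface command, so a non-sequential batch can be shut down in one pass. A sketch (the port numbers are just examples):

```
MDS1# conf t
MDS1(config)# int fc1/1-10, fc1/15, fc1/20-25
MDS1(config-if)# shutdown
```

The same comma-separated form works for no shutdown when bringing the ports back after maintenance.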

Recently, I was working on a UCS blade firmware upgrade along with an ESXi upgrade from 5.5 to 6.0 and came across an error where the ESXi host became unresponsive, showing "can't fork" on its DCUI.

Here is a little background on this story: this particular blade was a B240 being used as a SAP HANA blade by the customer. The firmware and ESXi upgrades went fine, but two days later the host became unresponsive and we couldn't connect to it using SSH, DCUI, etc. Connecting to the KVM console revealed the screen below when we went to its Alt+F1 command interface:

We had to bounce the box, and we reduced the memory of the Linux VM hosting SAP HANA to be 10% less than the memory of the ESXi host.

Conclusion: the HANA VM (Linux) on the ESXi host should have 10% less memory than the overall memory of the ESXi host to avoid this problem.
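The sizing rule above reduces to simple arithmetic. A minimal sketch (my math for this post's 10% rule, not an official SAP or VMware formula):

```shell
#!/bin/sh
# Cap the HANA VM at 90% of the host's physical memory, leaving the
# remaining 10% as headroom for ESXi so it doesn't exhaust memory
# and hit the "can't fork" condition.
max_hana_vm_mem_gb() {
  host_mem_gb=$1
  echo $(( host_mem_gb * 90 / 100 ))
}

max_hana_vm_mem_gb 512   # a 512 GB host -> size the VM at 460 GB or less
```

The 512 GB host size is just an example figure, not the customer's actual configuration.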