Friday, September 12, 2014

Through a joint effort with Hadar Freehling, one of my esteemed peers here at VMware, we co-developed a proof-of-concept workflow for a network security use case. Hadar created a short video showing and explaining the use case, but in summary this is a workflow that reacts to and remediates a security issue flagged by a third-party integration with NSX (the video uses TrendMicro, but it could be any other partner integration with vShield Endpoint). Basically, here's what happens:

a virus is detected on a VM and is quarantined by the AV solution

the AV solution tags the VM with an NSX security tag

NSX places the VM in a new Security Group, whose network policies steer all VM traffic through an IPS

vCenter Orchestrator monitors the security group for changes, and when a VM is added:

a snapshot of the VM is taken for forensic purposes

a vSpan session (RSPAN) is set up on the Distributed Virtual Switch to begin capturing inbound/outbound traffic on the VM

once the VM has been removed from the security group, the vSpan session is removed

Watch the video below for a walk-through by Hadar:

You will note that a portion of the workflow is handled natively by NSX (Security Tag reaction, Security Group policy), but the snapshot and RSPAN are done via a vCO workflow.

If you are interested in exploring this capability, I have provided the vCO workflow package for download. This is provided as-is and you should fully test it (and modify as needed) before using in your environment.

To use the workflow package you will need (assuming you also have NSX, vShield Endpoint and some third party integration already set up):

The workflow package includes a good number of "helper" workflows which you will not need to run directly. The master workflow is in the root folder Security Reaction and is named "Set up VM Forensics RUN THIS" (just in case you had any doubt as to which one to run).

The Security Reaction Master Workflow

Running the master workflow will prompt you for three items -

- The NSX Security Group to monitor. This is why the NSX plugin is required, so that you can browse the vCO managed objects and locate the desired Security Group.
- A time to sleep in seconds. The master workflow will run continuously until manually stopped and will use a REST call to NSX to get the current membership for the Security Group. We have no recommendation on this poll time, although in testing we used 5-10 seconds. It would have been better to use some external event to kick off the vCO workflow but we could not find a way to do this from NSX. It may be possible to do via the partner solution, but we wanted this workflow package to be "partner neutral."
- Destination IPv4 address. This is the destination for the RSPAN (or vSpan session in vSphere API terms). The vSpan session is created with some defaults (for example sampling rate, normal traffic allowed, etc). If you want to change any of those properties, you will need to modify the Helper workflow named "Configure encapRemoteMirrorSource vSpan Session on DVS" (modify the "Create Port Mirror" script task).
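
The poll-and-compare pattern described above can be sketched outside of vCO. The following is a minimal Python illustration, not the code in the workflow package: `fetch_members`, `on_added` and `on_removed` are hypothetical callables standing in for the NSX REST call and for the forensics set-up and tear-down steps. It also assumes that VMs already in the group on the first poll are treated as newly added, which may not match the packaged workflow.

```python
import time
from typing import Callable, Iterable, Optional, Set, Tuple


def diff_membership(previous: Set[str], current: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (added, removed) VM ids between two consecutive polls."""
    return current - previous, previous - current


def monitor_group(
    fetch_members: Callable[[], Iterable[str]],  # hypothetical: wraps the NSX REST call
    on_added: Callable[[str], None],             # hypothetical: snapshot + create vSpan session
    on_removed: Callable[[str], None],           # hypothetical: remove vSpan session
    sleep_seconds: float = 5.0,
    max_polls: Optional[int] = None,             # None means run until stopped manually
) -> None:
    """Poll the group membership and react to VMs entering or leaving it."""
    known: Set[str] = set()  # assumption: members present at start count as added
    polls = 0
    while max_polls is None or polls < max_polls:
        current = set(fetch_members())
        added, removed = diff_membership(known, current)
        for vm_id in sorted(added):
            on_added(vm_id)
        for vm_id in sorted(removed):
            on_removed(vm_id)
        known = current
        polls += 1
        # Sleep between polls, but not after the final one.
        if max_polls is None or polls < max_polls:
            time.sleep(sleep_seconds)
```

Keeping the comparison in its own function makes the reaction logic easy to check without a live NSX Manager. The trade-off of polling is the one noted above: a shorter sleep time reacts faster but generates more REST calls.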

Also note that this workflow doesn't support VMs with multiple vNICs - specifically, it will only create an RSPAN that includes the first vNIC found on a VM. You can modify the Helper workflow "Implement Forensics" and adjust the script task "Prep for Mirror Creation" so that the additional NICs (if any) are added to the sourcePorts array. It's something we intended to fix but forgot about until after our final testing and video production - so as they say in the textbooks "this is left as an exercise for the reader."
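
As a starting point for that exercise, the change amounts to collecting a port for every NIC rather than stopping at the first one. The sketch below shows only that logic in Python. The real script task is written in vCO's JavaScript, and the `Nic` class and `port_key` field here are hypothetical stand-ins for the vSphere device objects.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Nic:
    """Hypothetical stand-in for a VM network adapter."""
    label: str
    port_key: Optional[str]  # None when the NIC is not on a distributed switch port


def collect_source_ports(nics: List[Nic]) -> List[str]:
    """Return the port keys of every NIC that is attached to a distributed port."""
    return [nic.port_key for nic in nics if nic.port_key is not None]
```

NICs without a distributed port are skipped in this sketch, on the assumption that a port mirror session on the Distributed Virtual Switch can only use ports that live on that switch.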

Of course, there are many other actions that can be taken besides setting up an RSPAN and getting a snapshot. This solution can be extended to practically any task required during such an event: creating a ticket in your service desk software, spinning up additional workloads to replace the compromised VM, sending emails, guest OS file system operations... all of these and more can be accomplished using vCO in conjunction with NSX.

Hadar Freehling - @dfudsecurity - is a Security and Compliance Systems Engineer Specialist with VMware and jointly contributed to this solution and blog post.

Thursday, September 11, 2014

If you are using Windows Server 2012 or later for your IaaS install, it is recommended that you disable TLS1.2 on the IIS server. From the vCAC 6.1 install guide (IaaS Windows Server Requirements):

For certificates using SHA512, TLS1.2 disabled on Windows 2012 machines

I have found that if you use self-signed certificates, you will absolutely need to follow this requirement - otherwise you will have deployments that utilize the Guest Agent stuck at "CustomizeOS" state and never finish deployment. The Guest Agent start up script uses OpenSSL to grab the IaaS server certificate and this fails for self-signed certs over TLS1.2.
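
One way to confirm whether the IaaS server still negotiates TLS 1.2 is to attempt a handshake pinned to that version. The following Python sketch uses only the standard `ssl` module. It is not what the Guest Agent does (that uses OpenSSL, as noted above), and the host name in the usage note is a made-up placeholder. Certificate verification is turned off on purpose so that a self-signed certificate does not mask the protocol result.

```python
import socket
import ssl


def build_context(version: ssl.TLSVersion) -> ssl.SSLContext:
    """Create a client context that will only negotiate the given TLS version."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False       # must be disabled before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE  # accept self-signed certificates for this probe
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def server_accepts(host: str, port: int, version: ssl.TLSVersion, timeout: float = 5.0) -> bool:
    """Return True if a handshake pinned to `version` succeeds, False on any failure."""
    ctx = build_context(version)
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return tls.version() is not None
    except OSError:  # covers ssl.SSLError, refused connections and timeouts
        return False
```

For example, `server_accepts("iaas.example.local", 443, ssl.TLSVersion.TLSv1_2)` (a hypothetical host name) should return False once TLS 1.2 has been disabled on the server and the change has taken effect. A False result can also mean the host is unreachable, so check basic connectivity first.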

The security protocol settings are available in the registry only. Fortunately, you can use this handy utility to manage your protocol settings on IIS instead of hunting through the registry. Or, if you like, refer to Microsoft KB 245030 for the officially supported method. Essentially, both will change the reg key as shown below....

Monday, September 1, 2014

Installing the newest version of vCAC in a lab, I ran into an issue I thought only I would encounter - turns out a peer ran into the very same issue a couple of days later so I thought I would post the problem and solution.

In our case we were both installing a new (as yet unreleased) version of vCAC with a separate SQL server for the IaaS database component. The IaaS Windows server and SQL server were cloned from the same base image. By the way, this issue isn't related to vCAC or a particular version - you could really see this with other products. It's a known issue with MSDTC and VM clones.

The installation of the IaaS component goes fine, and you can even configure the tenant, add fabric groups, vSphere end points, business groups - but then things get weird. You will likely see that in your vSphere reservation, the memory, storage and network are basically empty - like nothing has been collected. In fact, if you go and look at the collection status for the compute resource you will see that the Inventory and State collections are not even showing up as configured (neither "on" nor "off").

Finally, you will see these types of entries in the IaaS vCAC server log (you can also see this in the vCAC UI under Infrastructure > Monitoring > Log) -

The cause is that the IaaS and SQL VMs were both cloned from the same image. If MSDTC had already been installed on that image, then both VMs will have the same GUID for their MSDTC nodes and the communication will fail. This assumes you don't have other issues such as firewall configuration problems between the two VMs.

To correct this, simply uninstall and re-install MSDTC on one of the VMs (I did this on the IaaS server) and restart the affected service (for example vCAC Server service on IaaS or SQL Server). From an elevated command prompt:

msdtc -uninstall

msdtc -install

Re-configure the MSDTC Security settings as you would for the IaaS install.

That should allow collections to run and your reservations will reflect the correct memory, storage and networking information.

UPDATE - you will need to make sure that MSDTC is configured on both the IaaS server and the SQL server for a distributed install. (Thanks to Steve Kaplan for pointing this out)