
2009-12-30

*****Update June 11, 2012******

*****************************

Luc Dekens brought to my attention last night that the Visio stencils I posted on VIOPS a while back have somehow been moved and, even worse, the content has been modified and the stencils are no longer attached.
I have already contacted VMware to try and get the content restored, and am awaiting their reply.
In the meantime, if you need the stencils, ping me via Twitter and I will provide a download link.

Update: Since it seems that it is taking a while for VMware to fix the link, I am providing a temporary download link for all those who need the stencils in the interim: Part 1 & Part 2 (~30MB in total)

I started this blog, like most of us bloggers, for no reason other than as a personal reference for keeping track of things I find during my day.

Officially my first blog post is dated somewhere in November 2007, but I can say that I started seriously from November 2008.

Now I posted some New Year's resolutions a while back when the new Jewish year started.

I can say that from that list of things I wanted to get done I have done the following:

The Business

Continue assistance to R&D / PD with enabling them to perform their job better / faster / easier (This project is well under way)

Lab manager - start to work with these products - for better productivity (I am starting a POC for Lab Manager)

Help the business with developing a Virtual Appliance version of our product (Not yet - but we are getting more and more interest from customers in utilizing virtualization)

BCP/DR

More efficient backup methods and faster restore time (we are currently evaluating EMC Avamar for this purpose)

Storage

Utilize the latest storage technologies for

Thin provisioning

Storage de-duplication

Storage offloaded snapshots (This is coming - much sooner than I expected since we are outgrowing our current storage)

Automation

Utilize more scripting for more standardization of procedures (PowerCLI) (Have you noticed my blog lately???)

There is still a lot of work for me to do, but I think this is a good start.

I would like to thank each and every one of my 50,533 unique visitors since January 1, 2009, who have honored me by visiting this blog. I certainly did not think that this was going to happen when I started my first blog just over a year ago.

2009-12-28

As of vCenter 4.0, when installing a new instance of vCenter, an instance of Microsoft ADAM is installed on Windows 2003, or AD LDS on Windows 2008, in order to allow for Linked Mode. The reason is that in order to link your vCenter Servers together there has to be some kind of hierarchy to allow for the communication between the servers, similar to Active Directory.

Below is a sample screenshot of what it looks like:

Ok so here comes the reason for the post and why all of this is important. A customer contacted me today saying, “I cannot log into my vCenter Server - something is wrong!”. I asked what the error message was, and he said, “Something about an error! Please Fix it!!”. He sent me a screenshot:

Well he was right about the error message, totally no information here.

First things first I looked at the Services, to see if everything was started and lo and behold there was one that was not:

Started the Service and connectivity returned to normal.

The reason this service stopped is not exactly clear, but then I was thinking to myself: if this is such a critical service, then why did it not restart? I mean, if you do experience an error (for whatever reason) and it shuts the service down, I would expect it to restart.

This is the service setting for the vCenter Service

For the Webservices:

And even for Update Manager:

But the VMwareVCMSDS?

Now that seems a bit strange for a critical service - don't you think? It would be interesting to find the logic behind not setting this service to be the same as all the others.

I am changing mine - What about you?

Update:

I would actually not be doing myself justice if I did not provide a quick PowerShell way to change this value, without going into the GUI.
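The snippet itself is not reproduced here. As a rough stand-in, the sketch below wraps the built-in Windows `sc.exe failure` command from Python rather than PowerShell. The reset period and restart delay are illustrative assumptions, not the values the other VMware services actually use, so compare them with what the Recovery tab shows for the vCenter service before applying anything.

```python
import subprocess


def recovery_command(service, reset_seconds=86400, delay_ms=60000):
    """Build the sc.exe call that restarts `service` after each failure.

    reset_seconds and delay_ms are placeholder values (assumptions).
    """
    # One restart action each for the first, second and subsequent failures.
    actions = "/".join(["restart", str(delay_ms)] * 3)
    # sc.exe expects the option name ("reset=") and its value as separate arguments.
    return ["sc.exe", "failure", service,
            "reset=", str(reset_seconds),
            "actions=", actions]


def apply_recovery(service):
    """Run the command; needs an elevated prompt on the vCenter server."""
    subprocess.run(recovery_command(service), check=True)
```

Calling `apply_recovery("VMwareVCMSDS")` on the vCenter server would apply the change, and `sc.exe qfailure VMwareVCMSDS` shows the resulting settings.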

2009-12-26

It took me a while to understand why this was not working - it could be because I hate - actually loathe - having to dig through logs because of Java and Tomcat issues, but I only have myself to blame for this one.

I am currently installing a new vCenter for my production environment (this is part of my MJTV series, which I am currently working through). The last time we installed, we were just starting out with VMware, and we have encountered a decent number of problems because of our lack of experience and knowledge. Hence a new vCenter (not from scratch, but that is another post entirely).

Fast forwarding a couple of years - the technology has evolved, and I have gained more knowledge. One of the things that was never implemented correctly was an SSL certificate for vCenter. I wanted to do this right, so I set out to find what should be done and how.

Firstly, this is the official VMware reference document. Since we are a Microsoft shop with an established PKI infrastructure I went to page 2 - Replacing Default Server Certificates with Certificates Signed by a Commercial CA.

Ok so first things first. In order to create the Certificate Signing Request (CSR) you will have to download the OpenSSL binaries from here. Since the vCenter is a 64-bit box, I got the 64-bit version. Before installing the software you will also need to download and install the Visual C++ 2008 Redistributables (x64), otherwise you will not be able to run the binaries.

I installed it all to C:\Program Files\OpenSSL. In the bin directory of the installation folder are the files you will work with.

First you generate an RSA key for your host.

C:\Program Files\OpenSSL\bin>openssl.exe genrsa 1024 > rui.key

A small pause here. The openssl.cfg is the configuration file for the application. I wanted to install a certificate with two different hostnames. Why, you may ask? Actually it is very simple. Not all of the users will always remember to put in a Fully Qualified Domain Name when accessing the vCenter server. True, they should, but it doesn't always work that way. So I wanted the SSL certificate to be valid both for the FQDN and the short hostname, i.e. vcenter.maishsk.local and just plain vcenter. So how is this done? With a field in your certificate called Subject Alternative Name (subjectAltName). How do you get this into your CSR? Following the great advice from this link, I added to the openssl.cfg file in the [req] section
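The exact lines are not reproduced here. As a minimal sketch of what such an addition commonly looks like, the Python below renders a config fragment that asks for a subjectAltName extension. The section name `v3_req` is an assumption, and in an existing openssl.cfg the `req_extensions` line goes into the [req] section that is already there rather than under a second [req] header. Whether the issued certificate keeps the extra names also depends on the CA's policy.

```python
def san_config_fragment(dns_names, section="v3_req"):
    """Return openssl.cfg lines that request a subjectAltName extension."""
    if not dns_names:
        raise ValueError("at least one DNS name is required")
    # Each hostname becomes a DNS: entry in a comma-separated list.
    alt_names = ", ".join(f"DNS:{name}" for name in dns_names)
    return "\n".join([
        "[req]",
        f"req_extensions = {section}",
        "",
        f"[{section}]",
        f"subjectAltName = {alt_names}",
    ])


# The two hostnames from this post: the FQDN and the short name.
print(san_config_fragment(["vcenter.maishsk.local", "vcenter"]))
```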

-------------- Make note of this part - because this is where it went wrong.------------------------
I pressed Enter twice - because I did not enter a password before when creating the CSR

I stopped both the VMware VirtualCenter Management Webservices and the VMware VirtualCenter Server. I then copied the three files rui.crt, rui.key, and rui.pfx to the default SSL location which is, according to the official Whitepaper from VMware:

Unfortunately, that is not 100% accurate. The correct path should be:

C:\ProgramData\VMware\VMware VirtualCenter\SSL

If you try to access the path in the Whitepaper you will get a nice

So I backed up the old SSL Certificate in the folder and pasted my new files.

Now, after searching for the topic on the VMware forums and anything connected with VMware and coming up with a complete blank, I widened the search to a pure Tomcat and Java issue. I came up with "Unable to import openssl key to java keystore" and this post, both of which suggested that the rui.pfx that I created earlier should have been password protected.

Now looking at C:\Program Files (x86)\VMware\Infrastructure\tomcat\conf\server.xml brought me to find this configuration

<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
           maxThreads="150" scheme="https" secure="false"
           clientAuth="false" sslProtocol="TLS"
           keystoreFile="C:\ProgramData\VMware\VMware VirtualCenter\SSL\rui.pfx"
           keystorePass="testpassword" keystoreType="PKCS12"

Now where had I seen that string before?? hmmm…

(And this is where things went wrong). I had started to follow a walkthrough that was posted on the VMTN, and on this page there was no mention of what the password was supposed to be. So I naturally pressed Enter - Twice - and continued.

Remember I said above

-------------- Make note of this part - because this is where it went wrong.------------------------
If I had read the White paper carefully - it explicitly states

So pushing Enter twice was not a good idea after all. I should have entered the password as above when prompted, or entered the command as above.
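The quoted passage is not reproduced here. For reference, here is a sketch of the export step with the password supplied on the command line instead of at the prompt. The file names follow this post and the password is the keystorePass value from server.xml above. The option set is the standard one for `openssl pkcs12 -export`, and the friendly name `rui` is an assumption, so neither should be read as a quote from the white paper.

```python
import subprocess


def pfx_export_command(password, crt="rui.crt", key="rui.key", out="rui.pfx"):
    """Build the openssl call that bundles certificate and key into a PFX."""
    if not password:
        # An empty export password is exactly the mistake described in this post.
        raise ValueError("the PFX must be protected with the keystore password")
    return ["openssl", "pkcs12", "-export",
            "-in", crt, "-inkey", key,
            "-name", "rui",                   # friendly name (assumption)
            "-passout", f"pass:{password}",   # avoids the interactive prompt
            "-out", out]


def export_pfx(password):
    """Run the export from the directory that holds rui.crt and rui.key."""
    subprocess.run(pfx_export_command(password), check=True)
```

The password passed in has to match the keystorePass attribute in server.xml, otherwise Tomcat cannot open the keystore.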

A quick change of the rui.pfx. Stop Services. Copy new file. Start Services.

2009-12-23

I got a call today from a colleague that had an issue with an ESX Server that was behind a firewall that kept on disconnecting every 30 seconds, and he could not understand why.

I remembered that I had encountered this before: it was happening because not all the required ports in the firewall were open to allow the traffic through, and therefore the host was losing its connection.

So as a reference for myself (and anyone else that can use it) here is what needs to be opened.
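The list itself is not reproduced here. A quick way to see which TCP ports actually get through the firewall is a plain connect test, sketched below in generic Python. The host name and the ports 443 and 902 in the usage comment are assumptions for illustration, so substitute the ports from VMware's documentation. A TCP connect cannot verify UDP traffic such as heartbeats, which has to be checked on the firewall itself.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, filtered (timed out) or unresolvable all count as closed.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to whether it accepted a TCP connection."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}


# Usage (hypothetical host name and port list):
# print(check_ports("esx01.example.local", [443, 902]))
```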

2009-12-20

I was asked to update the EmployeeNumber attribute for each and every user in the Enterprise, for a new Global Database application that will be using the newly populated attribute.

I had several examples that I could use for the job utilizing VBScript, but I wanted to use PowerShell for the task.

It turned out to be a relatively easy task using the Quest Active Directory cmdlets.

2009-12-06

Firstly, you might ask what is MJTV1? I was thinking that I would like to tag all my posts for this series with something that makes them easy to recognize. So no, it is not Michael Jackson TV 1, but rather My Journey To VSphere. I started two weeks ago with this post.

So let us start with Part 1. Today I will discuss the topic: Benefits and Justification

As you might all know, the technical part of a successful project is actually less than 20% of it. I believe that a project will succeed or fail mainly based on how well it was planned, documented and thought through, on whether the risks were identified and mitigated, and only in the end on the technical implementation.

So what goes into planning your project?

First and foremost, you will have to identify (and of course "sell" to Management) why you need to upgrade. We all know the saying "if it ain't broke then don't fix it!" I personally do not really believe in that, because technology evolves all the time; things get better, faster and cheaper, which makes this kind of logic not always the best option. I mean, driving around in a '69 Mustang is a really cool thing - you get the babes, you get to go from A->B and in general this car fills its purpose. Comparing it to the new Hybrid cars today - you still get the babes, get from A->B and it also fills its purpose. But looking at the bigger picture…

Your Mustang needs:

frequent repairs and tune-ups

more gas

a new coat of paint

doesn't drive as fast

emits more fumes

parts are harder to come by

but it still stays a cool car!

no air-conditioning

Your Hybrid needs:

less gas - because it uses alternative energy

fewer repairs (it is a new car)

zooms like the wind

easier on the environment

can be a cool car - depends on who is driving it

it has air-conditioning

I think you see where I am going here. Let us take it to the upgrade to vSphere. The question you will and must ask yourself is - "What are the benefits I will receive with the upgrade?" Now of course I can list a number of benefits here - you can get them from VMware's site or from your own personal environment. Also you can add to the list - what are the problems / issues I am currently experiencing in my environment and does this upgrade solve/ease them?

From the list you must identify the pain points that you are currently suffering from and how this upgrade can ease / solve them.

For me, these are some of the benefits I will emphasize for upgrading to vSphere 4 U1.

VDR and vStorage APIs - Better options for backup compared to 3.5

Large performance improvement on the same hardware.

Larger Resources available for each Virtual Machine

Host Profiles and dvSwitch - will save a huge amount of time in the enterprise with configuration of ESX hosts.

Higher density on each ESX host.

If you cannot come up with a good list of why you should upgrade and what problems this upgrade will solve FOR THE BUSINESS, then you should not be doing it. I do not agree with those who upgrade to a new version just because it came out. That is why so many of Microsoft's Enterprise customers told Microsoft to take Vista, and shove it. We did not deploy Vista - there is no reason and never was any reason to do so. Now that Windows 7 is out, and the benefits in the new OS include a large number of Enterprise features that we can use, there is more of a reason to roll out the new OS. I want to emphasize one more thing here, and it was written in capital letters a few lines above: the upgrade could make things easier for you as an Admin, it could save you time, you could do your job better - but if there is no real benefit for your Business, then you will have a hard time selling the justification to Management.

Of course this will all have to go down on paper / presentation for the right people, in order to get the go-ahead for your project.

You of course have to keep in mind what the process of the upgrade will be, but that is for another post.

On average these hosts are utilizing 50-70% RAM and 20-30% CPU. The machines run without any noticeable issues.

Next - the incident.

Along comes 05.00 - alerts start coming in from the monitoring system: timeouts from the monitoring agents were occurring, i.e. it looked like the virtual machines were not responding. Within 20 minutes things were back to normal.

8.00 - the same thing happens again. While this was happening I tested the guest machines for connectivity - all 100%. Tried to log into a virtual machine with RDP - slow as a snail. It took approximately 3 minutes from CTRL+ALT+DELETE till I got the desktop, and again all this time network connectivity to the VM was 100%.

I started to receive more complaints of the same issue across a number of Virtual Machines. First thing I did was to try to find what (if anything) was in common. The machines were spread over different hosts, in different clusters, on different VLANs, so that was not it.

The only thing that was common amongst all the Virtual machines - they were all using NFS datastores - divided over 3 different Datamovers on the same EMC Storage array.

The outages were intermittent and not permanent.

In the meantime I opened a priority 1 SR with VMware support. Support was back to me within 30 minutes (according to the Severity Definitions), so they were right on time.

Logs were collected.

Tests were performed to test network issues with the ESX hosts.

In the meantime - we tried to see if anything was wrong with the network infrastructure - no issues at all. Throughput on the ports using the NFS datastores was well below normal, and the Virtual Machines network was also not suffering under any kind of load.

Again all fingers were pointing to the Storage Array.

There was a slight amount of stress on the storage array - this we found with the help of EMC (who also got a priority 1 call the same time as VMware) but nothing to be highly worried about.

OK - so how do you measure NFS throughput on the ESX side? Unfortunately this is not so simple. In contrast to measuring disk throughput with iSCSI / SAN, which can be done relatively easily with the performance charts / ESXTOP, there are no metrics for disk performance when it comes to NFS datastores. The only thing you can check is vmkernel throughput.

Using ESXTOP -> n:ESX nic -> T to Sort by megabits tx ( I truncated the data a bit to make it presentable)

The bold entry is the VMkernel interface and what its network traffic is. Now the utilization of this port was never getting over 2-4 Mb/s - which is nothing.

In the meantime we started to receive more complaints about regular NFS mounts (not connected to our Virtual Infrastructure) that were performing slowly - in addition other servers that were connected directly to the SAN as well were suffering.

Again all pointed to the storage.

One more thing.

NFS (like iSCSI) uses the vmkernel - so where would you look for issues if that were the case? If you said /var/log/vmkernel - you were right!

From the log - during these outages entries similar to this were present

No connection? No connection? Datastore not responding - Storage anyone?

After putting 2+2 together - and getting a big headache - we all knew it was a storage issue.

Sat on EMC's head to solve it.

They did. What it turned out to be was an application that was connected to a LUN on the storage array (not my LUN) that had malfunctioned - and was using its LUN with 100% utilization over 90% of the time.

Why this affected the rest of the storage - we will hear back from EMC after they complete the root cause analysis on the issue. But as soon as the rogue application was stopped - like magic all returned to normal. Measures have been taken to alert us of such issues on the storage array in future.

So what did I learn from this experience?

Why were the machines still responding - even though the storage was not working properly? My theory on this is as follows. Network was working fine. The machines responded slowly - when you tried to login. What happens when you login? You load up a user profile - which is on the vmdk - which in turn was on the NFS share - which was as slow as a snail. Therefore it was logical that this was the issue, because of a badly performing disk.

NFS throughput is not something that VMware can present easily to the administrator for troubleshooting. There are no disk counters for VMs on an NFS datastore. Disk performance on the ESX does not include NFS traffic. This I find is something that VMware has to improve on, since more and more shops are starting to use NFS by default. If they provide the statistics for iSCSI / Fibre Channel, then there is no reason they should not do it for NFS.

An assumption was made that the Storage Array was the least likely to fail out of the whole chain of components in the virtual infrastructure. In the ESX Server we have 2 disks / 2 CPUs / 2 power supplies / at least 2 NICs - all to protect from a single point of failure. The network cards are connected redundantly to the network infrastructure - to protect from a single point of failure. The ESX Servers were connected to the storage array through 3 different Datamovers - to protect from a single point of failure. But all in all the storage was the point of failure here.

The storage is shared with other applications and not dedicated to Virtualization - this has its ups and downs.

So now all is calm and well - and now I can start up solitaire on my Windows servers within a few seconds from the time I press CTRL+ALT+DEL - so I am happy :)

What I do like about instances like these is that things that should not / cannot happen (in theory) actually do (in reality) - and when they do, it is a great learning experience, which only makes me want to improve and provide an even higher level of performance / availability.

2009-11-30

Trend Micro have now joined Altor Networks and Reflex Systems with their new offering that utilizes VMware's VMsafe technology. I expect we will be seeing more and more security companies release products that utilize VMsafe.

Deep Security protects confidential data and critical applications to help prevent data breaches and ensure business continuity, while enabling compliance with important standards and regulations such as PCI, FISMA and HIPAA. Whether implemented as software, virtual appliance, or in a hybrid approach, this solution equips enterprises to identify suspicious activity and behavior, and take proactive or preventive measures to ensure the security of the datacenter.

VMware integration with VMware vCenter and ESX Server enables organizational and operational information to be imported into Deep Security Manager, and detailed security to be applied to an enterprise’s VMware infrastructure

2009-11-26

Two weeks ago I wrote an article about How To Bring Down A Single NIC In ESX. In that post you could see that in order to test this you had to log into the console of the ESX and run the commands there.

Already then I was thinking, why not do this from PowerCLI, without having to log into each host?