
2013-01-28

Today I attended the DevOpsCon 2013 in Hertzliya, Israel. As you can see from the agenda there was an abundance of great sessions and I thoroughly enjoyed myself.

This was not like most of the one-day seminars I usually go to – it was focused 100% on DevOps – and virtualization was just a (very small) by-product – seeing that it was only the underlying infrastructure.

The speakers were very good (almost all of them), and the content was very interesting and thought-provoking. The food was fine and the venue was pretty much OK (albeit a bit crowded).

I had a really good time, it was great to see the interactions on Twitter and the feedback that was given during the sessions (even though I expected a lot more – it seems that Israel is not that big on Twitter).

I will be going into depth with my thoughts on some of the sessions in future blog posts but here are 8 little gems (of many) that came out of the day.

Microsoft are hosting the day. As soon as they understood that most people are Linux guys, the AC is arctic. Penguin is frozen #devopscon

Interesting question - how does DevOps deal with Database changes? #devopscon

There was no concrete answer!

He's got Linux emulation on his Windows desktop, used to push code to his Linux VMs running on Azure #Aabomination #devopscon

It is sooo obvious from this demo that Microsoft is so not in the #devops game #devopscon

Culture is one of the biggest challenges #devopscon

There are only two log levels: a. Too much bullshit b. I'm f***'ing blind!! #devopscon

Human driven auto-scaling - is not realy the right way to do it #devopscon

There is no such a thing as hard-coded dynamic IP's #devopscon

I really appreciate the feedback!

@maishsk no thank YOU for the tweeting & sharing, you helped make it awesome. PLUS we thoroughly enjoyed your sense of humor :-) #devopscon

I encountered this last week and could not find any reference to my specific problem – so I am documenting it here.
I was trying to remove the Cisco Nexus 1000V VEM from the ESXi hosts in my lab.
This was the error I was getting.
This is what I had from the esxupdate.log file

They were not relevant to my versions – neither of ESXi nor the Cisco modules but still this led to the right solution.

I checked to see if I had any vem* processes still running.
After killing the processes, the removal was successful.
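As a rough sketch of the check described above, the small filter below picks out the lines of a process listing that contain a name starting with "vem". The function name vem_lines is my own, and the layout of the listing on your host is an assumption, so look at the matching lines before you kill anything.

```shell
# Reads a process listing on stdin and prints every line that has
# a whitespace-separated field starting with "vem" (for example vem-swiscsi).
# vem_lines is a hypothetical helper name, not part of any Cisco or VMware tool.
vem_lines() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^vem/) { print; next } }'
}

# Example on the host:
#   ps | vem_lines
# Then kill the leftover process using the ID shown in the listing:
#   kill <id>
```

If the filter prints nothing, there are no leftover vem processes and the removal should not be blocked by them.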

The vem-swiscsi process was not killed properly when I stopped the vem (or removed the modules) – which I assume is a bug that was re-introduced after 4.2(1)SV1(5.1).

The Release Notes for Release 4.2(1)SV1(5.1) say that these bugs were resolved:

17. CSCtl21012 The vem-swiscsi process fails to exit when no "Software iSCSI" device is found.

44. CSCtr83664 The vem-swiscsi process fails to exit when no "Software iSCSI" device is found.

In short – if you cannot remove the Cisco VEM from an ESXi host – check that there are no vem processes still running, because they will prevent you from removing the module.

2013-01-24

SSH Equivalence is one of the pre-requisites needed for an Oracle RAC installation.
There are a number of posts on how to do this like here or here, and Oracle have even been so kind as to provide a script that will do this for you (even though it is not 100% automated).
The process is relatively simple (when you break it down piece by piece):

Create the .ssh directory under the user's home folder on VM1 and VM2

Create an RSA key on VM1 and VM2

Copy the contents of ~/.ssh/id_rsa.pub from VM1 and VM2 into ~/.ssh/authorized_keys on both VM1 and VM2

You should then be able to connect to each host (and also the localhost as well) without a password prompt.

Repeat the process on both VMs as the oracle user
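The manual steps above can be sketched roughly as follows. The helper names prepare_ssh_dir and authorize_key are hypothetical, and the permission values (700 for the directory, 600 for the file) are the ones sshd commonly expects rather than something taken from the posts linked above.

```shell
# Create the .ssh directory under a home folder with restrictive permissions.
prepare_ssh_dir() {
    home_dir="$1"
    mkdir -p "$home_dir/.ssh"
    chmod 700 "$home_dir/.ssh"
}

# Append a public key to an authorized_keys file unless it is already listed.
authorize_key() {
    pub_file="$1"
    auth_file="$2"
    touch "$auth_file"
    chmod 600 "$auth_file"
    grep -qxF -e "$(cat "$pub_file")" "$auth_file" || cat "$pub_file" >> "$auth_file"
}

# Example, run as the user being configured on each VM
# (ssh-keygen is interactive here and asks for a path and a passphrase):
#   prepare_ssh_dir "$HOME"
#   ssh-keygen -t rsa
#   authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
# The public key of the other VM still has to be copied over and authorized too.
```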

But this process requires a decent amount of manual interaction from the user at the following stages:

Copying the files between VM1 <-> VM2

First connection prompts to add the host's key to the ~/.ssh/known_hosts file

Manual interaction is the mother of all headaches when you want to automate something. As I have posted before here and here I am in the middle of automating an Oracle RAC deployment on VMware. This is an additional part of the solution.
I had to come up with a method to do this without any user interaction, and here is how I went about the process. I broke down the whole process – stage by stage.

Re-create the ssh_host_rsa_key. Since these VMs are deployed from the same template, the ssh_host_rsa_key is identical on all of them, and this caused problems for my script (this actually could be useful in some cases – but not here).

Create the ~/.ssh/id_rsa.pub key for the root user on each host – without prompts.

In order to prevent the prompt when connecting to another VM for the first time, I needed to get the keys from ssh_host_rsa_key.pub into .ssh/known_hosts before I connected to the VM for the first time.

Add the public key from each VM into the ~/.ssh/authorized_keys file.

Get this information from VM1 to VM2 and vice versa – and all of this without prompts – which meant I could not go through the guest operating system.

Repeat the process for the oracle user.
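The key generation in the first two stages can be done without prompts roughly like this. It is a minimal sketch: generate_rsa_key is a hypothetical helper name, and /etc/ssh as the location of the host key is an assumption about the guest OS.

```shell
# Generate an RSA key pair at the given path without any prompt.
# -q suppresses output, -N "" sets an empty passphrase, -f sets the key file path.
generate_rsa_key() {
    key_path="$1"
    # Remove an existing pair first, otherwise ssh-keygen asks whether to overwrite it.
    rm -f "$key_path" "$key_path.pub"
    ssh-keygen -q -t rsa -N "" -f "$key_path"
}

# Example inside a guest (the paths are assumptions):
#   generate_rsa_key /etc/ssh/ssh_host_rsa_key    # fresh host key for this VM
#   generate_rsa_key "$HOME/.ssh/id_rsa"          # user key for root
```

The SSH daemon will most likely need a restart before it starts presenting the new host key.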

So my initial challenge was how to do the copying of the files without going through the guest OS, but that actually turned out to be pretty simple. PowerCLI has the Copy-VMGuestFile cmdlet that will allow me to transfer files to and from the guest – so that solved my worries.
There were several issues along the way that I needed to address.

I needed to construct the known_hosts file from several pieces of information: the hostname, the IP address and the ssh_host_rsa_key. I am sure there is an easier way of doing this in bash – but this way works for me.

Creating the RSA keys for the oracle user – since I did not want to connect to the VM twice with two different credentials, I solved the problem by duplicating the files from the root user to the oracle user and manipulating the contents a bit to suit my needs.

Copying the files back to the guest after manipulation – resulted in a change in their format from UNIX to DOS and I could not find a way to control that from the PowerCLI side – therefore some vi manipulation was needed to convert them back.
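To illustrate the first issue, here is one possible way to build a known_hosts entry from those three pieces in plain shell. This is a sketch and not the code from my script; known_hosts_entry is a hypothetical helper name, and the example hostname, IP address and paths are made up.

```shell
# Print one known_hosts line: "<hostname>,<ip> <key type> <key data>".
# The key type and key data are the first two fields of the host's public key
# file; any trailing comment in that file is dropped.
known_hosts_entry() {
    host_name="$1"
    ip_addr="$2"
    pub_file="$3"
    awk -v h="$host_name" -v ip="$ip_addr" \
        '{ print h "," ip " " $1 " " $2; exit }' "$pub_file"
}

# Example (hostname, IP and path are made up):
#   known_hosts_entry vm1 192.168.1.10 /etc/ssh/ssh_host_rsa_key.pub >> ~/.ssh/known_hosts
```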

So without further ado – here is the script – annotations are at the bottom

Lines 45-57 - The script requires some parameters (two are mandatory). The names of the VMs that will be configured, their IP addresses, and if you would like to keep the files created during the process, you should change the $Cleanup variable to $false (by default $true). Also, in order to run scripts on the guests you will need to provide credentials for the hosts and the guests (I am assuming that all hosts have one password and that the guests have one password as well).

Lines 59-70 - If the credentials were not provided as variables – then you will be prompted. If the IPs were not provided, they will be retrieved through the API.

Lines 72-106 - The script that should be run on the guests. There is one for each VM – due to the fact that the IP is (of course) different on each of them.

Lines 77-78 – Create the .ssh directory and create the keys. -N sets an empty passphrase on the key and -f sets the path of the key file.

Lines 79-83 - The known_hosts file is basically a concatenation of 3 things for each entry: <hostname>,<IP_Address> <Contents of rsa_key.pub> (the comma and the space are important!)

Line 84 - Add the contents of id_rsa.pub to the authorized_keys file.

Lines 85-87 – Copy the files into the oracle user’s directory and make sure the file ownership is correct.

Lines 108-109 – Run the scripts on each VM.

Lines 112-115 – Copy the files to the local computer for text manipulation.

Lines 117-118 – The authorized_keys are per user, and the ones we created for the oracle user were copies of those from the root user, so the username has to be changed.

Line 121 – Combine all 4 authorized_keys files into one, with carriage returns after each one.

Lines 123-126 – Copy the files back to the guests. And as I said above, the files needed some additional vi manipulation because during the copy back the file format was changed from UNIX to DOS.

Lines 129-137 – The same process for the known_hosts file. Take note – only one copy from each guest was needed, because the file is VM specific and not user specific. The same vi manipulation is needed as well.

Lines 140-149 – The process is repeated to place the files in the oracle user’s home directory.
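The text manipulation in the annotations above was done on the local computer with PowerShell plus some vi in the guest. The same three ideas can also be sketched in shell inside the guest. To be clear, this is an alternative sketch and not what the script does; retarget_key_user and combine_keys are hypothetical helper names.

```shell
# Change the user in the trailing "user@host" comment of each key line,
# for example from root to oracle. Prints the result on stdout.
retarget_key_user() {
    from_user="$1"
    to_user="$2"
    key_file="$3"
    sed "s/ ${from_user}@/ ${to_user}@/" "$key_file"
}

# Concatenate several key files into one stream, one key per line.
# Trailing carriage returns (DOS line endings) are stripped on the way,
# and a file without a final newline still ends up on its own line.
combine_keys() {
    awk '{ sub(/\r$/, ""); print }' "$@"
}

# Example (file names are made up):
#   retarget_key_user root oracle root_vm1.pub > oracle_vm1.pub
#   combine_keys root_vm1.pub root_vm2.pub oracle_vm1.pub oracle_vm2.pub > authorized_keys
```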

2013-01-23

First let me start off by saying – how this affects you will depend entirely on your organizational procedures and security requirements.

We all love templates – don’t we? I mean they are the best! You configure your VM to your liking, OS patches, company policy settings etc.. etc.. and every new VM that you deploy – will have the exact same baseline.

Standardization… conformity… in the enterprise – all great.

Except.. a short while ago I found out something which is not exactly the best security practice (to put it mildly).

So you have deployed a VM from a template – it now has an IP – and what is the first thing you would most probably do? SSH into the VM – because you now want to start doing the real work (amazing how we take the deployment of an OS for granted these days).

So if this is the first time you are connecting you will most probably get something like this:

Which to explain in simple terms is saying, “Hey – I don’t know this server – here are its details and RSA key. Do you still want to connect?”

And you would usually say – yes and enter your password – and all is fine and dandy.

What this does is add an entry to the .ssh/known_hosts file

But not only did I deploy one VM from this template – I deployed 2. So let’s repeat the process again.

So where is the problem? If you look carefully – you will see that the key fingerprint on vm1 is the same as on vm2.

But why is that? Shouldn’t every VM be unique? I mean they have different MAC addresses, different UUID’s, different IP’s – VMware usually takes care of that.

Well it is pretty simple really – when SSH is installed, the OS package usually creates these files for you. But remember we are cloning from a template – after SSH was installed (that will usually be the case).

That now means – that every single VM that was deployed from the template now has the exact same key.

That could be acceptable in your environment – maybe. Maybe not.

But take this example. You are providing vCloud services and your VM’s are spawned from the same templates. All … of… them!!!

Here you could have the same host keys in different organizations – different companies. I am sure you can see how bad this might become from a security perspective.

This can also cause havoc on certain monitoring systems and also will create a number of problems with SSH key authorization.

So how do you solve this? Unfortunately – there is no built-in way to do this with the current functionality in vSphere today. PowerCLI scripts – or other orchestration tools will need to be used to get around this.

What I would personally like is an option to run a guest OS script as part of the deployment process. Yes I know this exists for Windows VM’s today – but there is no such functionality for Linux.

I did a quick check on some of the VM’s in one of the environments I have access to - 50 VM’s

There are duplicates – actually I was surprised to see that some were actually unique.
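A check like this can be sketched in shell once you have a list of VM names and fingerprints. How you collect the list is up to you; one option (an assumption, not necessarily how I did it) is running ssh-keygen -lf on each guest's host public key, which prints the fingerprint as the second field. The helper name find_duplicate_fingerprints is hypothetical.

```shell
# Reads lines of "<vm-name> <fingerprint>" on stdin and prints every
# fingerprint that appears on more than one VM, followed by those VMs.
find_duplicate_fingerprints() {
    awk '{ count[$2]++; vms[$2] = vms[$2] " " $1 }
         END { for (fp in count) if (count[fp] > 1) print fp ":" vms[fp] }' | sort
}

# Example (fingerprints.txt is a made-up file with one "<vm-name> <fingerprint>" per line):
#   find_duplicate_fingerprints < fingerprints.txt
```

Any line in the output means at least two VMs are presenting the same host key.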

Some food for thought… (which reminds me – time for me to go out for lunch).

At the moment my recommended solution would be to remove the ssh_host_* files on the VM before you power it down. The files will be recreated once the VM starts up (or a new VM is deployed from this template). Just make sure that whenever you power on the template for maintenance, you remove the files again before you power it down.
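A minimal sketch of that cleanup step, assuming the host keys live in /etc/ssh (the usual location, but check your distribution). remove_host_keys is a hypothetical helper name, and whether the keys are regenerated on the next boot depends on the guest's SSH startup scripts.

```shell
# Remove all SSH host key files from the given directory (default /etc/ssh)
# so that a VM deployed from the template generates its own keys.
remove_host_keys() {
    ssh_dir="${1:-/etc/ssh}"
    rm -f "$ssh_dir"/ssh_host_*
}

# Example, as the last step before powering down the template:
#   remove_host_keys
```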

I studied the traffic flow – and would like to share it with you here, but first here is the architecture of the testbed – which will help explain in more detail

Well the environment is pretty simple. One vCenter server, four ESXi Servers – each with a local datastore and a shared datastore among the 4 hosts. My laptop is the one that was running the Copy-DatastoreItem cmdlet.

Copy-DatastoreItem has the following parameters

Item - Specify the datastore item you want to copy. You can use a string to provide a relative path to the item in the current provider location.

Destination - Specify the destination where you want to copy the datastore item.

But how does the file get from my laptop to the ESX3? From my laptop directly to the host? Some other way?

In order to check this I chose a file that was reasonably large (~3.0 GB) so I could see the network activity. I opened up the Resource Monitor and sorted the columns to see the network traffic and saw the following.

The network traffic was going from my laptop directly to the vCenter Server – using Powershell of course.

Next – from the vCenter – it had to get to the host of course.. – but how?

Here is the Resource monitor from the vCenter Server

On the top right what you see is that the vpxd.exe (vCenter server process) is the one that has the highest amount of network traffic.

In the bottom window you can see on the first line that it is sending out traffic to ESX3 (.173) over port 902 and on the second line it is receiving traffic from my laptop (.187) over port 443. This makes perfect sense.

PowerCLI is communicating with the vCenter – and the vCenter is sending the traffic over to the host. So if we were to look at the original architecture again, the traffic flow will look like this:

Next I tried the same thing, but this time to the Shared datastore.

Based on what we had before the flow should be:

Laptop –> vCenter –> ESX host.

But the question is though – which ESXi host? The destination parameter to where I want to copy the file is as follows:

The VMware para-virtualized drivers have been updated to provide a seamless out-of-the-box experience when running Red Hat Enterprise Linux 6.4 in VMware ESX. The Anaconda installer has also been updated to list the drivers during the installation process. The following drivers have been updated:

2013-01-02

Luc Dekens wrote a great post a while back Will Invoke-VMScript work? about the prerequisites needed in order to get Invoke-VMscript to work. Stop for a minute and go and read his post.

Glad to have you back.

As part of an Oracle RAC provisioning script that I am working on – one of the first things I wanted to do was to configure the network settings for my two nodes – with parameters taken from a config file.

Of course if the VM does not have an IP address yet then you cannot configure it through the network, so here is where Invoke-VMscript comes into play.

A few things right off the bat. My configuration was working with both the 32-bit and the 64-bit engine. The rest of the prerequisites were all there.

So here is what was happening. In the script I had stored the HostCredentials and the GuestCredentials each in a variable. When it came time to power on the VMs, the script would wait for the VMware Tools to start running in the guest and then run my script inside the guest OS – but the command would fail with this message.

Now this was really weird – because this was completely not true. To make it even more baffling – when trying to run the commands manually – not as part of the script – it would work – without a problem. So I was starting to think that perhaps there was a problem that the credentials were not being passed properly down to the command – which was not likely – but still I had no other clue.

I then wondered – Invoke-VMscript – interacts with the VMware Tools in the guest through VIX – so maybe there was a problem there.

So I checked my versions (it was 1.12) and looked at the release notes and saw something there that ultimately led me onto the right path.

OK I was not getting any of these errors, but my command would not work. So I wanted to see what the logs of the VMware Tools in the guest were saying – after all it was interacting with the guest through VMware Tools. But where was the log?

VMware Tools comes up and reports itself as running way before the OS is actually available and you have a console prompt. Even before SSH starts.

The first 3-5 tries of Invoke-VMscript would fail – with the same error message I had before. And suddenly it would work as if nothing was wrong.

I went to look in the VMware Tools log that I had just configured, and there I found something very strange. It solved my mystery, but I still do not have an answer as to why it happens.

Now here is the weird part. If you look at the first two failures – you will see that VIX is trying to execute a Windows command on a Linux operating system – which…. probably .. won’t…. really…. work…. !!!

Only about 20 seconds later did it execute the correct bash command – in my case ‘ls -la’ – and of course it worked.

So here I found my way around my problem – but have not gotten to the bottom of the mystery yet. I put in an extra sleep statement into the script that would wait a bit longer until the OS was completely up and only then run the Invoke-VMscript command – and all was working fine…

So a few things I learned today:

How to enable logging for VMware Tools

VIX does weird things.

A workaround is as good of a solution as any other.

I would add one more thing to Luc’s prerequisites – wait until the VM has completely started before attempting to use Invoke-VMscript.
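My workaround was a fixed sleep in the PowerCLI script. A variation on the same idea is to retry the command a few times with a pause in between, so you only wait as long as you have to. The sketch below shows that pattern in shell rather than PowerCLI; the retry helper is hypothetical and is not part of my script.

```shell
# Run a command up to <attempts> times, sleeping <delay> seconds between
# tries. Returns 0 as soon as the command succeeds, 1 if every try fails.
retry() {
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}

# Example: try a command (made up here) up to 10 times, 5 seconds apart:
#   retry 10 5 ssh root@vm1 true
```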