Thursday, September 3, 2015

Manage stock Windows AMIs with Ansible (part 1)

Ever wished you could just spin up a stock Windows AMI and manage it with Ansible directly? Linux AMIs usually have SSH enabled and private key support configured at first boot, but most stock Windows images don't have WinRM configured, and the Administrator password is randomly generated and only retrievable through the API several minutes after boot. People go to some pretty awful lengths to get plug-and-play Windows instances working with Ansible under AWS, but the most common solution seems to be building a derivative AMI from an instance with WinRM pre-configured and a hard-coded Administrator password. This isn't too hard to do once, but between Amazon's frequent base AMI updates and the need to repeat the process in multiple regions, it can quickly turn into an ongoing hassle.

Enter User Data. If you're not familiar with it, you're not alone. It's a somewhat obscure option buried in the Advanced Details section of the AWS instance launch UI. It can be used for many different purposes; much of the AWS documentation treats it as a mega-tag that can hold up to 16 KB of arbitrary data, readable from inside the instance. Less well known is that scripts embedded in User Data are executed by the EC2Config Windows service near the end of the first boot. This allows a small degree of first-boot customization on a vanilla instance, including setting up WinRM and changing the Administrator password; once those two items are done, the instance is manageable with Ansible immediately!
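To see User Data the way the instance does, a process on the instance can read it back from the link-local instance metadata service. Here's a minimal Python sketch; the function name is just illustrative, and it assumes the instance still permits unauthenticated IMDSv1-style requests (an instance that enforces IMDSv2 needs a session token first):

```python
import urllib.request

# Link-local instance metadata endpoint that serves the instance's User Data.
USER_DATA_URL = "http://169.254.169.254/latest/user-data"


def fetch_user_data(url: str = USER_DATA_URL, timeout: float = 2.0) -> str:
    """Return the raw User Data text as seen from inside the instance.

    Assumes unauthenticated (IMDSv1-style) GETs are allowed; an instance that
    enforces IMDSv2 will reject this request without a session token.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Run on the instance, `fetch_user_data()` should return exactly what was pasted at launch, tags and all.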

Scripts can be embedded in User Data by wrapping them in <powershell> tags (or <script> tags for Windows batch scripts); in this case, we'll stick to PowerShell. The following User Data script will set the local Administrator password to a known value, then download and run a script hosted in Ansible's GitHub repo to auto-configure WinRM:
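If you'd rather not assemble the wrapper by hand, it's easy to script. The sketch below (hypothetical helper name, placeholder script body rather than the real password and WinRM commands) wraps a PowerShell body in the `<powershell>` tags EC2Config looks for, and checks the result against the 16 KB limit, which AWS applies to raw User Data before base64 encoding:

```python
# AWS documents a 16 KB cap on raw User Data (measured before base64 encoding).
MAX_USER_DATA_BYTES = 16 * 1024


def wrap_powershell(script_body: str) -> str:
    """Wrap a PowerShell script body in <powershell> tags for EC2Config.

    Raises ValueError if the result would exceed the User Data size limit.
    """
    user_data = f"<powershell>\n{script_body.strip()}\n</powershell>\n"
    size = len(user_data.encode("utf-8"))
    if size > MAX_USER_DATA_BYTES:
        raise ValueError(
            f"User Data is {size} bytes; the limit is {MAX_USER_DATA_BYTES}"
        )
    return user_data


# Placeholder body: substitute your own password and WinRM setup commands.
print(wrap_powershell("Write-Host 'first boot'"))
```

The printed text is what goes in the console's User Data textbox; the console takes care of the base64 encoding.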

A word of caution: User Data is readable over HTTP from inside the instance without any authentication. While this technique will make your instances quickly accessible from Ansible, DO NOT use a sensitive password (e.g., your master domain admin password): it will be visible as long as the User Data exists, and modifying User Data requires an instance stop/start cycle. Any user or process on the instance that can make an HTTP request to the instance metadata service (169.254.169.254) can see the password you set with this technique. A good practice is to have one of your first Ansible tasks against the new instance change the password to a different value. Also keep in mind that the default Windows password policy is usually enabled, so the password you choose must satisfy its complexity requirements.
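If the password doesn't satisfy the policy, the script's password change will fail, so a quick pre-flight check before launch can save a wasted boot. The function below is a rough approximation of the default Windows complexity rule (at least six characters, no account name, and characters from three of four categories: uppercase, lowercase, digits, symbols). It's a simplification; the real policy has more nuance (Unicode letter handling, full-name checks), and your AMI may also enforce a separate, stricter minimum length:

```python
import string


def meets_default_complexity(password: str, account_name: str = "Administrator",
                             min_length: int = 6) -> bool:
    """Approximate the default Windows "complexity requirements" password rule.

    A simplified pre-flight check, not the real policy engine.
    """
    if len(password) < min_length:
        return False
    # Case-insensitive account-name check; Windows skips it for names under 3 chars.
    if len(account_name) >= 3 and account_name.lower() in password.lower():
        return False
    categories = (
        any(c.isupper() for c in password),         # uppercase letters
        any(c.islower() for c in password),         # lowercase letters
        any(c in string.digits for c in password),  # base-10 digits
        any(not c.isalnum() for c in password),     # symbols / non-alphanumerics
    )
    return sum(categories) >= 3


print(meets_default_complexity("Sup3rS3cret!"))  # True
print(meets_default_complexity("password"))      # False
```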

Before we get to the Holy Grail of actually using Ansible to spin up Windows instances using this technique, let's just try it manually from the AWS Console first. Click Launch Instance, and select a Windows image, then under Configure Instance Details, expand Advanced Details at the bottom to see the User Data textbox.

Paste the script above into the textbox, then click through to Configure Security Group, and ensure that TCP ports 3389 and 5986 are open for all IPs. Continue to Review and Launch, select your private key (which doesn't make any difference now, since you know the admin password), and wait for the instance to launch. If all's well, after the instance has booted you should be able to reach RDP on port 3389, and WinRM on port 5986 with Ansible (both protocols using the Administrator password set by the script). It can often take several minutes for Windows instances set up this way to begin responding, so be patient!
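Rather than retrying by hand while the instance boots, you can poll until port 5986 accepts TCP connections. A minimal standard-library sketch; the 15-minute timeout and 10-second poll interval are assumed defaults, and an open port only shows that the WinRM listener is up, not that authentication will succeed:

```python
import socket
import time


def wait_for_port(host: str, port: int = 5986, timeout: float = 900.0,
                  interval: float = 10.0) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Returns True as soon as the port accepts a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt waits at most `interval` seconds for the handshake.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False
            time.sleep(min(interval, remaining))
```

For example, `wait_for_port("203.0.113.10")` (a documentation placeholder address) blocks until the listener answers or the timeout expires.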

Let's test this using the win_ping module with a dirt-simple inventory. Create a file called hosts with the following contents:
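One way to produce an inventory like this is a small Python helper. It's a sketch under assumptions: the function name is hypothetical, the address is a documentation placeholder, the credentials are whatever your User Data script set, and the variable names follow Ansible 2.x (Ansible 1.9 used ansible_ssh_user, ansible_ssh_pass and ansible_ssh_port for the same settings):

```python
def render_windows_inventory(hosts, user="Administrator", password="CHANGE_ME",
                             ignore_cert=True):
    """Render a minimal INI inventory that points Ansible at hosts over WinRM/HTTPS.

    Variable names assume Ansible 2.x; adjust them for older releases.
    """
    lines = ["[windows]", *hosts, "", "[windows:vars]",
             "ansible_connection=winrm",
             "ansible_port=5986",
             f"ansible_user={user}",
             f"ansible_password={password}"]
    if ignore_cert:
        # Skips validation of the self-signed listener cert (needs Ansible 2
        # and pywinrm >= 0.1.1; see the discussion in the comments below).
        lines.append("ansible_winrm_server_cert_validation=ignore")
    return "\n".join(lines) + "\n"


# 203.0.113.10 is a placeholder; use your instance's public address.
print(render_windows_inventory(["203.0.113.10"], password="Sup3rS3cret!"))
```

Save the printed output as hosts, and `ansible windows -i hosts -m win_ping` should come back with `pong` for each reachable instance.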

2 comments:

I liked your statement on easy Windows provisioning at https://www.ansible.com/blog/easily-provision-windows, to "bust this myth once and for all", so I read your blog. The idea of using User Data for initial configuration is cool. Still, I struggle to replicate your example.

I'd say the certificate error is correct and expected. Since Python 2.7.9 (released December 2014), which implements PEP 476 (https://www.python.org/dev/peps/pep-0476/), the standard library verifies HTTPS certificates by default, fixing the long-standing problem of silently accepting unverified certificates. The certificate must be checked, and the check will fail because the certificate is self-signed.

Since you wrote your blog in September 2015, I'm sure you used a recent Python 2.7.9+, so I would have expected you to run into the same issue. But you did not mention anything like that in your blog.

I know that since Ansible 2 and pywinrm 0.1.1 we can use "ansible_winrm_server_cert_validation: ignore" to ignore the certificate, but that is of course not the best solution.

So I am still interested in how I can use Ansible to do fully automated provisioning of Windows instances in AWS without ignoring a self-signed SSL certificate.

Nothing to prevent an enterprising soul ;) from extending these same techniques to install a "real" certificate via your own User Data script (based on ours). Self-signed certs are pretty common in Windows-land, though (I'm looking at you, RDP), and I wanted to demonstrate the basic technique without all the distraction of managing a CA or dealing with externally generated certs. Still, there's no technical reason that couldn't all be automated as well; it's just that there are myriad ways to do so, depending on where your certs come from.

Another lower-friction option would be to sample the individual self-signed certs after the machines are up and drop them on the control machine as trusted, though you'd have to trust your initial channel not to be MITM'd...

Lots of ways to take this basic technique to the next level, anyway; I still stand by my initial claim. :)