Sensible SSH with Ansible: Overview

This is the first in a series of posts on how to manage ssh via Ansible. It was inspired by a warning from Venafi that gained traction in the blogosphere (read: my Google feed for two weeks). I don't know many people who practice good ssh security, so my goal is to make it more accessible and (somewhat) streamlined.

The Series so Far

Code

Executive Summary

If you do any sort of remote CLI interaction, you probably use ssh already. If you don't remote anywhere, or you don't use ssh when you do, you probably don't need to read this (but it can't hurt). Like most best practices, maintaining proper ssh can become a chore as your world expands across personal boxes, work machines, online VCS, freelance gigs, and so on. Easy solutions, like using the same key everywhere for years at a time, are convenient for you and for attackers alike, since those habits are well known and well documented. Better solutions require time and effort to update configuration everywhere, retire old keys, and maintain provenance on new keys. The best solutions automate the entire process and notify you afterward. Using Ansible, my goal is to create such a solution that can be run indefinitely from a trusted control machine. Along the way, I also plan to include and test stronger configurations, randomized options, centralized validation, and probably a good selection of rabbit holes as I figure out how to code everything.

I'm going to assume some familiarity with working in and with bash, but I also try to provide a good introduction to each component that plays an important role (Ansible pun intended). If something seems hand-wavy, there's a really good chance I'll explain it either further down the post or in a follow-up (if not, let me know). You'll need a control host capable of running bash and nodes capable of handling ssh connections. A healthy disdain for Windows isn't necessary, but it makes trying to get any of this to work in Windows a bit more palatable. I've built an example in Vagrant, but you don't need it to use any of the code presented in the series.

Note

I included all of the code (as opposed to lazy loading) for the AMP CDN, which means the remote might be a bit ahead. I'm still figuring out the balance between code blocks that are enjoyable to read and code blocks that don't look terrible. If something is completely unbearable via AMP, sorry I missed it; shoot me an email and I'll find a fix.

For the most part, you should assume a shell environment is a bash environment. I actually use zsh, so there might be some zshisms mixed in. I've tried to make this run from start to finish via /bin/bash, so let me know if I missed something. The assumed-to-be-bash shell will look like this:

$ bash --version
GNU bash, version 4.3.46(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

There's at least one PowerShell snippet. I tried to differentiate the two shells, so PowerShell looks like this:

You'll most likely need to run PowerShell as an administrator; I can't easily break out elevated commands because, unsurprisingly, Microsoft wanted to do things their own way.

Compiling the Series Posts

To get in the spirit of Ansible, I'm using a Python script to build the posts from a collection of Jinja templates. It's actually the second iteration of a build script (the first was totally bespoke). It's consumed a day of dev time, which is awesome because it's pretty neat but also not awesome because I'd like to be further in the series.

I mention this because there's a chance I missed something in the build process on any of the posts. I rigorously check builds locally before drafting them on my blog, where I read the whole thing at least twice and try to con friends into reviewing it before actually posting. Even so, something may have slipped through; if it did, let me know and I'll fix it ASAP.

Windows

WARNING: Microsoft wrote their own containerization API (it relies on a different kernel, after all). Unsurprisingly, most of the things you think should work don't. Chances are you'll have to use defaults everywhere, which is so ridiculously insecure that you probably shouldn't use Windows Containers.

Ansible flat out doesn't support Windows as a control machine. You can use Ansible through WSL, Ansible with babun, or anything else that can run python within a bash environment on Windows. Unsurprisingly, local support (i.e. true control) is sketchy at best. Then again, the generally accepted method for managing ssh keys on Windows is PuTTY, which is so far from automatable it's not even funny.

WARNING: If you're trying to manage Windows from WSL or vice versa, you're gonna have a bad time. If you do use Ansible to configure WSL, do not expect things to work as intended outside of WSL (e.g. in PowerShell). Unsurprisingly, they will probably work until you least expect it and then you'll spend a day on StackOverflow discovering WSL makes no sense.

My Environment

I'm running Windows 10 1709, build 17025.1000 (previously 16299.19; I'll try to keep this updated as I get new builds). Because of the Windows limitations, I'll only be using the created virtual environments. That means Ansible is probably the latest version (2.4.1 as of initial writing) and OpenSSH is tied to the VM's repositories (pertinent packages are under "EL-7"). That means the only real versions come from PowerShell:

OpenSSH, the FOSS implementation common to most Linux distros, is usually split into two packages: a client package, containing the tools necessary to connect to a remote and manage keys (e.g. ssh and ssh-add, respectively), and a server package, containing the tools necessary to run an ssh host (e.g. sshd). It might seem like an annoyance initially, but it's actually a solid security feature by itself. Machines that do not need to host external ssh connections don't need the extra app, and, more importantly, they shouldn't have to secure the server defaults. Boxes that host an ssh server usually have both, but they're also tailored to secure the configuration (or at least should be).
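A quick way to see which half of the split is present on a box is to probe for the binaries themselves, since the package names vary by distro (openssh-clients/openssh-server on EL-7, openssh-client/openssh-server on Debian and Ubuntu):

```shell
# Probe for the client-side and server-side tools directly;
# `command -v` succeeds only if the tool is on the PATH.
# Note: sshd usually lives in /usr/sbin, which may not be on a
# regular user's PATH, so "missing" here is a hint, not proof.
for tool in ssh ssh-keygen ssh-add sshd; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: present"
  else
    echo "$tool: missing"
  fi
done
```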

The client package, usually a variant of openssh-client*, contains a few important commands. ssh-keygen creates new keys (it can do so much more).
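As a minimal sketch (the key type, comment, and path are all choices, not requirements; drop `-N ""` outside of a demo so you get prompted for a real passphrase):

```shell
# Generate a throwaway ed25519 keypair in a temp directory.
# -t picks the key type, -C attaches a comment, -f sets the output path,
# and -N "" sets an empty passphrase (demo only -- use a real one).
key="$(mktemp -d)/demo_ed25519"
ssh-keygen -q -t ed25519 -N "" -C "demo@control" -f "$key"
ls "$key" "$key.pub"   # the private and public halves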

The extra permissions (chmod) setup is managed automatically by ssh-keygen, so you typically don't have to worry about it unless you're moving files around and don't preserve permissions (e.g. Windows touched the file at some point). If you get it wrong, the error messages are helpful, or you can just follow this StackExchange answer.
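For reference, here's a sketch of resetting the permissions ssh expects, written against the default `~/.ssh` layout (adjust the glob if your keys use other names):

```shell
# ~/.ssh and private keys should be owner-only; public halves
# may be world-readable.
sshdir="$HOME/.ssh"
mkdir -p "$sshdir"
chmod 700 "$sshdir"
for f in "$sshdir"/id_*; do
  [ -e "$f" ] || continue          # skip if no keys match the glob
  case "$f" in
    *.pub) chmod 644 "$f" ;;       # public key
    *)     chmod 600 "$f" ;;       # private key
  esac
done
```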

Once keys are in place, ssh establishes the connection. By default, it assumes the server is listening on port 22 and the id_rsa identity should be used for everything (the man page for ssh explains how to not use defaults):
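For anything beyond the defaults, `~/.ssh/config` saves retyping flags. A sketch, where the host, user, and key path are placeholders:

```
Host hosta
    HostName hosta.example.com
    Port 2222
    User user1
    IdentityFile ~/.ssh/hosta_key
    IdentitiesOnly yes
```

With that in place, a plain `ssh hosta` picks everything up, and `ssh -G hosta` prints the resolved options without connecting if you want to double-check.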

Once multiple identities get involved (e.g. user1@hosta and user2@hostb), things get messy. Luckily OpenSSH includes tools to handle that. ssh-agent (man here) can hold identities and send them automatically to ssh.
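The basic agent lifecycle looks like this (the key path is a placeholder):

```shell
# Start an agent for this shell; eval exports SSH_AUTH_SOCK/SSH_AGENT_PID
# so ssh and ssh-add can find it.
eval "$(ssh-agent -s)"
# Load an identity (prompts for its passphrase once per agent lifetime)
# and list what the agent is holding.
ssh-add ~/.ssh/hosta_key
ssh-add -l
# Kill the agent when you're done.
eval "$(ssh-agent -k)"
```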

Ansible

Ansible scripts your world. As a coder, that's a very powerful statement. There are other tools, but I prefer Ansible. Its syntax is simple, its scope is massive, and it runs just about anywhere (that you'd want to develop, so not Windows).

Ansible is capable of running a plethora of ad-hoc commands via ansible, but its power comes from its ability to stitch those together in easily readable YAML files. A selling feature of Ansible is that its modules are idempotent (probably the first time this year I was able to put my math degree to good use, i.e. "I accrued so much debt just to identify obscure jargon"). In theory, a good collection of commands, called a playbook, should be idempotent as well. In practice, this takes some careful polishing to achieve.
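If "idempotent" is new jargon: an idempotent operation can be run any number of times and ends in the same state as running it once. `mkdir -p` is the everyday shell example, and Ansible's modules (e.g. file with state=directory) behave the same way:

```shell
# mkdir -p is idempotent: the second run changes nothing and raises no
# error, unlike plain mkdir, which would fail on an existing directory.
dir="$(mktemp -d)/demo"
mkdir -p "$dir"
mkdir -p "$dir"
[ -d "$dir" ] && echo "same state after two runs"
```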

Ansible runs playbooks against an inventory of hosts. Each playbook contains a list whose items include host selection, variable declaration, and tasks. Each task is composed of, essentially, a single command (block support exists with a few caveats). To encourage DRY code, Ansible allows grouping tasks in roles. If that sounds like a lot of directories, it is. Ansible handles that for you with its configuration file, ansible.cfg (among other things like default arguments).
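To make that vocabulary concrete, here's a minimal playbook sketch (the host group, variable, and role name are all illustrative):

```yaml
# site.yml -- a playbook is a list; each item is a play.
- hosts: all              # host selection from the inventory
  vars:
    ssh_dir: ~/.ssh       # variable declaration
  roles:
    - common              # looks for roles/common/tasks/main.yml, etc.
  tasks:
    - name: Ensure the ssh directory exists with tight permissions
      file:
        path: "{{ ssh_dir }}"
        state: directory
        mode: "0700"
```

Run it with `ansible-playbook -i hosts site.yml`; a second run should report no changes, which is that playbook-level idempotence in action.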

I'm still new enough to Ansible that I've been afraid to visit Ansible Galaxy, which introduces itself as

your hub for finding, reusing and sharing the best Ansible content

i.e. it's an Ansible package manager which is awesome because someone smarter probably thought of my idea first. I will eventually update with Galaxy roles; for now this is all local because I need to grok the whole process.

Ansible publishes best practices, which I've tried to follow throughout. Again, I'm new to Ansible, so I'm still working out how those best practices work in practice.

I'm going to touch on Ansible components as I bring them in, so I'll leave finding and running a basic intro playbook and inventory up to you.

Optional: Vagrant

If the idea of scripting your environment appeals to you, you'll love Vagrant. It is a Ruby-based VM manager with access to all the major providers (VirtualBox, VMware, Hyper-V, and more). It prides itself on being easy to use; after installation, you can run

$ vagrant init hashicorp/precise64
$ vagrant up
$ vagrant ssh

and you're inside a virtual Ubuntu box.

Because Vagrant is scriptable and can emulate anything with an image (basically anything you can find an iso of), it's a cheap and easy way to test host configurations. Unlike Docker, its default storage is persistent, as it creates full VMs for each box. As a primary goal of a containerized app is to be as slim and isolated as possible, installing the tooling for Ansible inside a container doesn't make much sense. I feel like Vagrant lends itself more to that role (Ansible pun intended) here, but if you really wanted to you could duplicate this via Dockerfiles. If your world is Docker containers inside Docker containers, you might want to do that.

To run the provided Vagrantfile, clone the repo and run vagrant up inside the directory.

WARNING: If you're on Windows running vanilla Containers and Hyper-V, you'll probably need to add --provider=hyperv to all up commands. If you're not running vanilla, you're probably already aware that trying to run another provider while Hyper-V is running (which is always unless you changed the boot config) won't be pretty. I crashed my system three times one weekend, initially trying to figure out what was wrong and later because I forgot to force Hyper-V.