Adventures in Enterprise Computing

DevOps

A few days ago, I noticed something very troubling after using the Maven release plugin to publish a release artifact to an internal Maven repository: my git password was exposed in the Maven build output as well as in a git.properties file that the Maven Git commit ID plugin generated. These files are now sitting in Artifactory for all to read.

Not cool. For Maven-based projects, I typically use the Maven Release Plugin. Because we’d also like to track some of the git metadata about how the build was produced, we also use the Maven Git commit ID plugin, which plays quite nicely with Spring Boot. So I was very disturbed to see my password all over the place.

What is happening here?

First of all, I should be clear that the project in question is using the Maven command line wrapper and pulling down Maven 3.3.9, which is the latest at the time of this writing. I’m also using Git 2.7.4 and the current release of the Maven Git commit ID plugin, which was 2.2.0. For the most part, everything is current.

The issue isn’t specific to any single plugin (though the Maven Release Plugin looks like the core offender). Rather, it only manifests under certain conditions, when this group of plugins interacts during the release process. The Maven plugins in question are:

The combination of these plugins will expose your Git password when you use Git over either HTTP or HTTPS and invoke the Maven Release Plugin’s release:prepare and release:perform goals, but curiously not when you run the package, install, or deploy phases. Additionally, if you’re using the Maven Git Commit ID Plugin to capture commit information in your build, the generated git.properties will contain your username and password under the default settings, and this file will be visible in the Maven repository your artifact is published to. What appears to be happening is that the Maven Release Plugin is rewriting the git origin URI to include the credentials. Thus, when the Git Commit ID plugin resolves the git.remote.origin.url value, it includes the username and password as well.
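To check whether an artifact you’ve already published is affected, you can read the git.properties file out of it and inspect git.remote.origin.url. Below is a minimal Python sketch, assuming the file was written with Java’s Properties.store, which escapes `:` as `\:` (so a naive grep for `user:password@` can miss it). The tiny parser only handles flat `key=value` lines and simple backslash escapes, not line continuations or `\uXXXX` sequences.

```python
from urllib.parse import urlsplit

# Single-character escapes that the Java .properties format maps to control chars.
_ESCAPES = {"t": "\t", "n": "\n", "r": "\r", "f": "\f"}


def _unescape(s):
    """Undo simple backslash escapes such as '\\:' -> ':' (no '\\uXXXX' support)."""
    out, i = [], 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            out.append(_ESCAPES.get(s[i + 1], s[i + 1]))
            i += 2
        else:
            out.append(s[i])
            i += 1
    return "".join(out)


def parse_properties(text):
    """Parse flat key=value / key:value lines, skipping '#' and '!' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        # Find the first separator that is not escaped.
        i = 0
        while i < len(line) and line[i] not in "=:":
            i += 2 if line[i] == "\\" else 1
        key, value = line[:i], line[i + 1:]
        props[_unescape(key).strip()] = _unescape(value).strip()
    return props


def leaks_credentials(url):
    """True if an HTTP(S) URL carries a password in its user-info part."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and parts.password is not None
```

An scp-style SSH remote such as `git@host:team/app.git` has no URL scheme, so it is never flagged.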

Demonstrating the issue

Reproducing this issue was kind of a pain in the ass, but with tools like Docker and Docker Compose, it’s a little less painful. I have created an environment that does the following:

Check the README.md for details on how to run it; it works fine under Docker Machine on OS X and on Linux. The project starts up both Artifactory and GitBucket, then starts a “workspace” container and drops you into it, where you’ll be able to execute the Maven and git commands. The issue reproduces 100% of the time whenever you perform a release.

How can you prevent this?

There are a few ways you can prevent your passwords from being exposed:

Use SSH instead of HTTP/HTTPS in your CI setups

SSH doesn’t have this issue and won’t expose your passwords, unless of course you use SSH with usernames and passwords. Granted, SSH may not be appropriate in all cases. If you’re in an enterprisey environment, it may be more complicated, and SSH is also kind of a pain in the ass on Windows. If SSH isn’t an option, there are a few more options.

This will keep your passwords out of your logs in Jenkins, Travis CI, or other CI environments, but it doesn’t address the fact that the Git Commit ID plugin, and probably others, will still render the username and password in the URI.

Exclude the git.remote.origin.url property from your build

If you’re using the Git Commit ID Plugin, exclude the git.remote.origin.url property from your build:

This completely removes the origin URI from the properties file. That can be annoying if you want to track where the code came from, which is handy if you have more than one Git repo manager hosting the same code (e.g., a mirror repo that is local to one of your global offices). I have submitted PR #241, which attempts to strip out the password if it’s found in the URI.
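Stripping the password while keeping the rest of the URI preserves the “where did this come from” information. The following is a minimal Python sketch of that idea; it is not the code from PR #241, and keeping the username is a design choice made here for traceability.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_password(url):
    """Return `url` with any password removed from its user-info part.

    The username, host, port, and path are kept so the origin stays
    traceable. Remotes without a password (including scp-style SSH
    remotes, which urlsplit sees as schemeless) are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.password is None:
        return url
    host = parts.hostname or ""
    if ":" in host:  # an IPv6 literal must be re-bracketed
        host = f"[{host}]"
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    if parts.username:
        netloc = f"{parts.username}@{netloc}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

Note that `hostname` is lowercased by `urlsplit`, which is harmless for DNS names but means the output is not always byte-for-byte identical to the input apart from the password.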

Use Git Commit ID Plugin 2.2.1 or higher

Update: as of 3/26/2016, version 2.2.1 of the Git Commit ID Plugin has been released. It includes PR #241, which fixes the issue altogether.

Docker and Jenkins are like the chocolate and peanut butter of the DevOps world. The combination of the two presents a ton of new opportunities and headaches. I’m going to talk about both.

For this post, I’m assuming you are already familiar with setting up Jenkins and comfortable with Docker. Rather than rehash a lot of existing posts on Jenkins and Docker, I’d suggest heading over to the Riot Games Engineering blog, which has a ton of excellent articles on integrating Docker and Jenkins. I’m going to focus on my specific setup, but I’ve borrowed a lot of ideas from them.

Target setup

I say “target” because not all of the pieces do what I’d like them to do yet. It’s simple, really: set up a Jenkins master in a container on one host, with multiple JNLP agent containers across multiple hosts. The agent hosts could run in different AWS VPCs and/or accounts using ECS.

My goal here was to have a generic agent configuration that could be deployed onto any host. Each project would then be responsible for defining its own build environment, expressed as a container. This puts the build environment configuration in the hands of the development team rather than the team managing the Jenkins infrastructure. I REALLY wanted to avoid having agents with a specific set of build tools. Containerized build environments make that possible; getting everything to play nice is the real challenge.

To get me there, I’m also leveraging the Jenkins Pipeline/Workflow plugin. This set of plugins gives you a very elegant DSL for describing build pipelines. Even better, it has pretty slick support for using containerized build environments via the Cloudbees Docker Pipeline plugin. It’s pretty simple to do something like so:

This pipeline will execute the build on a Jenkins agent named “test-agent” and will attempt to run the build inside a container based on the “maven:3.3.3-jdk-8” image. This particular pipeline runs fine when the agent runs directly on the host, but it fails when the Jenkins agent runs in a container.

This will bring up Jenkins, and it will be able to call the docker command and do everything a “Docker-in-Docker” setup can do. There’s no need for privileged mode or the wrapdocker script.

One caveat here: you’re not going to be able to simply reuse the official Jenkins image to do this because the jenkins user needs to be a part of the docker and/or users group in order to be able to make use of the socket. Once you do that, Jenkins can happily call docker from within the container, and you can build and run other containers with ease.
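Before baking group changes into an image, it helps to see which group actually owns the socket from inside the container. Below is a minimal Python sketch, assuming the socket is mounted at the usual /var/run/docker.sock. If `grp` can’t resolve the socket’s GID, the host’s docker group has no matching name inside the container, a common reason the jenkins user still can’t reach the socket even after being added to a group called docker there.

```python
import grp
import os


def docker_socket_report(path="/var/run/docker.sock"):
    """Report who owns the Docker socket and whether this process can use it."""
    st = os.stat(path)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        # The GID exists on the host but has no entry in this container's /etc/group.
        group = None
    return {
        "gid": st.st_gid,
        "group": group,
        # Read/write permission is what the docker CLI needs on the socket.
        "accessible": os.access(path, os.R_OK | os.W_OK),
        "in_group": st.st_gid in os.getgroups() or st.st_gid == os.getegid(),
    }
```

If `accessible` is false and `in_group` is false, the image needs a group with the reported GID, and the jenkins user needs to be a member of it.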

The Jenkins JNLP agent container

The Jenkins agent container follows the same rules as the master. It, too, needs access to the docker socket and executable, and you can do something like this:

Like the Jenkins master, you have to ensure that the jenkins user is in a group that has the privileges to access the docker socket. I’m using a fork of the Jenkins JNLP slave container and adding the necessary groups. Once you do this, your agent will come up and you’ll be able to execute builds against the agent. Almost.

The exact moment where the wheels came off

The moment you start to execute a build that runs within a container, things go off the rails pretty quickly. The problem is that the agent container binds to a host directory (${JENKINS_HOME}:/var/jenkins_home), and the build container then needs access to the same directory. The Cloudbees Docker Pipeline plugin will execute the following when using the docker.inside() function:

This tries to mount the host directory /var/jenkins_home/workspace/uri-templates-in-docker into the containerized Maven 3.3.3 build environment and set that directory as the current working directory. This all works great if the Jenkins agent is running directly on the host, outside of a container. When running inside a container, I’m basically trying to do this:

And this absolutely does not work. Because I’m mapping the docker socket from the host into the Jenkins agent container, any volumes that are mounted in this “faux docker-in-docker” manner are actually resolved on the host, not from the perspective of the Jenkins agent container. So assuming ${JENKINS_HOME} on the host was something like /opt/jenkins_home, something like this “should” work:

Since we’re kind of running “docker-in-docker”, getting the path of the host directory is tricky.

It’s also not exactly portable, since the containers need intimate knowledge of the host’s directory structure.
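The translation behind that “should work” version is just prefix rewriting from the agent container’s view of a path to the host’s view, which is exactly why it isn’t portable: the agent has to be told the host’s layout. Below is a minimal Python sketch, with the mount table passed in explicitly; the /opt/jenkins_home host path is the hypothetical one from the text. On a real agent, you would have to obtain the table from somewhere, for example the Mounts section of `docker inspect` output for the agent container.

```python
from pathlib import PurePosixPath


def to_host_path(container_path, mounts):
    """Translate a path inside the agent container to the matching host path.

    `mounts` maps container mount points to host directories, i.e. each
    `-v host:container` flag read in reverse. The longest matching mount
    point wins, so nested mounts resolve to the right host directory.
    """
    p = PurePosixPath(container_path)
    best = None
    for target, source in mounts.items():
        t = PurePosixPath(target)
        if p == t or t in p.parents:
            if best is None or len(t.parts) > len(best[0].parts):
                best = (t, PurePosixPath(source))
    if best is None:
        raise ValueError(f"{container_path} is not under any bind mount")
    target, source = best
    return str(source / p.relative_to(target))
```

For example, with `{"/var/jenkins_home": "/opt/jenkins_home"}`, the workspace `/var/jenkins_home/workspace/uri-templates-in-docker` maps to `/opt/jenkins_home/workspace/uri-templates-in-docker`, and that host path is what the `-v` flag handed to the Docker daemon would need to contain.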

There is a better way.

The beauty of Docker data volume containers

It’s taken me about 18 months to finally understand why one would want to use a container for storing data. Now I get it. For this use case, a Docker data volume container is an incredibly elegant way of sharing a volume between multiple containers. It provides a clean abstraction around the volume and a host-independent way of referencing it. With data volume containers, you end up with something like this:

Again borrowing some ideas from Maxfield at Riot Games, I created a data volume container pretty much the same way he describes. While Docker 1.9+ has the ability to create named volumes, there are a few major issues with using them right now:
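With a data volume container, the build container doesn’t need a host path at all: `docker run --volumes-from` mounts the data container’s volumes at the same paths they have there. Below is a minimal Python sketch that assembles such an invocation. It assumes the data container declares its volume at /var/jenkins_home, and `jenkins-data` and `mvn -B verify` are hypothetical placeholders.

```python
import shlex


def build_run_args(image, data_container, workdir, command):
    """Assemble `docker run` args that reuse a data volume container's volumes.

    `--volumes-from` mounts every volume of `data_container` at the same path
    it has there, so no host directory appears anywhere in the command.
    """
    return [
        "docker", "run", "--rm",
        "--volumes-from", data_container,
        "-w", workdir,  # run the build from inside the shared workspace
        image,
        *command,
    ]


if __name__ == "__main__":
    args = build_run_args(
        "maven:3.3.3-jdk-8",
        "jenkins-data",  # hypothetical data volume container name
        "/var/jenkins_home/workspace/uri-templates-in-docker",
        ["mvn", "-B", "verify"],
    )
    print(shlex.join(args))
```

Because both the agent and the build container attach to the same named container, the workspace path means the same thing in both. That is the host independence the data volume container buys you.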

The documentation is seriously lacking. And when I say lacking, I mean it doesn’t exist. See issue #20465

Volumes created with docker volume create will always be owned by root. This is being fixed for Docker 1.11, but that doesn’t help much when you’re using Docker 1.9 or 1.10. Since Jenkins runs as the jenkins user, it can’t write to a root-owned volume, so this doesn’t work.

Since my target environment is Amazon ECS, which uses Docker 1.9, I’ll continue with data volume containers. I used Maxfield’s Dockerfile verbatim and created the container like so:

So far so good. The bad news is that the Docker Pipeline plugin still insists on mounting a volume from the host, which in my case doesn’t actually exist in the Jenkins agent container. So for now, the Cloudbees Docker Pipeline plugin is a non-starter.

However, it is possible to bypass the Docker Pipeline plugin and change the pipeline script to be as follows:

And this mostly works. The project builds but fails on the tests, because the pipeline git step doesn’t handle submodules very well. However, that’s an issue with this specific project, and at least the build now gets two-thirds of the way through inside the target containerized build environment before failing.

Wrapping Up

Containerized build environments are such a great idea and will save a lot of hassle down the line. I’m also loving the new Jenkins pipeline plugins, even though there are a few rough edges. I’ve posted the code for a working environment that illustrates this setup here:

Remember, not everything here works as desired, but it at least demos what could be possible. I hope this post helped folks better understand how to execute Docker builds in a Jenkins container and get a better grasp of how Docker manages data volumes. There’s still a lot more to learn and a few PRs to create 😉

There’s an interesting post making the rounds on Twitter called How a bug in Visual Studio 2015 exposed my source code on GitHub and cost me $6,500 in a few hours. The short version is that the developer from Humankode attempted to create a private repo on GitHub via the Visual Studio Git extension. Unfortunately for this developer, the extension made the repository public rather than private. Complicating matters, the developer had committed his AWS access key and AWS secret access key. The keys were compromised, fell into the hands of the wrong party, and were used to run up $6,500 in AWS charges. In the end, the developer had this to say as a lesson learnt:

At face value one might say it’s simple : don’t publish your access keys to a public repository, which is what many before me have done. In my instance, I specifically published to a private repository, but a bug in visual studio meant that the code was published to a public repository. As soon as it was out in the wild, it was too late. Bots scan GitHub repositories and it only takes 2 or 3 minutes for some of them to pick this up.

It’s reasonable advice, but you should do much more.

Don’t ever publish credentials to an SCM, public or private

By credentials, I mean anything that allows someone to gain elevated privileges to your systems, which include:

Passwords

SSH Keys

Private keys or certificates

OAuth Consumer or Token Secrets

AWS Access and Secret Keys (also the equivalent of Azure or any other public cloud)

While “don’t publish your access keys to a public repository” is sound advice, I’d expand on that and assert that credentials shouldn’t be published in any repository. Period. Also, don’t put these things into your wiki or on a shared file system. Avoid exchanging credentials via email too. They’re credentials. They give anyone the power to do things on your system. You should control who has access to manage your runtime environment, which means locking down and entitling credentials. If credentials reside in source control, you’re effectively granting every developer full admin rights to your deployment environments as well. That’s probably not what you want.

But why is storing them in an SCM a bad idea? Chances are, you put the credentials in your SCM as a means to simplify deployment. As a result, they’re likely being included in your builds too. If you’re producing a package that gets published to a package repository (NuGet, Maven, RPM, etc.), the credentials are going there too. Is your package repository private too? If you’re using a CI tool like Jenkins, those credentials are being copied there as well. Is that locked down and private too? Can you prevent others from seeing your working directory? Another big gotcha, especially with tools like Git, is the possibility that a developer on your project “backs up” a project to another remote repository. There’s nothing stopping anyone from copying your private repo on GitHub to a public repo on Bitbucket by simply adding a new remote and pushing to it. Your best defense is to not put credentials where your source code lives.
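A cheap safety net is to scan changes for credential-shaped strings before they are committed. Below is a minimal Python sketch. The patterns are common heuristics, not an exhaustive rule set: AWS access key IDs start with AKIA (or ASIA for temporary keys), PEM private keys have a recognizable header, and URLs can embed `user:password@`. A hit means “look here,” not proof of a leak.

```python
import re

# Heuristic patterns for a few credential types from the list above.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    "private_key": re.compile(r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----"),
    "url_password": re.compile(r"[a-z][a-z0-9+.-]*://[^/\s:@]+:[^/\s@]+@", re.I),
}


def scan(text):
    """Return (line_number, kind) for every line that looks like a credential."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, kind))
    return hits
```

One way to use it is to feed it the output of `git diff --cached` from a pre-commit hook and refuse the commit when `scan` returns anything. Dedicated secret scanners ship far larger rule sets, but the idea is the same.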

Where should you put credentials?

If not in an SCM, then where? You probably keep your personal credentials private. You may even use a password manager like LastPass or 1Password, because you can’t remember all of these passwords and need to retrieve them at some point. Basically, you’re keeping these secrets private, as they should be, rather than slapping Post-it notes on your desk. Application credentials should be treated no differently. Tools like HashiCorp’s Vault, Conjur, or CyberArk, to name a few, are all designed to manage application credentials so that they are protected.
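However the secret store is wired in, the application ends up receiving credentials at runtime instead of reading them from a checked-in file. Below is a minimal Python sketch, assuming your deployment tooling injects them as environment variables. APP_DB_USER and APP_DB_PASSWORD are hypothetical names, and environment variables are only one possible hand-off. The custom `__repr__` keeps the password out of logs and tracebacks.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DbCredentials:
    username: str
    password: str

    def __repr__(self):
        # Never let the secret leak into logs, tracebacks, or debug output.
        return f"DbCredentials(username={self.username!r}, password='****')"


# Hypothetical variable names; use whatever your deployment tooling injects.
REQUIRED = ("APP_DB_USER", "APP_DB_PASSWORD")


def load_db_credentials(env=None):
    """Read credentials injected at deploy time instead of from source control."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        # Fail fast, naming only the missing variables, never their values.
        raise RuntimeError(f"missing credentials: {', '.join(missing)}")
    return DbCredentials(env["APP_DB_USER"], env["APP_DB_PASSWORD"])
```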

Use a Token Service

If you are using AWS, consider using the AWS Security Token Service (STS). It lets you use temporary credentials that expire after as little as 15 minutes, with the maximum lifetime depending on which STS API you call. You can authenticate users with ADFS or OpenID Connect, and only after your systems have authenticated them will they be able to obtain an STS token. Of course, this all falls apart if you are storing your ADFS or OIDC credentials in your Git repository. Since that’s a crazy idea, you’re not doing that. Right?
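Because STS credentials expire, code that holds them has to renew them before they lapse. Below is a minimal Python sketch of that check. It assumes the credentials dict has the shape boto3 returns in `sts.assume_role(...)["Credentials"]`, where `Expiration` is a timezone-aware datetime. The five-minute margin is an arbitrary choice.

```python
from datetime import datetime, timedelta, timezone


def needs_refresh(credentials, now=None, margin=timedelta(minutes=5)):
    """Return True if temporary credentials are expired or expire within `margin`.

    `credentials` is assumed to look like the `Credentials` dict in boto3's
    `sts.assume_role(...)` response, whose `Expiration` is timezone-aware.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return credentials["Expiration"] - now <= margin
```

A long-running process would call this before each batch of AWS calls and request a fresh token whenever it returns True, so no long-lived key ever needs to live on disk.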

Get in the Habit of Rotating your Credentials

It’s usually a good idea to change passwords periodically. And by periodically I mean every few days to weeks, not months. Access keys are no different and you should rotate them on a regular basis. The AWS blog has an informative post on how to rotate access keys for IAM users. If you look closely, it’ll make more sense why IAM users are allowed to have a maximum of 2 access keys. Something like this is pretty easy to automate as well. Keeping an IAM access key valid for more than 2–12 months is not a good idea.
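Finding keys that are overdue for rotation is easy to automate. The two-key limit is what makes zero-downtime rotation possible: create the second key, roll it out, then deactivate and delete the old one. Below is a minimal Python sketch of the “which keys are overdue” step. It assumes input shaped like the `AccessKeyMetadata` list that boto3 returns from `iam.list_access_keys(UserName=...)`. The 30-day default is an arbitrary placeholder, not an AWS recommendation.

```python
from datetime import datetime, timedelta, timezone


def stale_keys(access_key_metadata, max_age=timedelta(days=30), now=None):
    """Return the IDs of active access keys older than `max_age`.

    Each item is assumed to look like an entry of boto3's
    `list_access_keys()["AccessKeyMetadata"]`, with a timezone-aware
    `CreateDate`. Inactive keys are skipped; they should just be deleted.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return [
        key["AccessKeyId"]
        for key in access_key_metadata
        if key["Status"] == "Active" and now - key["CreateDate"] > max_age
    ]
```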

Entitle your Credentials

Proper use of entitlements, roles, or privileges can help minimize the impact an attacker can have if your credentials are compromised. If an application only needs read/write access to DynamoDB, then it should only have read/write access to DynamoDB. It shouldn’t have the ability to spin up new EC2 instances, call CloudFormation, etc. It’s easy to select “PowerUser” from the IAM console, but you shouldn’t. Yeah, an attacker might be able to read your data, but they’re not going to have the ability to spin up 1,000 EC2 instances to mine bitcoins.
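For the DynamoDB example, a scoped policy might look like the one below. This minimal Python sketch builds the IAM policy document as a dict. The region, account ID, and table name are hypothetical, and the action list is one reasonable item-level read/write set rather than the definitive one. The `index/*` resource lets Query and Scan reach the table’s secondary indexes.

```python
import json


def dynamodb_table_policy(region, account_id, table):
    """IAM policy granting item-level read/write on a single DynamoDB table."""
    table_arn = f"arn:aws:dynamodb:{region}:{account_id}:table/{table}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:BatchGetItem",
                    "dynamodb:Query",
                    "dynamodb:Scan",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                    "dynamodb:DeleteItem",
                    "dynamodb:BatchWriteItem",
                ],
                # One table and its indexes; nothing else in the account.
                "Resource": [table_arn, f"{table_arn}/index/*"],
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical account and table, for illustration only.
    print(json.dumps(dynamodb_table_policy("us-east-1", "123456789012", "orders"), indent=2))
```

Credentials attached to this policy can read and write that one table, and nothing else: no EC2, no CloudFormation, no other tables.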

Just keep your credentials private, even in private.

Credentials need to be secured, plain and simple. Even if you’re running your app in-house, with a private wiki and a private git repo, you still should not put credentials of any sort into those systems. They’re among the first places attackers look if your environment is compromised.