architectures for tomorrow

I’ve previously noted that we’re using Kerberos to handle authentication on our Hadoop clusters. One feature we had been missing because of configuration issues was the ability to use WebHDFS to browse the cluster. With our latest cluster, we figured out the right incantation of Kerberos and SPNEGO configuration to make this work, and validated it from both Google Chrome and curl. The Hadoop-side setup is reasonably well documented, so I won’t go into it here. There are a few ways to get the browser side working.

Validating SPNEGO is working on WebHDFS

The easiest way to determine whether SPNEGO is working on your cluster is to hit a WebHDFS path with curl.
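Below is a minimal sketch of that first, unauthenticated request. It assumes a hypothetical NameNode at namenode.example.com serving HTTP on port 50070, which is the Hadoop 2 default; yours may differ. The helper names are illustrative only. A SPNEGO-protected endpoint should answer with a 401 status and a bare "WWW-Authenticate: Negotiate" header, which is the server asking the client to start the negotiation.

```bash
#!/usr/bin/env bash
# Hypothetical NameNode host and HTTP port; override via the environment.
NAMENODE_HOST="${NAMENODE_HOST:-namenode.example.com}"
NAMENODE_PORT="${NAMENODE_PORT:-50070}"

# Build a WebHDFS LISTSTATUS URL for an absolute HDFS path, e.g. "/" or "/tmp".
webhdfs_url() {
  printf 'http://%s:%s/webhdfs/v1%s?op=LISTSTATUS\n' "$1" "$2" "$3"
}

# Request the path with no credentials and keep only the status line and
# the authentication challenge from the response headers.
probe_unauthenticated() {
  curl -s -i "$(webhdfs_url "$NAMENODE_HOST" "$NAMENODE_PORT" "$1")" |
    grep -iE '^(HTTP/|WWW-Authenticate:)'
}
```

Against a Kerberized cluster, running probe_unauthenticated / should print the 401 status line followed by the Negotiate challenge.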

Next, invoke curl with the --negotiate option and the user set to anyUser. This is a placeholder user that curl requires in order to initialize its authentication code; the real user is determined by the Kerberos authentication process.
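Here is a sketch of the negotiated request. It assumes curl was built with GSS-API support. Its curl -V feature list typically shows SPNEGO, GSS-API, or the older GSS-Negotiate. It also assumes a hypothetical principal, alice@EXAMPLE.COM, for kinit. Substitute your own principal and URL. The empty password after the colon in -u anyUser: keeps curl from prompting for one.

```bash
#!/usr/bin/env bash
# Hypothetical principal and WebHDFS URL; override via the environment.
PRINCIPAL="${PRINCIPAL:-alice@EXAMPLE.COM}"
WEBHDFS_URL="${WEBHDFS_URL:-http://namenode.example.com:50070/webhdfs/v1/?op=LISTSTATUS}"

# Succeed only if the local curl advertises SPNEGO/GSS-API support.
curl_has_spnego() {
  curl -V | grep -qiE '^Features:.*(SPNEGO|GSS-API|GSS-Negotiate)'
}

# Get a ticket (kinit prompts for the password), then let curl negotiate.
# "anyUser" is a placeholder; the Kerberos ticket determines the real user.
webhdfs_negotiate() {
  kinit "$PRINCIPAL" &&
    curl -s -i --negotiate -u anyUser: "$WEBHDFS_URL"
}
```

When the cluster is configured correctly, curl_has_spnego && webhdfs_negotiate should end in a successful response whose body is the JSON directory listing.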

The important thing to note is that the Authorization header is set to Negotiate, followed by a string that looks random. This string is a base64-encoded authentication token generated from your Kerberos credentials. Because the request now carries this token, Jetty responds with a WWW-Authenticate header containing its own Negotiate token.
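The Negotiate token is a live credential, so mask it before pasting a trace into a ticket or chat. The sketch below assumes GNU or BSD sed with -E, and redact_negotiate is a hypothetical helper name. curl -v writes request headers (prefixed with >) and response headers (prefixed with <) to stderr. The usage line therefore merges stderr into the pipe.

```bash
#!/usr/bin/env bash
# Replace the base64 token after "Authorization: Negotiate" or
# "WWW-Authenticate: Negotiate" with a placeholder; other lines pass through.
redact_negotiate() {
  sed -E 's#((Authorization|WWW-Authenticate): Negotiate) [A-Za-z0-9+/=]+#\1 <redacted>#'
}

# Usage (hypothetical URL): show both headers of the exchange, tokens masked.
# curl -s -v -o /dev/null --negotiate -u anyUser: \
#   "http://namenode.example.com:50070/webhdfs/v1/?op=LISTSTATUS" 2>&1 | redact_negotiate
```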

Additionally, Jetty sets the hadoop.auth cookie to make later authentication easier. The web browser can send this pre-authenticated token back on subsequent requests instead of repeating the Kerberos negotiation, which avoids the extra delay.
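curl can use the same shortcut through a cookie jar. -c saves cookies the server sets and -b sends them back, so a second request can reuse hadoop.auth instead of renegotiating. The sketch also pulls the expiry out of the cookie. It assumes the value format used by Hadoop's hadoop-auth module: u=<user>&p=<principal>&t=<type>&e=<expiry>&s=<signature>, with the expiry in epoch milliseconds. The jar path, URL and function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical cookie jar path and WebHDFS URL; override via the environment.
COOKIE_JAR="${COOKIE_JAR:-/tmp/webhdfs-cookies.txt}"
WEBHDFS_URL="${WEBHDFS_URL:-http://namenode.example.com:50070/webhdfs/v1/?op=LISTSTATUS}"

# -c writes cookies the server sets (including hadoop.auth) to the jar and
# -b sends stored cookies back, so repeat calls can skip the Kerberos round trip.
webhdfs_with_cookie() {
  curl -s --negotiate -u anyUser: -b "$COOKIE_JAR" -c "$COOKIE_JAR" "$WEBHDFS_URL"
}

# Print the hadoop.auth value from a Netscape-format cookie jar, where field 6
# is the cookie name and field 7 its value; surrounding quotes are removed.
hadoop_auth_value() {
  awk -F '\t' '$6 == "hadoop.auth" { gsub(/"/, "", $7); print $7 }' "$1"
}

# Print the e= attribute of a hadoop.auth value (assumed: expiry in epoch ms).
hadoop_auth_expiry() {
  printf '%s\n' "$1" | tr '&' '\n' | sed -n 's/^e=//p'
}
```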

A side-trip into your ticket cache

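One thing you may notice after your first SPNEGO authentication is an additional HTTP entry in your Kerberos ticket cache. This is the service ticket your client obtained from the KDC for the web server’s HTTP principal (of the form HTTP/<hostname>@<REALM>) during the negotiation. The client reuses it for later requests until it expires.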
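To see that entry, list the cache with klist. The sketch below filters for HTTP service principals. It also shows how a client generally derives the principal it requests: HTTP/ plus the server's host name plus the realm. Many clients first canonicalize the host name through DNS, which this sketch does not do. The host, realm and function names are hypothetical.

```bash
#!/usr/bin/env bash
# Show only HTTP service tickets from the current cache. After a successful
# SPNEGO exchange, expect a line ending in something like
# HTTP/namenode.example.com@EXAMPLE.COM (hypothetical host and realm).
http_service_tickets() {
  klist | grep -E 'HTTP/[^@[:space:]]+@'
}

# Derive the principal a client would request for a URL's host:
# "HTTP/" + host + "@" + realm. No DNS canonicalization is done here.
spnego_principal() {
  local url realm host
  url="$1"
  realm="$2"
  host="${url#*://}"  # strip the scheme
  host="${host%%/*}"  # strip the path
  host="${host%%:*}"  # strip the port
  printf 'HTTP/%s@%s\n' "$host" "$realm"
}
```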

So, now that we have verified our SPNEGO configuration is working, let’s move on to enabling Chrome.

All Chrome, No SPNEGO

When I originally set this up, I followed the fairly simple procedure for enabling SPNEGO support in Chrome. Under Linux, all you need to do is pass a few startup flags that whitelist the domain names to which Chrome is willing to send Negotiate credentials.
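For reference, the command-line route looked roughly like the sketch below. It assumes the flag names Chrome used at the time:

- --auth-server-whitelist lists the hosts that may receive Negotiate credentials.
- --auth-negotiate-delegate-whitelist lists the hosts to which Chrome may also delegate (forward) those credentials. It is optional for plain authentication.

The *.example.com pattern and the helper name are hypothetical. CHROME_BIN defaults to google-chrome.

```bash
#!/usr/bin/env bash
# Hypothetical domain pattern and browser binary; override via the environment.
SPNEGO_DOMAINS="${SPNEGO_DOMAINS:-*.example.com}"
CHROME_BIN="${CHROME_BIN:-google-chrome}"

# Start Chrome with both whitelists set to the same domain pattern; any
# extra arguments (such as a WebHDFS URL) are passed through unchanged.
launch_chrome_spnego() {
  "$CHROME_BIN" \
    --auth-server-whitelist="$SPNEGO_DOMAINS" \
    --auth-negotiate-delegate-whitelist="$SPNEGO_DOMAINS" \
    "$@"
}
```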

But when I attempted this today, I found that no matter what I did, WebHDFS access failed.

This had worked a few weeks earlier. Digging around, I realized that Chrome on my workstation had been updated to version 43 today. I ran across “Chromium fails to Negotiate [with SPNEGO]”, which notes that as of Chrome/Chromium 41, the negotiation options are not enabled correctly when passed on the command line. Well, great. Now what do I do? I can’t tell people to downgrade to an older version of Chrome, because that introduces security risks into their personal environments.

Enabling SPNEGO Policy Whitelisting in Chrome

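The workaround is to configure the same whitelist through Chrome’s managed policies rather than command-line flags. So how do we do that?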

Just as every person previously had to enable the command-line options, the policy has to be managed on each machine where it needs to be set. The first step is to create the directory from which Chrome reads policy files at startup.

$ sudo mkdir -p /etc/opt/chrome/policies/managed

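Next, create the policy file. It is a JSON object that maps Chrome policy names to values. For SPNEGO, the relevant policies are AuthServerWhitelist and AuthNegotiateDelegateWhitelist. AuthServerWhitelist lists the hosts Chrome may send Negotiate credentials to. AuthNegotiateDelegateWhitelist lists the hosts Chrome may delegate those credentials to. Each takes a comma-separated list of host names, with * allowed as a wildcard (for example, *.example.com).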

Place the JSON in /etc/opt/chrome/policies/managed/spnego.json. The name of the policy file does not appear to matter.
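Below is a minimal sketch that creates the directory and writes the policy file in one step. The policy names AuthServerWhitelist and AuthNegotiateDelegateWhitelist are the ones Chrome used at the time of writing. The *.example.com pattern is hypothetical. For Chromium, point POLICY_DIR at /etc/chromium/policies/managed; for a dry run, point it at a scratch directory.

```bash
#!/usr/bin/env bash
# Google Chrome on Linux reads managed policies from this directory; Chromium
# uses /etc/chromium/policies/managed. Override POLICY_DIR as needed.
POLICY_DIR="${POLICY_DIR:-/etc/opt/chrome/policies/managed}"
# Hypothetical domain pattern; use your cluster's domain instead.
SPNEGO_DOMAINS="${SPNEGO_DOMAINS:-*.example.com}"

# Create the policy directory and write spnego.json with both whitelists.
# Writing under /etc requires root, so run the calling script with sudo.
write_spnego_policy() {
  mkdir -p "$POLICY_DIR" || return 1
  cat > "$POLICY_DIR/spnego.json" <<EOF
{
  "AuthServerWhitelist": "$SPNEGO_DOMAINS",
  "AuthNegotiateDelegateWhitelist": "$SPNEGO_DOMAINS"
}
EOF
}
```

To install it, save the function in a script, append a call to write_spnego_policy, and run the script with sudo.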

Restart Chrome without the whitelisting command-line options. If you are correctly authenticated to your Kerberos realm, the WebHDFS URL should now return a JSON listing of the directory (a FileStatuses object) rather than an authentication error. The contents of your HDFS directories will differ.
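You can also script this check. LISTSTATUS returns a FileStatuses object whose FileStatus array holds one entry per child, with the child's name in pathSuffix. The sketch below uses a light grep/sed extraction of those names for a quick sanity check; it is not a real JSON parser. liststatus_names is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Print the entry names (pathSuffix fields) from a WebHDFS LISTSTATUS response
# read on stdin. A quick grep/sed extraction, not a general JSON parser.
liststatus_names() {
  grep -oE '"pathSuffix" *: *"[^"]*"' | sed -E 's/.*: *"([^"]*)"$/\1/'
}

# Hypothetical usage against an example NameNode:
# curl -s --negotiate -u anyUser: \
#   "http://namenode.example.com:50070/webhdfs/v1/?op=LISTSTATUS" | liststatus_names
```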