annaken

Tuesday, 12 April 2016

I'm moving from Blogger to GitHub Pages for various reasons, but mostly because writing the kind of part-text, part-code articles that I do becomes a Sisyphean task after about a page and a half. I write everything in markdown and put it in git anyway, so GitHub Pages seems almost too easy.

New blog is here: http://annaken.github.io/ and I'll set up a redirect from this site to that in a couple of weeks.

Wednesday, 17 February 2016

THE REFERER HEADER

The poor referer header. Misspelled and misused since its inception. Its typical use is thus: if I click on a link on a website, the referer header tells the landing page which source page I came from.

It's heavily used in marketing to analyse where visitors to a website came from, and also very useful for gathering data and statistics about reading habits and web traffic.

However, it presents a potential security risk if too much information is passed on.

In RFC 2616 [1], which specifies the referer header for HTTP/1.1, the specification lays out that:

"Clients SHOULD NOT include a Referer header field in a (non-secure) HTTP request if the referring page was transferred with a secure protocol"

That is, if our request goes from https to http, the referer header should not be present.

However, RFCs are not mandatory, and data can be leaked. Facebook fell foul of this a little while ago, when it turned out that in some cases the userid of the originating page was being passed in the referer header to advertisers when a user clicked on an advert [2].

Additionally, when traffic goes between two https sites - as is increasingly common in the move towards SSL everywhere - the RFC does NOT require that the referer header is stripped.

THE META-REFERRER TAG

A potential solution to these two issues, and more, looks to be the meta-referrer tag. By adding the following tag to the source web page:

<meta name="referrer" content="origin">

the referer header can be edited to allow sites to see where their traffic has come from, but without leaking potentially sensitive data. The options for the content field are [3]:

no-referrer: omit the referer header from the request

no-referrer-when-downgrade: omit the referer header when moving from https to http

origin: set the referer header to be the origin only, that is, stripping any path and parameters from the url

origin-when-cross-origin: if the request is to a different website or protocol, set the referer header to the origin only; otherwise send the full url

unsafe-url: set the referer header to be the full originating url regardless of target site or protocol, potentially leaking data.
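The options above can be sketched as a small function. This is a minimal illustration of the rules as listed, not a browser's actual implementation: the function names are hypothetical, example.org stands in for any external site, and details such as fragment stripping are left out.

```ruby
require 'uri'

# Return the origin (scheme://host[:port]/) of a url, dropping path and parameters.
def origin_of(url)
  uri = URI.parse(url)
  port = uri.port == uri.default_port ? '' : ":#{uri.port}"
  "#{uri.scheme}://#{uri.host}#{port}/"
end

# Work out the referer value a client honouring the policy would send.
# Returns nil when the header is omitted altogether.
def referer_for(policy, source_url, target_url)
  source = URI.parse(source_url)
  target = URI.parse(target_url)
  downgrade = source.scheme == 'https' && target.scheme != 'https'
  same_origin = origin_of(source_url) == origin_of(target_url)

  case policy
  when 'no-referrer'                then nil
  when 'no-referrer-when-downgrade' then downgrade ? nil : source_url
  when 'origin'                     then origin_of(source_url)
  when 'origin-when-cross-origin'   then same_origin ? source_url : origin_of(source_url)
  when 'unsafe-url'                 then source_url
  else raise ArgumentError, "unknown policy: #{policy}"
  end
end

source = 'https://www.facebook.com/bobbytables?f=nref'
puts referer_for('origin', source, 'http://example.org/')
```

With the origin policy, the path and query string of the source page never leave the browser, whatever the target site is.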

To use a practical example, suppose facebook were to implement this tag as:

<meta name="referrer" content="origin" id="meta_referrer" />

Then, when Mr Bobby Tables is logged into facebook, and on his homepage:

https://www.facebook.com/bobbytables?f=nref

and he clicks on an external link and is taken to a different site, the referer header is reduced to the origin alone:

Referer: https://www.facebook.com/

thus preserving his privacy. The target site registers that they've had a visitor from facebook, but the name of the user is not passed on.

HANDLE WITH CAUTION

Whether the referer header is implemented with the new meta-referrer tag or not, it is prudent to approach it with a degree of caution.

Referer spam is still an issue [5] - an attacker can target a website using a specific referer header, which is reported by analytics tools to the website owner. Out of curiosity about where their traffic is coming from, the owner will often follow the link back to a malicious web page. The referer header also opens up potential for exploits and XSS attacks [6][7]. It is trivially easy to manipulate headers, so relying on the header for authorisation or authentication is heavily discouraged.
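To show just how little effort it takes to manipulate the header, here is a minimal sketch using Ruby's standard Net::HTTP library. It builds a request with an arbitrary referer value without sending it; the helper name and both urls are made up for illustration.

```ruby
require 'net/http'
require 'uri'

# Build (but do not send) a GET request that claims to come from any page we like.
def request_with_referer(target_url, claimed_referer)
  request = Net::HTTP::Get.new(URI.parse(target_url))
  request['Referer'] = claimed_referer
  request
end

req = request_with_referer('http://www.example.com/', 'https://trusted.example.org/members')
puts req['Referer']
```

The server receiving this request has no way to tell that the value was typed in by hand, which is why the header is a poor basis for access decisions.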

MISSING HEADERS

The referer header is omitted if:

the user entered the url in the address bar

the user visited the site from a bookmark

the request moved from https to http

the request moved from https to a different https url

security software (antivirus, firewall etc) stripped the request

a proxy stripped the request

a browser plugin stripped the request

the site was visited programmatically (eg using curl) without setting a header

the meta-referrer tag disallows it

the meta-referrer tag allows it but the browser does not have meta-referrer support [8]

For websites that would rely on the referer header for certain advertising campaigns, the patchy and inconsistent usage of the header can be a real problem. Proxy rules that allow access for users originating from specific sites have a high risk of not working at all, depending on the user's browser or local setup, and are also vulnerable to abuse if the headers are manipulated.
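Given how often the header goes missing, any code that reads it should treat absence as a normal case. A minimal sketch, assuming the raw header value arrives as a string or nil (the helper name is hypothetical):

```ruby
require 'uri'

# Return the host named in a referer header, or nil if the header is
# absent, empty, or not a parseable url.
def referer_host(header_value)
  return nil if header_value.nil? || header_value.strip.empty?
  URI.parse(header_value.strip).host
rescue URI::InvalidURIError
  nil
end

puts referer_host('https://www.facebook.com/').inspect
puts referer_host(nil).inspect
```

A nil result only means "unknown source"; it says nothing about whether the visit was direct, stripped by a proxy, or made by a script.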

TLDR

To sum up, the referer header was rather flakey, and is now slightly less flakey. It's often omitted either accidentally or deliberately, and easily faked. It can be a very useful tool in gathering data about web traffic, but probably best not to rely on it for anything especially important at this point.

Tuesday, 8 December 2015

I gave a talk at the recent PuppetConf called "Puppet in the Pipeline" - a round up of workflow planning, deployment pipelines, and integration points. I start out with a very basic setup, and walk through various stages of complexity, talking through technical options and things to consider. I can't seem to get it written down as a satisfactory blog post, but for now I will just link to the video and slides:

Video: https://www.youtube.com/watch?v=4jXGmxkEoeM

Friday, 31 July 2015

At the end of Part 1 we had a Serverspec installation running tests which were stored alongside our configs.
Command-line arguments passed in the name of the VM and a list of modules to be tested.

Next, we want to look carefully at the output generated by Serverspec so that we can track and visualise our tests. We need to track our data carefully so that we can cope with the results of many different VMs.

Serverspec outputs

Serverspec has a number of output options. The 'documentation' style is what we've seen printed to screen so far; there are also json and html reports. It is possible to get all of these formatting options at once by adding the following line to your Rakefile:
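A minimal sketch of what such a setting can look like, assuming RSpec's standard --format and --out options. The helper name and the reports directory are hypothetical, not taken from the original setup.

```ruby
# Build an rspec_opts string that prints documentation output to screen
# and also writes json and html reports. The report paths are hypothetical.
def rspec_format_opts(report_dir, host)
  [
    '--format documentation',
    "--format json --out #{report_dir}/#{host}.json",
    "--format html --out #{report_dir}/#{host}.html"
  ].join(' ')
end

# Inside an RSpec::Core::RakeTask block this would be assigned as:
#   t.rspec_opts = rspec_format_opts('/opt/serverspec/reports', 'www.example.com')
puts rspec_format_opts('/opt/serverspec/reports', 'www.example.com')
```

A formatter given without --out writes to the screen, while each --out applies to the formatter named just before it.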

Each test is an element in the 'examples' array, and at the end we have a summary and a summary_line.

We're going to pick up every test as a separate json object, insert some identifying metadata, and output each test as a line in /var/log/serverspec.log

Apart from the host and module identifiers, it might also be helpful if we knew, for example, what the OS version of the host was, which git branch it came from, and maybe a UUID unique to a test run (which could encompass multiple VMs).

With this in mind, we re-write our /opt/serverspec/Rakefile as follows:

{"description":"should be installed","full_description":"Package \"ntp\" should be installed","status":"passed","file_path":"/opt/puppetcode/modules/ntp/serverspec/init_spec.rb","line_number":4,"run_time":0.029166597,"module":"ntp","time":"2015-07-31-12:21","uuid":"12345","host":"www.example.com","branch":"dev","osrel":"7.1"}

This log can now be collected by Logstash, indexed by Elasticsearch, and visualised with Kibana.
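The step of turning each test into its own enriched json line can be sketched as follows. This is a minimal illustration under stated assumptions, not the original Rakefile: the function name and sample report are made up, and the metadata field names follow the log line shown above.

```ruby
require 'json'

# Turn RSpec's json report into one enriched json string per test.
# `metadata` holds the identifying fields (host, module, uuid, branch, osrel, time).
def log_lines(report_json, metadata)
  report = JSON.parse(report_json)
  report.fetch('examples', []).map do |example|
    example.merge(metadata).to_json
  end
end

# Hypothetical sample report containing a single test
sample = {
  'examples' => [
    { 'description' => 'should be installed',
      'full_description' => 'Package "ntp" should be installed',
      'status' => 'passed' }
  ],
  'summary_line' => '1 example, 0 failures'
}.to_json

metadata = { 'module' => 'ntp', 'host' => 'www.example.com', 'uuid' => '12345',
             'branch' => 'dev', 'osrel' => '7.1',
             'time' => Time.now.strftime('%Y-%m-%d-%H:%M') }

lines = log_lines(sample, metadata)
puts lines
# To append to the log file instead:
#   File.open('/var/log/serverspec.log', 'a') { |f| f.puts(lines) }
```

Writing one self-contained json object per line means a log shipper can treat every test result as a separate event, without needing to understand the structure of the full report.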

Whether you're spawning VMs to cope with spikes in traffic, or you want to verify your app works on a range of operating systems, it's incredibly useful to have some automated testing to go with your automated VM creation and configuration.

This is a quick run-down of one way to implement such automated testing with Serverspec and get results back that are ultimately visualisable in Kibana. NB the orchestration of the following steps is beyond the scope of this article - maybe some CI tool like Jenkins, orchestration tool like vRO or some custom software.

Overview:

The first two points are essentially prerequisites to this article: create some VMs and install them with whatever cloud and config magic you like. For the purposes of this article, it doesn't really matter. I'm just going to assume that your VMs are 'normal', ie running and contactable.

Functional testing with Serverspec

Serverspec, if you've not used it, is an rspec-based tool to perform functional testing. It's ruby-based, has quite an easy set-up, and doesn't require anything to be installed on the target servers, just that it is able to ssh into the target machines with an ssh key.

This will have created you a basic directory structure with some files to get you started.

Right now we have:

# ls /opt/serverspec
Rakefile
spec/
spec_helper.rb
www.example.com/

The default setup of Serverspec is that you define a set of tests for each and every server and then run the contents of each directory against the matching host. However, this doesn't really fit the workflow we're setting up here.

Re-organise Serverspec from host-based to app-based layout

To get started, let's delete the www.example.com directory - we don't want to define a set of tests per host like this, we want to make an app-based layout.

In my opinion, one of the easiest ways to organise the layout for your functional tests is to store it alongside your config management code. With this in mind, let's write a simple ntp test.

Yes, we did just hard-code the host name and modulelist to test. Don't worry, we'll switch these out in a bit.

Note that we provide a pattern path with a regex to the directory containing our tests. Essentially when we run this file, we will pick up every test that matches the pattern and run these tests against the desired host.
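As a rough sketch of how such a pattern can be built from a list of modules: the helper name is hypothetical, ssh is a made-up second module, and the directory layout is assumed from the ntp test path /opt/puppetcode/modules/ntp/serverspec/. Strictly speaking the result is a glob rather than a regex.

```ruby
# Build a glob pattern matching the spec files of the listed modules.
# Assumes tests live at <modules_dir>/<module>/serverspec/*_spec.rb.
def spec_pattern(modules_dir, modules)
  "#{modules_dir}/{#{modules.join(',')}}/serverspec/*_spec.rb"
end

puts spec_pattern('/opt/puppetcode/modules', %w[ntp ssh])
```

The resulting string can be handed to the rake task's pattern setting, so that adding a module to the list is all it takes to include its tests.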

Run the test

Now, making sure we are standing in the /opt/serverspec/ directory, we can run

# rake spec
Package 'ntp'
should be installed

Green means that the test ran, and the output was successful. So as it stands, we can test our one www.example.com host with our one ntp test. Great!

Rewrite the Rakefile to take command-line options rather than hard-coding variables

Right now, our host identifier and our list of modules to test are hard-coded in the Rakefile. Let's rewrite so these are passed in on the command line.
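One way to do this is sketched below. It relies on the fact that rake exposes NAME=VALUE arguments as environment variables; the variable names host and modulelist, the helper names, and the /tmp/modulelist path are all assumptions made for illustration.

```ruby
# Read the module names to test from the text of a modulelist file.
# Assumes one name per line; Puppet class names such as "ntp::config"
# are reduced to their top-level module.
def modules_from(list_text)
  list_text.each_line
           .map { |line| line.strip.split('::').first }
           .compact
           .reject(&:empty?)
           .uniq
end

# Pull the required settings out of an environment-style hash, so that
#   rake spec host=www.example.com modulelist=/tmp/modulelist
# can supply them (in a Rakefile, pass ENV here).
def settings_from(env)
  host = env.fetch('host') { raise ArgumentError, 'host= is required' }
  list = env.fetch('modulelist') { raise ArgumentError, 'modulelist= is required' }
  { host: host, modulelist: list }
end

settings = settings_from('host' => 'www.example.com', 'modulelist' => '/tmp/modulelist')
puts settings[:host]
puts modules_from("ntp\nntp::config\nssh\n").inspect
```

Failing early with a clear message when an argument is missing is kinder than letting the test run start against an empty host name.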

The modulelist file can be one you write yourself, or generated from something like a server's /var/lib/puppet/classes.txt. It's a way to narrow down which tests are run against each server, as not all modules are necessarily implemented everywhere.

Part 2: Generate logs that can be collected by Logstash, indexed by Elasticsearch, and visualised in Kibana

The final step is to re-generate certificates for all the rest of your nodes.
Option 1: log into every server and repeat the above.
Option 2: automate option 1 - think ssh, clusterssh, etc

Good luck!

PS I lied - the final, final step is to set up proper backup and restore of your certificate store at /var/lib/puppet/ssl and delete the clean --all line from your command history so you can't accidentally run it again.