I recently watched a discussion about the X-Cache and X-Cache-Lookup headers unfold, and it turns out a lot of people who I would have thought knew what these headers indicate were a little muddled. Furthermore, if you go looking for a good explanation, everyone seems to just link to the same rather old blog post which, despite being well meaning, is unfortunately slightly confused too.

So maybe I can give a better explanation.

First question – where is the spec for X-Cache? Well, there isn’t one – that little X- prefix indicates the header is not part of the spec, so its meaning may vary between proxy implementations (in fact with Varnish it’s common to add it yourself in the config file, so it could mean anything at all).

Why am I seeing these headers?

So with that out of the way I’ll focus on where I think you’re most likely to see them, Squid. Squid is a fairly common caching proxy, both as a forward (outbound) caching proxy and as a reverse (inbound/application accelerator/aggregating) proxy. Squid’s doco also fails to clearly define what the headers do, so if you’ve got here because you’re trying to figure out what they mean, there’s a good chance you’ve got a Squid proxy in the mix (even if you didn’t realise it).

What they mean

So, with Squid what these headers mean is:

X-Cache: Did this proxy serve the result from cache (HIT for yes, MISS for no).

X-Cache-Lookup: Did the proxy have a cacheable response to the request (HIT for yes, MISS for no).

This means if:

both are HITs, you made a cacheable request and the proxy had a cacheable response that matched and it handed it back to you.

X-Cache is a MISS and X-Cache-Lookup is a HIT, you made a request that had a cacheable response but you’ve forced a cache bypass – usually achieved by a hard refresh, commonly activated with Ctrl+F5 or by sending the headers Pragma: no-cache (HTTP/1.0) or Cache-Control: no-cache (HTTP/1.1).

both are MISSes, you made a request (it doesn’t matter if it was cacheable) and there was no corresponding response object in the proxy cache.
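As a quick reference, the three combinations above can be written down as a small lookup. This is only a sketch of the Squid behaviour described here: explain_cache is a made-up helper name, and it assumes the header values begin with HIT or MISS (Squid typically appends text such as "from <hostname>", which the function strips).

```shell
# explain_cache: hypothetical helper mapping the two header values to the
# cases described above. $1 = X-Cache value, $2 = X-Cache-Lookup value.
explain_cache() {
  # Keep only the first word, so "HIT from proxy.example" becomes "HIT".
  served=${1%% *}
  lookup=${2%% *}
  case "$served/$lookup" in
    HIT/HIT)   echo "served from the proxy cache" ;;
    MISS/HIT)  echo "proxy had a cached copy but bypassed it" ;;
    MISS/MISS) echo "no matching object in the proxy cache" ;;
    *)         echo "combination not covered by this post" ;;
  esac
}

# Example values are invented for illustration.
explain_cache "MISS from proxy.example" "HIT from proxy.example:3128"
```

The catch-all branch is there because other proxies (or other Squid configurations) may use values this post does not cover.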

This is completely irrelevant to browser cache – if you’re viewing browser cache and inspecting the headers in Chrome or Firebug then you’ll see what the status of the proxy was at the time the proxy returned it to your browser. Sorry if this is obvious, but a surprising number of people seemed to think that the browser cares and bothers to modify these headers, it doesn’t. Really, it doesn’t. I promise.

How you can test this for yourself

First, if you’re trying to use a browser to inspect headers, understand what it’s really saying, e.g. in Google Chrome, look for the (from cache) next to the Status Code: section in the headers.

This means nothing has gone over the network and it brought the object out of browser cache.

All other browsers are left as an exercise for the reader as frankly it’s not the right tool for the job (I just figured I had to cover one, as everyone seems to do it this way anyway).

-sv: this enables both --silent and --verbose, which (counterintuitive, I know) is the simplest way to get curl into a mode where it dumps both the body and headers out in a useful format (yes, I know you can use --head, but that changes the request method and invalidates enough testing to make it useless).

http://example.com/: our URL of choice.

2>&1: Take STDERR (headers) and redirect it to STDOUT (so we can pipe it to egrep).

> /dev/null: Ignore the original STDOUT (body).

| egrep: pipe the headers through egrep.

'< (X-Cache|Age)': grab just headers starting with X-Cache or Age.
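Assembled from the pieces listed above, the pipeline would look roughly like the usage line in the comment below. The filter is wrapped in a function (cache_headers, a name made up for this sketch) so it can be tried against canned output without a proxy to hand. grep -E stands in for egrep, and the pattern is anchored with ^ so only lines that start with the "< " response marker match. The sample headers are invented.

```shell
# cache_headers: hypothetical wrapper around the egrep filter described above.
# Reads curl's verbose output on stdin and keeps response headers ("< " prefix)
# whose names start with X-Cache or Age.
cache_headers() {
  grep -E '^< (X-Cache|Age)'
}

# Usage against a real server (needs network access, so left as a comment):
#   curl -sv http://example.com/ 2>&1 > /dev/null | cache_headers

# Try it on a made-up sample of verbose output instead.
printf '%s\n' \
  '> GET / HTTP/1.1' \
  '< HTTP/1.1 200 OK' \
  '< Age: 1' \
  '< X-Cache: HIT from proxy.example' \
  '< X-Cache-Lookup: HIT from proxy.example:3128' \
  '< Content-Type: text/html' | cache_headers
```

Only the Age, X-Cache and X-Cache-Lookup lines of the sample should survive the filter.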

You may note I snuck that Age header you’ve been ignoring in there, you can thank me later.

So in this first example we’ve made a cacheable request (absent of Pragma: no-cache or similar headers) and the proxy has had a response from its upstream 1 second ago (Age: 1) that was cacheable.

Now that we have a basic command, let’s fiddle with it to see what happens.

If we request the same thing again, we should see the Age header count up by roughly the number of seconds that have passed between the two requests:

If we request the same thing but with a hard refresh header (note: Squid and probably every other caching proxy ever can be configured to ignore these headers on the request side, so if you get different results here, 9 times out of 10 that’s why), we can see that the proxy had the object in cache, but didn’t use it:
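A sketch of that hard-refresh request, assuming curl's -H option for adding request headers. hard_refresh_headers is a name invented here, and it sends both of the request headers mentioned earlier. Going by the explanation above, a Squid that honours these headers would be expected to answer with X-Cache: MISS alongside X-Cache-Lookup: HIT.

```shell
# hard_refresh_headers: hypothetical helper. Sends the no-cache request
# headers and prints only the cache-related response headers.
# $1 = URL to fetch.
hard_refresh_headers() {
  curl -sv \
    -H 'Pragma: no-cache' \
    -H 'Cache-Control: no-cache' \
    "$1" 2>&1 > /dev/null | grep -E '^< (X-Cache|Age)'
}

# Usage (needs network access and a caching proxy in the path):
#   hard_refresh_headers http://example.com/
```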

Sometimes I feel like I should have an entire section on my blog dedicated solely to PHP’s maniacal insanity just so I’d have a place to record whatever crazy new thing I learn about PHP on any day I work with it. Any day I work with it. Ever.

That said, this one particularly annoyed me.

PHP caches stat information. Why? I dunno really, VFS caches stat information too, you’d think it’d do a better job being that people other than PHP core developers designed it (so maybe some of them weren’t drunk at the time), plus it’s actually the right layer, it can see not only file system modifications made by the PHP you’re currently writing, but also all that other PHP you’ve written and the stuff Mr Jones wrote, also that strange bit of code that’s not PHP. So it would actually be able to cache the right things at VFS, instead PHP core devs actually thought it made sense to cache it in a PHP process, but what the hell, let’s leave OS problems to application developers and just move on.

So back on topic, PHP caches stat information or to put it another way, PHP has a stat cache (do you see what I did there?). Stat information is all those boring statistics you get about files when you write stat <file>, but really it’s so much more. It’s most of the data about a file that isn’t actually the file itself.
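For a feel of what that covers, here is a small sketch that pulls a few individual fields out of stat. It assumes GNU coreutils stat, where -c selects a format (BSD and macOS stat use different flags), and the temporary file exists purely for illustration.

```shell
# Assumes GNU coreutils stat; -c picks individual fields.
tmp=$(mktemp)
printf 'hello' > "$tmp"

size=$(stat -c '%s' "$tmp")    # size in bytes
mode=$(stat -c '%a' "$tmp")    # permission bits in octal
owner=$(stat -c '%U' "$tmp")   # owning user name
mtime=$(stat -c '%Y' "$tmp")   # last modification, seconds since the epoch

printf '%s\n' "size=$size mode=$mode owner=$owner mtime=$mtime"
rm -f "$tmp"
```

None of these values come from reading the file's contents, which is why they are the sort of thing a stat cache holds on to.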

Great, so PHP caches stat information. That must make things faster or something? Possibly, but it also just makes it a massive hot squishy steaming pile of lies. Because any interaction with a file’s stat info is cached, if a file is modified in some way outside of PHP that you care about you now have to constantly call clearstatcache() so that when you check anything about the file you can be confident that PHP isn’t lying to you.

Did I mention PHP has a stat cache? A-packed-full-of-lies-but-at-least-it’s-faster-than-telling-you-the-truth-except-now-you-have-to-slow-things-down-by-constantly-clearing-it-stat-cache.

Somewhere about here you learn to deal with it, you think to yourself “it’s ok, I’ll just call clearstatcache() everywhere, it’s fine, it’s just a stupid cache that I can’t turn off, so I may as well deal with it, maybe I’ll write a Vim macro to insert clearstatcache() calls in front of any file system operations, that’ll fix it, it’ll be almost like PHP is a real language”.

Then, because you haven’t written that Vim macro (it’s a stupid idea anyway) and you haven’t managed to turn off the broken stat cache (because you can’t) you run into what must surely be a bug – even PHP core devs couldn’t consider this a feature, surely? You change ownership of a file using the built-in chown() function, it’s a glorious thing, at first it’s owned by one user, then it’s owned by another user, cheers go up, standing ovations, it’s possible something built into PHP actually works as designed. Cool. You go to check this using a PHP function (because like all PHP devs you totally do TDD and as such need to check such things) and because you updated the ownership information using a PHP function obviously the cache that PHP manages should be marked dirty and you’ll get back an accurate result, right?

What A Pain!

What if you want to check the arguments going in to the last call? Well you can use at():

$mock->expects($this->at(7)) // ...

… better hope we never add any other calls!

What if we don’t know the exact parameter that it’s being called with and want to check it with something more complex? Well if you dig really hard in the manual you’ll find there’s a whole bunch of assertions that let you feed in crazier stuff like:

So that’s pretty cool, if you happen to like really obscure features that are impossible to remember.

Surely there’s a better way? Think of the children!

What if you could just ask for all the invocations and test that they were right in that language you’re already using for all your production logic? Wouldn’t that be just dandy!

Turns out you can, but it’s hiding – and I don’t mean it’s hiding in a “you will find this if you read the manual” kind of way, I mean it’s hiding in the source code, where everyone totally looks first for easy examples right?

All you have to do is store the result of $this->any() and you can use it as a spy:

$exec->expects($spy = $this->any())
    ->method('foo');

(I’ve got to wonder if documenting those extra 7 characters might be the colloquial straw that breaks the PHPUnit manual’s back.)

Now that you have a spy, you can just do normal stuff that calls it, then use normal PHP logic (I had to laugh when I wrote “normal PHP logic”) to confirm it’s right:

// get the last invocation (end() needs a real variable, not a function result)
$invocations = $spy->getInvocations();
$invocation = end($invocations);
$this->assertEquals('foo', $invocation->parameters[0]);

An Example You Say?

As a concrete example, let’s ensure the NSA is spying on its citizens just the right amount.

<?php
// What we're testing today
class AverageCitizen
{
    public function spyOn()
    {
    }
}

// Our tests (yes, normally these would be in some other file)
class TestAverageCitizens extends PHPUnit_Framework_TestCase
{
    public function testSpyingLikeTheNSAShould()
    {
        $citizen = $this->getMock('AverageCitizen');
        $citizen->expects($spy = $this->any())
            ->method('spyOn');

        $citizen->spyOn("foo");

        $invocations = $spy->getInvocations();

        $this->assertEquals(1, count($invocations));

        // we can easily check specific arguments too
        $last = end($invocations);
        $this->assertEquals("foo", $last->parameters[0]);
    }

    public function testSpyingLikeTheNSADoes()
    {
        $citizen = $this->getMock('AverageCitizen');
        $citizen->expects($spy = $this->any())
            ->method('spyOn');

        $citizen->spyOn("foo");
        $citizen->spyOn("bar");

        $invocations = $spy->getInvocations();

        $this->assertEquals(1, count($invocations));
    }
}
?>

and when we run the tests we can see that even PHPUnit knows the NSA has crossed the line:

If you want the % character in a command, as part of a cronjob:
1. You escape the %, so it becomes \%
2. echo and pipe the command you want to run (with the escaped %) into sed
3. Have sed unescape the %
4. Pipe it into the original program

Which I responded to with roughly “if you want a ‘%’ in your cron line you actually want a shell script instead”… this turns out to be a great debate as a lot of very good Sys Admins (who were online at the time) completely disagreed with me until I’d spelled out my argument in more detail.

Problems with cron

The syntax can be quite insane if you’re expecting it to behave like shell (hint: cron != shell)

There’s no widely used crontab linter (I was going to leave it at “there’s no crontab linter”, but found chkcrontab while writing this, which looks like a good start but isn’t packaged for any distro I’ve checked yet)

Badly breaking syntax in a crontab file will cause all jobs in that file to stop running (usually with no error recorded anywhere)

Unless you’re double entering your scheduling information you’re not going to be able to pick up the absence of the job in your monitoring solution when it fails to run

I’m stupid (yes this is a problem with cron)

All of these have led me to break cron lots of times and even more times I’ve had to try to figure out why a scheduled job isn’t running after someone else has broken it for me. Happy days.

KISS

Whenever I’m breaking something fairly critical too often for comfort, it’s time to Keep It Simple Stupid and the way I’ve tried to do that with cron is to never ever put anything complicated on a cron line.

Let’s take a simple example:

* * * * * echo % some % percents % for % you %

Intuitively I’d just expect that to do what it does on the shell (echoes back the string, which in cron would normally make it back to someone in email form), but instead the first % will start STDIN for the command, the remaining %s will get changed to newlines and you’ll end up with an echo statement that just echoes a single newline out to cron as it’s not interested in the STDIN fed to it.
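To see that split without waiting for cron, the rule can be modelled in a few lines of awk. This is a simplified sketch of the behaviour described in crontab(5), not cron's actual parser: cron_preview is a made-up helper, it takes only the command field (not the five schedule fields), and it leaves any \% untouched, since what happens to the backslash can depend on the cron implementation.

```shell
# cron_preview: hypothetical helper modelling how cron splits a command
# field at the first unescaped %. $1 = the command portion of a cron line.
cron_preview() {
  printf '%s\n' "$1" | awk '
  {
    cmd = ""; input = ""; in_stdin = 0
    n = length($0)
    for (i = 1; i <= n; i++) {
      c = substr($0, i, 1)
      nxt = substr($0, i + 1, 1)
      if (c == "\\" && nxt == "%") {
        # Escaped percent: copied through as-is, never a split point.
        if (in_stdin) input = input "\\%"; else cmd = cmd "\\%"
        i++
      } else if (c == "%") {
        # First bare % starts stdin; later ones become newlines.
        if (in_stdin) input = input "\n"; else in_stdin = 1
      } else if (in_stdin) {
        input = input c
      } else {
        cmd = cmd c
      }
    }
    printf "command: [%s]\n", cmd
    printf "stdin:\n%s\n", input
  }'
}

cron_preview 'echo % some % percents % for % you %'
```

For the line above, the command part comes out as just `echo ` and everything else lands on stdin, one fragment per line, which echo never reads.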

This creates a testing problem because now to test the behaviour of the cron line I need to wait for cron to run the cron line (there’s no way to immediately confirm the validity of the line).

If we instead place the behaviour we want in a script:

#!/bin/bash -e
echo % some % percents % for % you %

and call that from cron:

* * * * * /path/to/script

You can be reasonably confident that it’ll do exactly the same thing when cron runs it as when you test it on the terminal.

But % is ok when it’s simple

Some people tried to make the argument that a % is really ok when it’s actually really simple, e.g.:

* * * * * date +\%Y\%m\%d_\%H\%M\%S > /tmp/test

happens to work the same if you copy it to a terminal because the %s are escaped in the cron line and the escaping will happen to drop off in shell as well, but what if you want a quoted % – you’re stuffed.
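The difference between the unquoted and quoted cases is easy to see in a plain shell, no cron required. This sketch uses printf rather than date so the output is predictable; the strings are arbitrary.

```shell
# Unquoted: the shell treats \% as an escaped % and drops the backslash.
unquoted=$(printf '%s\n' 100\%)

# Single-quoted: the backslash is literal, so it survives.
quoted=$(printf '%s\n' '100\%')

printf '%s\n' "unquoted: $unquoted"
printf '%s\n' "quoted:   $quoted"
```

The first line prints 100% and the second prints 100\%, so an escaped % inside quotes does not behave the same when the line is pasted into a terminal.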

Back to KISS again.

Other reasons to keep cron simple

While this is mostly an argument for backups, if you keep your cron files simple it may not matter as much when they get nuked accidentally as now you’ve only lost scheduling information and not critical syntax :)

Summary

If I’m not 100% certain I can copy a line out of cron and run it on the terminal, I think it doesn’t belong in cron.
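In that spirit, a crude check for lines that break this rule because of a % might look like the following. It is only a sketch: find_percent_lines is an invented name, it reads a crontab-format file given as its argument, and it simply flags every non-comment line containing a % (escaped or not).

```shell
# find_percent_lines: hypothetical helper. $1 = path to a crontab-format file.
find_percent_lines() {
  # -n prefixes each match with its line number; the second grep drops
  # lines whose first non-blank character is #.
  grep -n '%' "$1" | grep -v '^[0-9]*:[[:space:]]*#'
}

# Example with a throwaway file standing in for a real crontab.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# m h dom mon dow command
* * * * * /path/to/script
* * * * * date +\%Y\%m\%d > /tmp/test
# echo % in a comment is fine
EOF
find_percent_lines "$sample"
rm -f "$sample"
```

Anything it reports is a candidate for moving into a script that cron merely calls.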

XML support in PHP is actually pretty good these days, but as with anything in PHP (why is that?) it has a few little quirks and corner cases that provide for continual facepalm moments.

Rather than just sit around and complain or try to get stuff into the core (where there’s no way I’d be able to use it in real world projects until RHEL catches up, i.e. 3-4 years from now) I thought I’d see what I could do purely in PHP.

Git has a feature called stash that lets you drop whatever you’re working on and put the working directory back to a clean state without having to commit or lose whatever deltas you had.

This is a great idea, but it’s sorely missing one core feature for anyone who works on more than one machine: the ability to synchronise the stashes between machines. So if you’re like me (I work on the same code on up to about 4 individual machines in a week) you probably want some way to move stashes around.

So I’ve started git-rstash. As usual it’s written in terrible bash in the hope that someone will take enough offence at it to take the whole problem off my hands; in the meantime maybe you’ll find it useful too.

For the moment synchronising them is purely up to the user, but they are conveniently placed where the user can drop them in whatever cloud-syncy-like thing they’re already using (Unison, Ubuntu One, Dropbox, etc).