A hard reload would restore app functionality in all cases. It took me three days of dedicated work to trace the problem, write a fix, and deploy the fix to production. (╯°□°）╯︵ ┻━┻

Load Balancer/Traces

Our first thought: was it a load balancer issue? Did one instance issue a token which another instance considered invalid? That led me to dive into how Rails validates a CSRF token, by stepping through the source file. I copied the file into our config/initializers/ folder, inserted a bunch of logger statements, and walked the token through the file.

Nothing. Like, sure, now I know what happens when I insert before_action :verify_authenticity_token into a controller, but that didn’t fix my problem. Traces from the error weren’t helpful either, because the trace only goes as far back as line 195, which can be called from several places in the file.

Turbolinks

Oh boy. Turbolinks acts as the backbone of our application’s view layer, and CSRF issues are an old, known problem with the library. Comparing the expected token (issued with form_authenticity_token and dumped to the server logs) against the tokens set in the browser showed me the root cause of the problem: the CSRF tokens set in (i) the header meta tag and (ii) AJAX request headers were both incorrect. Neither matched the issued token.

So why? Subsequent POST/PUT/DELETE actions would work after I hand-set the new token, but why wasn’t it set correctly? What happens normally:

1. A new token is issued on each non-XHR GET request.

2. The token is set in a header meta tag.

3. The jquery-rails gem reads the new tag and updates the AJAX request headers accordingly. The update fires (it looks like) on each turbolinks:load event.

Step #2 fails and I still don’t know why. ¯\_(ツ)_/¯
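For reference, the jquery-rails behaviour in step #3 amounts to roughly the following. This is a paraphrased sketch, not the gem’s actual source; the function names are mine:

```javascript
// Sketch of what jquery-rails does around each turbolinks:load:
// read the current <meta name="csrf-token"> tag and stamp its value
// onto outgoing AJAX requests as the X-CSRF-Token header.

// Pull the token out of the page's meta tag (null if the tag is missing).
function csrfToken(doc) {
  const meta = doc.querySelector('meta[name="csrf-token"]');
  return meta && meta.getAttribute('content');
}

// ajaxPrefilter-style hook: copy the token into the request headers.
function setCsrfHeader(headers, doc) {
  const token = csrfToken(doc);
  if (token) headers['X-CSRF-Token'] = token;
  return headers;
}
```

Which is exactly why a stale meta tag (the step #2 failure) poisons every subsequent AJAX request.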

Caching and Shite

Specifically, the error occurs only in our native app wrapper. As far as I can tell, either the app wrapper or the browser continues to cache parts of the older page’s header across an HTTP 302 redirect.

Fixing the Problem/Single Source of Truth

With the problem (the old CSRF token persists) and the cause (caching, apparently) established, I set out to fix it. Hours of trial and error and Google left me choosing between a custom response header and a non-secure cookie to hold the token. Other programmers have resorted to the same solutions.

While other developers favoured adding a new token to each response through a custom header, I didn’t like that, in essence, each request authorizes the next: there’s no single source of truth in the page for the CSRF token. I hold that the authorization to make actions should rest with the document/page/cookies instead. Fastly advocates the use of a secure cookie to hold the CSRF token. Good idea, except that the token doesn’t need to be secure, because I have to expose the token to the client anyway for it to make a successful request.

Fix: Application Controller

I added code to our application controller, such that when the site receives a valid non-XHR GET request, it’ll set a non-secure cookie which contains the new CSRF token.

Fix: Set CSRF Token

The final step is to set the new CSRF token after I extract it from the cookie. We use Backbone.js-style event listeners. I devolve control to each script: they push their own handlers to the global stack. One thing I noticed is that the CSRF token is percent-encoded, and so must be run through decodeURIComponent before I can use it.

Note: decodeURIComponent has no fucks to give: it will cast undefined to the string 'undefined'. The code first extracts the cookie-token and tests for truthiness before any further handling.
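The extraction-plus-guard can be sketched like this. The cookie name (csrf_token) and helper names are illustrative, not our production code:

```javascript
// Pull a single cookie's raw value out of a document.cookie-style string.
// Returns undefined if the cookie isn't present.
function readCookie(cookieString, name) {
  const match = cookieString
    .split('; ')
    .find((pair) => pair.startsWith(name + '='));
  return match ? match.slice(name.length + 1) : undefined;
}

// Extract and decode the CSRF token. The truthiness check matters:
// decodeURIComponent(undefined) returns the string 'undefined'.
function extractCsrfToken(cookieString) {
  const raw = readCookie(cookieString, 'csrf_token');
  return raw ? decodeURIComponent(raw) : null;
}
```

With the guard in place, a missing cookie yields null instead of the booby-trapped string 'undefined'.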