tl;dr

Apache httpd ProxyPass and ProxyPassReverse are our best friends for mounting an external URL (and its descendants) onto a folder of our very own domain.

Why use a reverse proxy?

We are in 2016, and mentioning a reverse proxy in a conversation sounds odd, like something from a different century. But hey, who cares?

I use reverse proxies for various reasons:

to isolate components
Instead of putting everything into an ever-growing monolithic website, we can manage components as separate git repositories, each with its own build process (like /photography and /talks on this website).

to manage different stacks
In the case of a conference with a yearly edition or so, we can iterate over the software stack and change it according to our needs, without having to upgrade legacy editions or keep maintaining them because we feel obliged to.

to provide a transparent experience to our users
We can host content in different places and still provide a coherent experience to users, without making them feel how spread out our infrastructure is.

to upgrade individual components
One by one rather than the entire stack, which makes life easier in terms of QA scope. Obviously, we risk falling into the microservices trap: the more individual projects we have, the more scattered our attention and efforts can be.

to run Docker containers or web applications
We can avoid exposing them directly on port 80 or 443, although this is not the point of this article, as we are focused on proxying external content.

It is a good way to hide complex, purpose-built components under a single, apparently unified domain. This is, for example, how websites like the BBC feel like one website whereas they are in reality composed of dozens and dozens of different websites developed by independent teams.

How does it work?

An HTTP request directed to our hosting provider will usually look like the following examples:

By default we assume the folders /cheese and /doc are contained in the same directory as the root of the website.
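As a rough illustration of that default situation, here is a minimal sketch of a VirtualHost that serves everything locally. The domain www.example.com and the path /var/www/example are hypothetical placeholders, not taken from a real setup.

```apache
<VirtualHost *:80>
    # Hypothetical domain and path, for illustration only
    ServerName   www.example.com
    DocumentRoot "/var/www/example"

    # With no proxying, URLs map straight onto folders of the document root:
    #   http://www.example.com/         -> /var/www/example/
    #   http://www.example.com/cheese/  -> /var/www/example/cheese/
    #   http://www.example.com/doc/     -> /var/www/example/doc/
</VirtualHost>
```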

Let's say we have decided to opt for a white-labeled content provider for one part of the website and moved another part to a static website hosted on GitHub Pages. The above example would evolve into:
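One way this evolution could look in configuration is sketched below. The hostnames whitelabel.example.net and example-user.github.io are hypothetical, and so is the choice of which folder goes to which provider. The sketch assumes mod_proxy, mod_proxy_http and mod_ssl are loaded.

```apache
<VirtualHost *:80>
    # Hypothetical domain and path, for illustration only
    ServerName   www.example.com
    DocumentRoot "/var/www/example"

    # Required because one of the backends below is reached over https
    SSLProxyEngine on

    # /cheese/ is now fetched from the (hypothetical) white-labeled provider
    ProxyPass        "/cheese/" "http://whitelabel.example.net/cheese/"
    ProxyPassReverse "/cheese/" "http://whitelabel.example.net/cheese/"

    # /doc/ is now fetched from a (hypothetical) GitHub Pages project site
    ProxyPass        "/doc/" "https://example-user.github.io/doc/"
    ProxyPassReverse "/doc/" "https://example-user.github.io/doc/"

    # Everything else is still served from the document root
</VirtualHost>
```

ProxyPass forwards the incoming request, while ProxyPassReverse rewrites the Location, Content-Location and URI headers of backend responses so that redirects keep pointing at our own domain.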

Configuring Proxy directives

I found the Apache ProxyPass documentation to be quite clear actually (or maybe I spent too much time reading it). We can exclude folders from the proxying, or match only specific patterns with ProxyPassMatch. I guess all we need is a use case before starting to use them 😊.
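To make exclusions and pattern matching a little more concrete, here is a small sketch. The folder /cheese/local/ and the backends whitelabel.example.net and images.example.net are made-up names for illustration. Rules are checked in configuration order and the first match wins, so the more specific ones come first.

```apache
# 1. Exclude a sub-folder: "!" means "do not proxy, serve it locally"
ProxyPass "/cheese/local/" "!"

# 2. Send only image requests to a separate (hypothetical) backend;
#    the captured group is substituted into the path of the target URL
ProxyPassMatch "^/cheese/(.*\.(?:png|jpg))$" "http://images.example.net/cheese/$1"

# 3. Proxy everything else under /cheese/ to the (hypothetical) provider
ProxyPass        "/cheese/" "http://whitelabel.example.net/cheese/"
ProxyPassReverse "/cheese/" "http://whitelabel.example.net/cheese/"
```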

ProxyPreserveHost

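This setting controls which Host HTTP header our VirtualHost proxy server sends to the backend. When it is On, the Host header of the incoming request is passed through unchanged; when it is Off (the default), the hostname from the ProxyPass target is used instead.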
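A minimal sketch of both cases follows. The backend hostnames are hypothetical, and which value is right depends on how the backend routes requests: a shared host that picks the site from the Host header generally needs its own hostname, not ours.

```apache
# Default behaviour: the backend receives "Host: example-user.github.io",
# taken from the ProxyPass target below
ProxyPreserveHost Off
ProxyPass "/doc/" "https://example-user.github.io/doc/"

# Alternative, for a backend that expects our public hostname:
# it would receive "Host: www.example.com" as sent by the client
# ProxyPreserveHost On
# ProxyPass "/cheese/" "http://whitelabel.example.net/cheese/"
```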

ProxyPassReverseCookieDomain

This is the same principle as ProxyPassReverse, but it rewrites the domain attribute of any Set-Cookie header emitted by the backend.
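As a short sketch, assuming a hypothetical backend whitelabel.example.net that sets cookies for its own domain, the directive takes the internal domain first and the public one second:

```apache
ProxyPass        "/cheese/" "http://whitelabel.example.net/cheese/"
ProxyPassReverse "/cheese/" "http://whitelabel.example.net/cheese/"

# "Set-Cookie: ...; Domain=whitelabel.example.net" from the backend
# reaches the browser as "...; Domain=www.example.com"
ProxyPassReverseCookieDomain "whitelabel.example.net" "www.example.com"
```

Without this rewrite, the browser would discard a cookie whose domain does not match the site it is visiting.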

SSLProxyEngine

This one enables the proxy module to open SSL/TLS connections to the backend. We could definitely proxy HTTP to HTTPS or, better, HTTPS to HTTP, to secure insecure parts of our website. Or to secure them… with a different SSL certificate.

And that is precisely one advantage of using a reverse proxy in front of GitHub Pages: we can use our custom domain and our own certificate.
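A sketch of such a setup might look like the following. The domain, the certificate paths and the GitHub Pages hostname are all hypothetical placeholders, and mod_ssl, mod_proxy and mod_proxy_http are assumed to be loaded.

```apache
<VirtualHost *:443>
    ServerName www.example.com

    # Our own certificate, presented to visitors (hypothetical paths)
    SSLEngine on
    SSLCertificateFile    "/etc/ssl/certs/www.example.com.crt"
    SSLCertificateKeyFile "/etc/ssl/private/www.example.com.key"

    # Allow the proxy to talk https to the backend
    SSLProxyEngine on

    ProxyPass        "/doc/" "https://example-user.github.io/doc/"
    ProxyPassReverse "/doc/" "https://example-user.github.io/doc/"
</VirtualHost>
```

SSLEngine secures the connection between the visitor and our server, while SSLProxyEngine covers the separate connection between our server and the backend.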

Reverse Proxy over HTTPS

GitHub serves every GitHub Pages website over HTTPS if it has been created after June 15th 2016. So we will have to make sure our server can talk over SSL with GitHub by enabling mod_ssl.
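How modules get enabled depends on the distribution. On Debian-style systems the a2enmod helper does it; in a stock httpd.conf the equivalent is a set of LoadModule lines like the sketch below. The modules/ paths are an assumption and may differ on a given install.

```apache
# Proxy core and its HTTP/HTTPS scheme handler
LoadModule proxy_module      modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so

# Needed for SSLProxyEngine, so the proxy can reach https backends
LoadModule ssl_module        modules/mod_ssl.so
```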