In the middle of last month (July 2018), I found myself staring at a projector screen, waiting once again to see if Wikipedia would load. If I was lucky, the page started rendering 15-20 seconds after I sent the request. If not, it could be closer to 60 seconds, assuming the browser didn’t just time out on the connection. I saw a lot of “the server stopped responding” over the course of a few days.

It wasn’t just Wikipedia, either. CNN International had similar load times. So did Google’s main search page. Even this here site, with minimal assets to load, took a minimum of 10 seconds to start rendering. Usually longer.

In 2018? Yes. In rural Uganda, where I was improvising an introduction to web development for a class of vocational students, that’s the reality. They can have a computer lab full of Dell desktops running Windows or rows of Raspberry Pis running Ubuntu or whatever setup there is, but when satellites in geosynchronous earth orbit are your only source of internet, you wait. And wait. And wait.

I want to explain why—and far more importantly, how we’ve made that experience interminably worse and more expensive in the name of our comfort and security.

First, please consider the enormously constrained nature of satellite internet access. If you’re already familiar with this world, skip ahead a few paragraphs; but if not, permit me a brief description of the challenges.

For geosynchronous-satellite internet access, the speed of light become a factor in ping times: just having the signals propagate through a mixture of vacuum and atmosphere chews up approximately half a second of travel time over roughly 89,000 miles (~152,000km). If that all that distance were vacuum, your absolute floor for ping latency is about 506 milliseconds.

That’s just the time for the signals to make two round trips to geosynchronous orbit and back. In reality, there are the times to route the packets on either end, and the re-transmission time at the satellite itself.

But that’s not the real connection killer in most cases: packet loss is. After all, these packets are going to orbit and back. Lots of things along those long and lonely signal paths can cause the packets to get dropped. 50% packet loss is not uncommon; 80% is not unexpected.

So, you’re losing half your packets (or more), and the packets that aren’t lost have latency times around two-thirds of a second (or more). Each.

That’s reason enough to set up a local caching server. Another, even more pressing reason is that pretty much all commercial satellite connections come with data caps. Where I was, their cap was 50GB/month. Beyond that, they could either pay overages, or just not have data until the next month. So if you can locally cache URLs so that they only count against your data usage the first time they’re loaded, you do that. And someone had, for the school where I was teaching.

But there I stood anyway, hoping my requests to load simple web pages would bear fruit, and I could continue teaching basic web principles to a group of vocational students. Because Wikipedia wouldn’t cache. Google wouldn’t cache. Meyerweb wouldn’t cache. Almost nothing would cache.

Why?

HTTPS.

A local caching server, meant to speed up commonly-requested sites and reduce bandwidth usage, is a “man in the middle”. HTTPS, which by design prevents man-in-the-middle attacks, utterly breaks local caching servers. So I kept waiting and waiting for remote resources, eating into that month’s data cap with every request.

The drive to force every site on the web to HTTPS has pushed the web further away from the next billion users—not to mention a whole lot of the previous half-billion. I saw a piece that claimed, “Investing in HTTPS makes it faster, cheaper, and easier for everyone.” If you define “everyone” as people with gigabit fiber access, sure. Maybe it’s even true for most of those whose last mile is copper. But for people beyond the reach of glass and wire, every word of that claim was wrong.

If this is a surprise to you, you’re by no means alone. I hadn’t heard anything about it, so I asked a number of colleagues if they knew about the problem. Not only had they not, they all reacted the same way I did: this must not be an actual problem, or we’d have heard about it! But no.

Can we do anything? For users of up-to-date browsers, yes: service workers create a “good” man in the middle that sidesteps the HTTPS problem, so far as I understand. So if you’re serving content over HTTPS, creating a service worker should be one of your top priorities right now, even if it’s just to do straightforward local caching and nothing fancier. I haven’t gotten one up for meyerweb yet, but I will do so very soon.

That’s great for modern browsers, but not everyone has the option to be modern. Sometimes they’re constrained by old operating systems to run older browsers, ones with no service-worker support: a lab full of Windows XP machines limited to IE8, for example. Or on even older machines, running Windows 95 or other operating systems of that era. Those are most likely to be the very people who are in situations where they’re limited to satellite internet or other similarly slow services with unforgiving data caps. Even in the highly-wired world, you can still find older installs of operating systems and browsers: public libraries, to pick but one example. Securing the web literally made it less accessible to many, many people around the world.

Beyond deploying service workers and hoping those struggling to bridge the digital divide make it across, I don’t really have a solution here. I think HTTPS is probably a net positive overall, and I don’t know what we could have done better. All I know is that I saw, first-hand, the negative externality that was pushed onto people far, far away from our data centers and our thoughts.

I’ve always found this whole “https everywhere” silly. Even when I had a web site with content on it, I didn’t want to pay for a certificate, and for https hosting, and I had no content deserving of being encrypted end-to-end.

It doesn’t secure anything to add https to sites which don’t need it. That’s just a cash grab.

It’s a shame we’re right back to JavaScript being so important on every kind of website, although it seems like there are good resources for getting started with Service Workers, at least, e.g: https://github.com/GoogleChromeLabs/sw-toolbox

It’s also a shame that we’re mostly depending on the whims of Google and Facebook to improve network connectivity in these under-served areas

@Seth I. Rich. SSL Certificates are free (from letsencrypt) or from Cloudflare so cost isn’t an issue. There are lots of advantages to using HTTPS and many disadvantages to not using it.

If you don’t secure your site it can be a liability. Do you really want someone injecting scripts, images, or ad content onto your page so that it looks like you put them there? Or changing the words on your page? Or using your site to attack other sites? This stuff happens: on airlines (a lot, and again), in China, even ISPs do it (a lot). And HTTPS prevents all of it. It guarantees content integrity and the ability to detect tampering.
See this site for a good explainer: https://doesmysiteneedhttps.com/

There’s a third option on the horizon here: if we have signed HTTP exchanges, local caches become possible again if you’re willing to give up privacy but not integrity. There are a couple of Internet-Drafts on this topic, but we also need to be able to tell the browsers to look in the cache before they try poking the origin server. And, of course, none of this will help anyone stuck browsing with IE5 on Windows ME.

insecure http was made for cooperatevly sharing information and it is excellent at it. Due to the transfer of money online becoming popular secure https became a good sane default for anyone using money online. Sadly this even further hurts those no-so-well off. There is no reason one should need to have recent crypto installed, an accurate clock, and a reliable connection in order to simply seek static information such that might increase quality of life. Sure, injections are possible, but an open letter delivered is far more useful than a sealed letter lost.

HTTPS is unfortunate, but necessary. And yes, Seth, it is necessary everywhere. It really doesn’t have anything to do with your content; simple sites that don’t require elaborate JavaScript are welcome (now more than ever), but saying “my site is so simple, it doesn’t deserve to be encrypted” doesn’t cut it.

Here’s why. Since the major ISP’s piracy of the Web (not just Net Neutrality, and not just in the U.S.), the ISPs themselves can execute man-in-the-middle attacks to:

insert their own advertising into your site
collect data on what your readers are viewing, and sell that data to whoever or whatever
censor/block/rewrite parts of your site

The first two are happening already; the third may be. If you’re cool with that, OK. I certainly wouldn’t be. I completely agree that this is a harsh penalty to impose on sites that are relatively static and simple or sites that are run by groups that don’t have the technical ability to implement https.

HTTPS is a solution–not a good one, I’ll grant, but I don’t think we have a better one. Well, yes, we do: widespread adoption of IPSec and IPv6. But we can’t wait for that.

There is also another solution to the caching issue, but it does affect the privacy of the users. Since you have access to the local machines, you can install your own certificate authority and man-in-the-middle your users with a caching proxy server. The server makes the connection, strips the security, caches the data, and then presents it to the users with using your own CA.

Harness cURL (or similar software) and download desired websites locally and host the content on a server within the local network.
I am not that well versed in Varnish but I expect it would be possible to set one up on a local network and route all traffic through it so that it returns the cached data to the local network.

Probably not the best solutions, however, the first one would definitely work in the mentioned use case with the second idea being a good candidate for locations with such low issues if they could get their hands on a Raspberry Pi.

I can kind of appreciate whats happening, as a European I’m suddenly now not able to read most of the news sites in the US because of where I live! Also till recently I only had a 2mb pipe as I am in the countryside!
Ok so that is different but it is still debilitating!

I love HTTPS everywhere, and I was really thrilled when responsive web design became a thing.
But both developments have a common disadvantage: They exclude old, less capable software and hardware.
Maybe one should consider going back to something that m.example.com was for a while: A subdomain for low end devices that is lightweight, comes without ton of unnecessary JavaScript garbage, and without that RewriteRule that enforces HTTPS, so that HTTPS remains optional for those who need to access information without the “overhead of security”.

This problem seems to be a special case of HTTP’s inherent centralisation flaw. Usenet or email wouldn’t have had this problem.

So I guess one partial fix is similar. All CDNs have much of the content duplicated around the world. What if there was a dedicated scheme to have shared synchronised local CDN servers, on the network before the satellite hop? So the static data only needs to go over once.

I think Ryan mailed it, they will have to use a proxy server that is a MitM, it takes in https and provides back to the client. That’s sounds dodgy, but it is how a lot of corporate networks function. Each client installs the cert from the proxy, and it can then proxy & cache https traffic. Ok so long as the users aren’t expecting private collections, but in this case that appears ok.

The comments about whether to use https are a tangent, the web is going that way, we need to make sure the solutions to these sorts of problems are better known and easy to use.

We certainly need encrypted transfer – want to enter your credit card details on an open page? Thought not. The last time I had to do that it was being used within hours.

HTTPS doesn’t give quite as much privacy as some people think – the domain name is sent in clear text, necessary because of name-based hosting, where a single server hosts multiple domains. Of course, the actual IP address is also sent in the clear so that intermediate equipment can route the packets, so it’s not that huge a leak, but people forget about both of those facts.

But for sure I agree with you about the problems of performance. Pingdom says this Web page weighs in at only 258KBytes, which is well below average (sigh), and only 12 http(s) requests, loading in well under three seconds. My own Web page – https://www.fromoldbooks.org/ – is much worse (800KBytes, 58 requests, 2 seconds), although as a stock image site it’s image-heavy.

Some of the worst performance culprits seem to be from Google AdSense and Analytics. Moving some of that into the client could make a big difference to the overall Web experience without compromising on privacy, and Brave has taken some steps in that direction – but at the cost of what some in the ad industry feel to be an unacceptable land-grab.

It’d be a good topic for a browser and Web server symposium or Workshop, to try & improve typical/average Web performance by, say, an order of magnitude in a world where privacy and security matter more than ever before.

This is a real problem, and as a firm HTTPS everywhere advocate, I feel it and know we must deal with it. But I think we can.

As other people have mentioned:

CDNs: Sounds like we need more CDN presence closer to Uganda.

BGP fanciness: Sites like Google and Wikipedia should/could get servers closer to Uganda, with better/shorter routes over better links.

Good MITM: The owner/administrator of the clients could choose to trust a cacheing MITM’s certificate issuer.

Modern protocols: H/2 and QUIC aim to do more with fewer packets, reducing handshake overhead and the cost of packet loss in those critical early phases. I expect to see that increasingly pay off (for everyone).

Also, I can’t imagine that large page-loads (unfortunately, the norm) perform well over satellite links even with HTTP. With latency and packet loss that high, TLS is not uniquely/solely a cause of long load times. I don’t see if in your post you discuss what the pre-HTTPS load times were like.

Checking just now, I see CNN International is non-HTTPS, and is very large (1.2 MiB over 207 requests). There’s no way that’s fast; HTTPS is not the special problem there.

You are stuck with HTTPS – this is not a bad thing, though. The solution isn’t to implore we serve traffic over HTTP; rather, it is exactly as you and others wrote – set up your caching proxy so that it is a MITM (man-in-the-middle) to the remote TLS-enabled website, and add your CA public key to the client machines on your network as a trusted authority. (Be sure to protect your private keys.)

I also recommend that you run Pi-hole on your local network, as it will blackhole useless tracking and advertisement-related servers that would squander your precious (and slow) data.

As for running older OSes and browsers, again there is little you can do. I agree with you and other posters here that the size of today’s websites is astounding – hundreds of KBs, sometimes, for CSS and Java frameworks and libraries. And then the whole site must be rendered via JavaScript. I don’t notice these things much, but I have a few ultra-budget computers where these modern design trends are noticeable. Sometimes mobile sites are lighter than desktop versions, and as such, you may be able to use tricks to try to force receiving the mobile version. (Try m.website.com instead of http://www.website.com, for example.)

Fortunately the CSS Grid spec will allow us to remove some of the cruft in websites (mostly responsive/reactive layout JS/CSS frameworks, and reduce div/span nesting), but it’s not going to be a panacea, and certainly not quickly. But it is a ray of hope. (I ran a test on one of my projects; I noted a significant reduction in size from removing Bootstrap and switching everything to the CSS grid – even my own components needed fewer wrappers and wrapper CSS classes.)

I’m not sure if you can use Subresource Integrity to load assets over http from an https site. Integrity would be assured (no one could tamper with the assets) but some privacy would be lost (someone could view which assets you were downloading). In some cases, you could use the SHA to download the files from a local cache without needing to go to the source.

The solution for things like schools on satellite links in remote areas might be to use a caching proxy server that unwraps the HTTPs and serves it re-signed with it’s own certificate. You’d have to manually install that certificate in the computers, and it might not work due to steps browsers have started taking to prevent that kind of thing being done maliciously.

There’s a big exception to this and it’s why the problem is off the radar for most people: you can configure a proxy in the browser and it works great. The only thing which broke is doing do without the consent of the client, which means that the people most affected are also the least prepared to deal with it.

Unfortunately, there’s not really a great way to deal with this which doesn’t open millions of people up to mass surveillance. Google’s QUIC protocol is more tolerant of lackey loss but that is far from something you can rely on now. There are some thoughts about shared caching around SRI but those have significant privacy challenges.

Hi Eric. I appreciate how you have centered the issue of inequitable global access to information over the technical issues here. Thanks. You might be interested in what the mastodon decentralization crowd thinks about SSL in the responses to my sharing your article there. https://octodon.social/@stephen/100509749398969421

You can set up a local caching proxy server which decrypts HTTPS and then re-encrypts it before sending down to the local PC’s – It requires all the local PC’s to trust the caching server by installing it’s certificate as a Trusted Root authority, but without HTTPS you still need to trust the caching server, it’s just not formalized.

Technically yes it’s a man-in-the-middle but I’d expect the kids in ugandan schools would be happy to make that tradeoff. We’re not all Edward Snowden.

The one thing which will break this is HTTP Public Key Pinning – however that got a lot of bad press a while ago as it can be used for evil purposes, so hopefully sites aren’t really doing it any more – See smashing magazine, Scott helme and ZDnet – that last article talks about google deprecating and removing HTTPKP from chrome, so hopefully it won’t be a real-world problem

How hard this is to do? I can’t say as I haven’t done it myself, but if it’s difficult it seems to me like that is a worthy open source project, and potentially far more useful than service workers. Service workers cache, but only on that one PC, which isn’t useful when you’re trying to share bandwidth amongst a school full of students.

There’s definitely a good case in creating proxy servers to cache content. I was considering a move to Rarotonga a few years back, and the internet there was expensive and quite limited, coming via satellite.

What’s I’d consider doing is setting up a squid-cache with SSL-bump. That way you can still be pretty optimal in your use of the scare internet connection and mostly be loading cached resources.

Could you elaborate how service workers could mitigate the problem? Assets downloaded over HTTPS can be locally cached by the browser just like with HTTP. What does caching via service workers offer that normal caching could not?

Apart from this, I think it is clear that in such a situation ad-blocking (ideally already at DNS… e.g. with special hosts file) is almost mandatory.

Isn’t the problem for old systems that protocols like QUIC and HTTP/2 aren’t going to be supported?

When it comes to supporting old and constrained connections, I would bet that the size of images and JS matter far more than the HTTPS overhead. I like the suggestion of keeping optional non-SSL domains around, but the challenge there is making sure users know they’re available.

Also, with the recent(ish) deprecation of TLS 1.0, there are plenty of SSL configurations out there today that just prevent sites from working in old browsers. If you’re using a configuration that relies on Server Name Indication (SNI), you might be blocking even more browsers. Unless I’m misunderstanding, it’s basically impossible to serve the “most” secure configurations to only modern clients – you’re basically shooting for a balance of “secure enough” and “accessible enough”. It’s a terrible trade off, to be sure. :( But I also can’t fault people for making it.

As others have mentioned: The challenge is that the privacy of the content does matter for users, and this includes users on poor connections. It seems there’s not total consensus on protecting the privacy of static content, but I think there’s plenty of cases where it’s easy to understand why privacy desirable. (Think people researching sexual orientations, or pro-democracy info, or some medical info.)

In those cases, I think local caching servers make sense, but you also have to trust whoever is running that cache.

My question is this:
Is there any way for a server to detect early on that it’s coming from a client with low bandwidth? If that’s the case, couldn’t you configure servers to default to serving smaller (maybe mobile) pages? (Similar to detecting the UA in a request…). I guess, you could have a framework that uses the UA to serve lighter pages to older clients, too, with the understanding that an old browser is likely to be running on old hardware.

I still believe we should have a secure communication everywhere; I am not 100% sure if this should be the current HTTPS.

We need secure connection even for public static sites. The reason #1 is not encryption, it is authentication. We do not want infected routers / people with Wi-Fi Pineapple / malicious to ISPs / etc. to modify webpages we see. Without some kind of secure connection, they could for example inject some cryptominers or advertisments or malware. They could also modify the content of static pages to instruct people to do something dangerous, e.g., modify recommended amount of some chemicals.

Do we always want TLS?

The way we secure our communication does not have to be today’s HTTPS, though. Encryption is needed just sometimes. On public static sites, it can kind of obscure what are you looking at (e.g., attacker sees you are looking at Wikipedia, but it is not clear what page), but traffic volume analysis can often distinguish between specific pages.

How to make it better?

Let’s look at some options to make it better. There will be some tradeoffs to privacy, but we will not let attackers to affect traffic in an arbitrary way, as plain HTTP would allow. Thus, we would not make the user more prone to downgrade attacks than with today’s HTTPS. Our main point is allowing the caches doing their jobs, maybe a better one than with the current state of the art HTTP caches can do.

Mixed content secured by SRI

First, we could sometimes achieve a reasonable level of security even with a plain HTTP. We could have loaded some images, stylesheets and even scripts over a plain HTTP, provided they are protected by subresource integrity (SRI). I have wondered why browsers consider even SRI-protected resources as a mixed content. They are protected against modification and they do not necessarily contain anything sensitive. I don’t much need to hide the fact I am downloading jQuery 1.8.1… (Today, such change in browsers can be a bit more complex if it has to be compatible with older browsers with a more strict mixed content policy. It would ideally bring something like allowplain atributte, allowing usage of plain HTTP instead of HTTPS.)

Shared cache based on hashes

With SRI, we could go a bit further. Where explicitly approved by some extra header, the browser could just match the hash for caching purposes, even if it has not ever downloaded the specific URL. As a result, we would not needlessly download dozens of exactly same copies of jQuery or Bootstrap. We could download it just once and then use the cache. While this could serve as some minor side channel that reveals information what files are already in your cache, explicit approval through some header can make it a non-issue.

Serving signed responses from caching proxy

We could also have some caches of some signed (but probably unencrypted) data. This however goes with some privacy tradeoff and new protocol to implement, but it does not give up data authentication. A cache server could return some data with expiration time and signature, even without contacting the upstream server. This is quite more complex, but still technically feasible. We cannot use TLS at this point, because TLS serves for transport layer, which we would like to intercept. The handshake could however start as a standard TLS handshake and continue with a different protocol:

Client: ClientHello, I am trying to connect through TLS to host example.com, there are my capabilities (ciphersuites). I am able to use caching proxy instead of standard TLS.Caching proxy: Hey, I have some content for this server cached. See my non-expired approval from the server, signed by the private key of certificate holder. I am allowed to serve you some of the requests. Plus there is the OCSP response, so you know the server’s certificate is not revoked. You see, the private key holder indicates there is nothing sensitive in the URL, you can send it to me.Client: OK, there is the full URL: htttps://example.com/contactCaching proxy: OK, there are the data authenticated by the server.

If client or server does not support such a feature, either just because it is not implemented or because they don’t want this for a reason, no other party can force the communication to go this way instead of standard TLS.

Website owner agreement is needed: If the proxy does not have a signed and non-expired approval, it cannot force the client to reveal the full URL.
If the browser chooses not to use this way (e.g., because of user’s decision), it can insist on a standard TLS handshake.
Standard TLS handshake can ne required for some blacklisted URLs (e.g., /api/*), POST requests or if some specific cookie is present. Those exceptions could be described in the initial approval.

Cache-friendly version?

I am, however, generally against making special cache-friendly sites, similar to past “wap” or “mobile” versions. If they have a different URL, it gets tricky to handle links. When I click a link from elsewhere, it does not necessarily point to the version I want. Also, force website owner not to use HSTS, which is probably not what we want.

Challenges

UX issues: Maybe just some users will want such tradeoff, while some others will not. How to allow both of them making an informed decision?
None of those suggestions is enough reviewed by others. Furthermore, description of signed caches is quite vague to properly review, because I have prefered to be concise. While I have some security and crypto background, I don’t think this should be implemented without any review.
This would require multiple parties to implement it. All the ideas require some change in browser and the website. The last one also requires important modification of the webserver and proxy. But incentives to implement this can be quite low for most people with fast Internet connection. On the other hand, the SRI enhancements are not so hard (i.e., they are much easier than extending HTTPS to some TLS alternative) and can be useful even in Europe / America on mobile connections, despite there is no proxy that can speed up loading.
Any change in browsers is likely irelevant for people with Windows XP or something similar. On the other hand, they could be welcome anyway if their usage don’t break anything.

[…] 2018Aug08: Sadly, people in remote and underserved locations are having a lot of trouble accessing sites via HTTPS. While that certainly sucks for them, I’m confident that solutions to the specific technical […]

Any widely adopted solution going forward should not and will not compromise integrity assurance. Meanwhile, Service Workers have high overhead and further increase reliance on scripting. My hope lies in the Peer Web bringing federation of static content and allowing for decentralised infrastructure. Specifically, Dat Protocol is an easy way to allow for p2p propagation of your web content. This would make community internet projects resilient and truly independent.

Internet access in remote areas is a real problem, and HTTPS does complicate local caching as the article describes. Othernet (formerly known as Outernet) is trying new approaches for internet content distribution that could avoid these problems. http://othernet.is

Just like Orion Edwards, Uwe Trenkner and Mike Healy commented, there’s something that doesn’t smell right to me about Services Workers.

If you’re trying to setup a shared cache, I don’t think service workers will be the right choice. From what I understand, they’re a mechanism for managing a local cache and custom behaviors on the client, especially useful in case of no or bad connectivity; moreover they only work on a single user agent (a single browser on a single computer) therefore the advantages cannot be shared across a classroom. For this use case, I don’t see them improving the situation much from what properly set HTTP caching headers already offer.

Instead, what comes to mind is IPFS which is a secure way to share resource locally (peer-to-peer) in a secure manner. Maybe this sort of use case is the killer-feature of IPFS. Worth looking into!

Cloudflare’s Keyless SSL project enables caching whilst maintaining the integrity of the certificate served. So (if my theory is right… bear with me) you could install it on the local cache server as a way of ensuring HTTPS is still present, whilst allowing local caching (I think?).

We’ve been advised to send a larger number of smaller packets with HTTP/2 (as opposed to a small number of larger packets with HTTP/1), but does this negatively impact performance when using satellite internet access like in the example you gave?