meek is a transport that uses HTTP for carrying bytes and TLS for obfuscation. Traffic is relayed through a third-party server (Google App Engine). It uses a trick, domain fronting, to talk to the third party so that it looks like it is talking to an unblocked server.

Overview

meek sends traffic through a custom web app on Google App Engine. meek works in places where Google search (www.google.com) is unblocked, even though App Engine itself (appspot.com) may be blocked. The meek-client program builds a special HTTPS request and sends it to the Google frontend server--the server that dispatches requests to different Google services, like search, Google Drive, and App Engine. What's special about the HTTPS request is that from the outside it appears to be destined for www.google.com, so it is allowed by the censor. But under the encryption layer, the Host header is actually for meek-reflect.appspot.com, so the frontend server knows to forward the request to App Engine and the web app that we run. The web app in turn forwards the traffic to a Tor bridge, where the meek-server program decodes it and feeds it to Tor.
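The split between the outside and the inside of the request can be sketched in a few lines of Python. This is only an illustration of the idea described above, not meek-client's actual code: the function names, the bare POST with no session header, and the "/" path are assumptions made for the example, and whether a given frontend honors a mismatched Host header is up to that frontend.

```python
import socket
import ssl

FRONT_DOMAIN = "www.google.com"           # what the censor sees (DNS name, TLS SNI)
HIDDEN_HOST = "meek-reflect.appspot.com"  # only visible inside TLS (Host header)


def build_request(hidden_host: str, body: bytes, path: str = "/") -> bytes:
    """Build the HTTP/1.1 POST that travels inside the TLS tunnel.

    Hypothetical helper: the real protocol carries more headers than this.
    """
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {hidden_host}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def fronted_post(front: str, hidden_host: str, body: bytes) -> bytes:
    """Connect to `front` (name lookup and SNI), but ask for `hidden_host`."""
    context = ssl.create_default_context()
    with socket.create_connection((front, 443), timeout=30) as raw:
        # server_hostname sets the SNI and the name the certificate is checked against.
        with context.wrap_socket(raw, server_hostname=front) as tls:
            tls.sendall(build_request(hidden_host, body))
            chunks = []
            while True:
                data = tls.recv(4096)
                if not data:
                    break
                chunks.append(data)
    return b"".join(chunks)
```

Calling fronted_post(FRONT_DOMAIN, HIDDEN_HOST, b"...") would put only www.google.com on the wire in cleartext; the appspot.com name appears solely in the encrypted request built by build_request.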

See A Child's Garden of Pluggable Transports for details of how the protocol looks at the byte level, both at the TLS layer (the part visible to a censor), and at the HTTP layer (the invisible layer that carries the data).

Web services

Google App Engine is just one example of a web service that could be used by meek. Here are some web services that support domain fronting.

Google App Engine

​Google App Engine is web application hosting on Google's infrastructure. This is the one that has been deployed so far. The front domain can be any Google domain, as far as I can tell, from www.google.com to www.youtube.com to www.orkut.com.

Amazon CloudFront

​CloudFront is a CDN. Your files are hosted on a generated domain name that looks like d2k1ftgv7pobq7.cloudfront.net. All these domains ​support HTTPS with a wildcard cert for *.cloudfront.net, and they can front for each other.

There is a free tier, good for a year, that limits you to 50 GB per month. Per-request pricing differs by client country. Per-gigabyte costs go down the more you transfer, with a maximum of $0.19 per gigabyte. Bandwidth costs to the origin server (i.e., the Tor bridge) are lower. There's an additional cost of about $0.01 per 10,000 requests.
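For a rough feel of what those numbers add up to, here is a back-of-the-envelope upper bound in Python. It uses only the two figures quoted above (the $0.19 maximum per gigabyte and roughly $0.01 per 10,000 requests); it ignores the free tier, volume discounts, per-country differences, and origin bandwidth, and the function name is made up for this example.

```python
MAX_PRICE_PER_GB = 0.19        # USD, the highest per-gigabyte rate quoted above
PRICE_PER_10K_REQUESTS = 0.01  # USD, approximate figure quoted above


def cloudfront_cost_upper_bound(gigabytes: float, requests: int) -> float:
    """Pessimistic monthly cost estimate in USD (hypothetical helper).

    Every gigabyte is charged at the maximum rate, so real bills
    should come out lower than this.
    """
    bandwidth = gigabytes * MAX_PRICE_PER_GB
    request_fees = requests / 10_000 * PRICE_PER_10K_REQUESTS
    return round(bandwidth + request_fees, 2)
```

Under these assumptions, 100 GB carried in one million requests comes to at most about $20: $19 for bandwidth plus $1 in request fees. Because meek polls, the request count can be large relative to the bytes moved, so the per-request term is worth keeping in the estimate.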

CloudFront allows you to use your own TLS domain name for an extra charge, but that appears to put you on a certificate with a bunch of shared SANs, which can't front for domains on different certificates.

Amazon CloudFront supports all files that can be served over HTTP. This includes dynamic web pages, such as HTML or PHP pages, any popular static files that are a part of your web application, such as website images, audio streams, video streams, media files or software downloads. For on-demand media files, you can also choose to stream your content using RTMP delivery. Amazon CloudFront also supports delivery of live media over HTTP.

Does Amazon CloudFront cache POST responses?

Amazon CloudFront does not cache the responses to POST, PUT, DELETE, OPTIONS, and PATCH requests – these requests are proxied back to the origin server.

There's a question of what to use as the front domain. Any particular *.cloudfront.net name could be individually blockable. The generic names cloudfront.net and www.cloudfront.net don't resolve. Maybe pick one with a lot of collateral damage? Or a few, and randomly choose between them? Or connect to an IP, rather than a domain (#12208)? Alexa has a list of the most popular cloudfront.net domains ("Where do visitors go on cloudfront.net?").

There's a list of CNAMEs that point to an example cloudfront.net subdomain. It appears that the GFW blacklists *.cloudfront.net (through DNS poisoning), but some names are whitelisted, including d3dsacqprgcsqh.cloudfront.net and d1y9yo7q4hy8a7.cloudfront.net (9gag).
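The "pick a few and randomly choose between them" option is simple to sketch. The candidate list below reuses the two names mentioned above purely as placeholders; which fronts are actually good choices is exactly the open question, and pick_front is a hypothetical helper, not part of meek.

```python
import random

# Placeholder candidates taken from the names mentioned above; a real
# deployment would need its own vetted list.
CANDIDATE_FRONTS = [
    "d3dsacqprgcsqh.cloudfront.net",
    "d1y9yo7q4hy8a7.cloudfront.net",
]


def pick_front(candidates, rng=random):
    """Return one front domain chosen uniformly at random."""
    if not candidates:
        raise ValueError("need at least one candidate front domain")
    return rng.choice(candidates)
```

Choosing per connection spreads traffic over several names, so blocking any single name only breaks a fraction of connections, at the cost of making every name on the list a potential blocking target.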

CloudFlare

​CloudFlare is a CDN. You use your own domain name. TLS is terminated at CloudFlare's server.

There are different pricing plans. The cheapest one that supports SSL is Pro, for $20 per month. Business is $200 and Enterprise averages $5,000.

There's no per-gigabyte bandwidth charge. The terms of use suggest that they don't want you using it to serve anything other than ordinary web sites.

SECTION 10: LIMITATION ON NON-HTML CACHING

You acknowledge that CloudFlare's Service is offered as a platform to cache and serve web pages and websites and is not offered for other purposes, such as remote storage. Accordingly, you understand and agree to use the Service solely for the purpose of hosting and serving web pages as viewed through a web browser or other application and the Hypertext Markup Language (HTML) protocol or other equivalent technology....

Akamai

It might be easier and cheaper to get Akamai through a reseller. For example, Liquid Web posts a price list: $100/month for up to 1000 GB. This blog post describes how to use WordPress with the Liquid Web CDN. In that example they use a custom CNAME, cdn.lw.rrfaae.com, which for me has the reverse DNS a1711.g1.akamai.net. I can grab an HTTPS version of the blog while fronting through a248.e.akamai.net.

However, the CDN only works for static files hosted through Cloud Files. They ​don't support the "origin pull" service we need.

​HP Cloud uses Akamai. But they have the same problem as Rackspace: it's only static files from HP Cloud Object Storage.

Fastly

Fastly is a CDN, being used by the meek-like transports of Psiphon and Lantern. It apparently requires you to front without an SNI, only an IP, because their frontend server checks the SNI against the Host, and sends a 400 response if they don't match. Both other projects had to fork an HTTPS library to make it possible.

​Pricing is a minimum $50 per month, and $0.12–0.19 per GB for the first 10 TB per month. There's an additional charge per 10,000 requests.

Level 3

​VPS.NET is a reseller of the Level 3 CDN (formerly they had a deal with Akamai). Pricing is pay-as-you-go, not per-month; in other words we can buy a TB and not pay more until it's used up. The first TB is $35 and after that it's $250.

​CloudVPS is another reseller. There's no extra charge over the normal VPS fee, but they say:

"The maximum free throughput of the CDN is 100 Megabit per second (Mbit/sec). Traffic above 100 Mbit/sec will be billed at our normal traffic pricing. Contact us if you plan to use the CDN for large amounts of traffic."
"The free CloudVPS CDN cannot be used for SSL delivery. Contact us if you want to speed up SSL traffic using the CDN."

Ideas

An idea to reduce overhead and eliminate polling is to use HTTP as a long-lived bidirectional channel, sending upstream data in a POST body and receiving data in the response body simultaneously. (That is, you send a POST with no Content-Length, the server reads your header and forwards the request to the relay, the server writes back a header, and after that you use the connection as an ordinary socket, with upstream and downstream data interleaved.) An implementation of this idea is at https://www.bamsoftware.com/git/meeker.git. The idea doesn't work with App Engine, for two reasons: (1) requests must be handled within 60 seconds, and (2) App Engine doesn't support streaming requests of this kind:

"App Engine calls the handler with a Request and a ResponseWriter, then waits for the handler to write to the ResponseWriter and return. When the handler returns, the data in the ResponseWriter's internal buffer is sent to the user.
This is practically the same as when writing normal Go programs that use the http package. The one notable difference is that App Engine does not support streaming data in response to a single request."

App Engine doesn't even call your web app code until it has consumed the entire request body, and doesn't start flushing the response body until you close the output stream.

Instead of sending TLS with a front SNI, think about sending TLS with no SNI at all. (It might look like a really old browser or a non-browser daemon or something.) Then the censor doesn't have an SNI to match on, and is left with blocking an entire IP address (which may virtually host many domains) instead of a single SNI. This idea could be useful in deployment with a CDN, which, though it may have thousands of domains, is blockable if we choose just one of those domains as a front. See #12208.
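With Python's standard ssl module, a ClientHello without SNI can be produced by passing server_hostname=None. The sketch below is only meant to show what that looks like; the function names are invented for the example. Note the loud caveat: without a hostname, the library cannot check the certificate's name, so this code verifies only the certificate chain, and a real client would have to check the peer certificate's names itself.

```python
import socket
import ssl


def make_no_sni_context() -> ssl.SSLContext:
    """TLS context usable without a server name (hypothetical helper)."""
    context = ssl.create_default_context()
    # Hostname checking requires server_hostname, which is also what puts
    # an SNI on the wire, so it has to be switched off here.
    context.check_hostname = False
    # The certificate chain is still verified against the system trust store.
    context.verify_mode = ssl.CERT_REQUIRED
    return context


def connect_without_sni(ip_address: str, port: int = 443) -> ssl.SSLSocket:
    """Open a TLS connection to an IP address, sending no SNI extension."""
    context = make_no_sni_context()
    raw = socket.create_connection((ip_address, port), timeout=30)
    try:
        # server_hostname=None means no server_name extension is sent.
        return context.wrap_socket(raw, server_hostname=None)
    except Exception:
        raw.close()
        raise
```

The Host header sent over the returned socket then carries the only indication of which site is wanted, and it is inside the encryption.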

The App Engine Channel API provides a way to have long-lived push connections to the client, subject to a restricted interface. (HTTP handlers are otherwise required to finish within 60 seconds.) The client could use HTTP request bodies to send data, and a channel to receive, and remove the need for polling. It would require us to reimplement the client JavaScript channel API in order to make use of the particular Comet-based protocol.

Paid apps can create outbound sockets. I don't think it helps us because then the web app would be responsible for managing the session id mapping.

​GTor a.k.a. CloudEntry is based on GoAgent and uses App Engine. It ​uses App Engine's socket support to make two outgoing connections from App Engine: one to the relay and one back to the client. For that reason, the client can't be behind NAT (just like with ​flash proxy). The sockets can't live longer than 60 seconds because of App Engine limits, so after that your connection is broken and you have to start again. GTor works as an upstream HTTPProxy for Tor.

​Flashlight from Lantern uses the Host header trick with CloudFlare. Like GoAgent, it uses local MITM for HTTPS connections and makes the actual HTTP requests from the remote server.

Users

Distinguishability

Barriers to indistinguishability

TLS ciphersuites
Look like a browser. #4744 has the story of when tor changed its ciphersuite list to look like Firefox's in 2012. tor's list of ciphers is in src/common/ciphers.inc.

TLS extensions
Look like a browser.

Packet lengths
Do something to break up fixed-length cells.

Interpacket times

Upstream/downstream bandwidth ratio

Polling interval
When we have nothing to send, we start polling at 100 ms, and increase the interval by 50% every time no data is received, up to a maximum of 5 s. The growth pattern and the fixed cap are detectable.
Here's what the fixed polling of 5 s looks like in the GNOME system monitor:

Maximum payload lengths
Payloads are limited to 65536 bytes. During bootstrapping and bulk downloads, a lot of bodies have exactly this size.

Behavior on random drops
Suppose the censor drops every hundredth connection to https://www.google.com/. Normal web users will just refresh; meek's stream will be broken.

Number of simultaneous requests
Browsers open many parallel connections to the same server; I think meek 0.4 opens just one.

Extra latency
The latency between the client and the front domain is likely to be measurably different from the latency between the client and the real destination.

Working to our advantage is that we are likely to be transporting web traffic, so we inherit some of its traffic characteristics.
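Two of the barriers above, the polling interval and the maximum payload length, are regular enough to write down. The Python sketch below reproduces only the numbers stated in the list (100 ms start, 50% growth, 5 s cap, 65536-byte bodies); the names are invented for the example, and it does not model what happens to the interval once data arrives.

```python
from itertools import islice

INITIAL_INTERVAL_MS = 100.0  # first polling interval when idle
GROWTH_FACTOR = 1.5          # +50% after each poll that returns no data
MAX_INTERVAL_MS = 5000.0     # fixed cap of 5 s
MAX_PAYLOAD = 65536          # largest body, in bytes


def idle_poll_intervals():
    """Yield successive polling intervals (ms) while no data is received."""
    interval = INITIAL_INTERVAL_MS
    while True:
        yield interval
        interval = min(interval * GROWTH_FACTOR, MAX_INTERVAL_MS)


def split_payload(data: bytes, limit: int = MAX_PAYLOAD):
    """Cut a byte string into bodies of at most `limit` bytes."""
    return [data[i:i + limit] for i in range(0, len(data), limit)]


# The first dozen intervals of an idle connection, for inspection.
schedule = list(islice(idle_poll_intervals(), 12))
```

The schedule starts 100, 150, 225 ms and reaches the 5 s cap after ten consecutive increases, after which every gap is identical. Likewise, split_payload on a large transfer yields a run of bodies of exactly 65536 bytes followed by one shorter remainder, which is the pattern described under maximum payload lengths.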

How to look like browser HTTPS

We decided to use a browser extension to make all the HTTPS requests required by meek, so that the requests look just like those made by a browser. There's an extension for Firefox (which works with Tor Browser, so it can work in the browser bundle without shipping a second browser) and one for Chrome. The list below is a summary of a discussion that took place on the tor-dev mailing list and on IRC.

Use your own HTTPS/TLS library, and take care to make sure your ciphersuites and extensions match those of a browser. There are Python bindings for NSS that might make it easier. Chromium is moving to OpenSSL in the future.

Use a separate (headless) browser as an instrument for making HTTPS requests. This is what htpt plans to do. PhantomJS is a headless WebKit that is scriptable with JavaScript. Its compressed size is 7–13 MB. This postserver.js example shows it running its own web server, which we could use as a means of communication.

​MozRepl (​addons.mozilla.org) gives you a JavaScript REPL that allows you to control the browser. It looks like the in-browser JavaScript console, except accessible from outside. ​Firefox Puppeteer is a fork of MozRepl that is designed for machine-driven browser interaction.
Another option is to write an extension for some other browser and communicate with it using some custom IPC.

Use an extension in Tor Browser itself. The extension bypasses Tor Browser's normal proxy settings in order to issue HTTPS requests directly to the front domain.

Sample client hellos

Here is a diff of the client hellos of Firefox 24.4.0 and Tor Browser with meek-http-helper, a browser extension that proxies the requests of meek-client. The only difference is in the client randomness.

Style guide

The word "meek" should be written in lower case, even when it is the first word in a sentence. Exception: when it is the last word in a sentence, it should instead be written in ALL CAPS. When printed on glossy paper, "meek" should be followed by a ™ symbol; when handwritten, it should be decisively underlined. You may use the abbreviation "M." in order to save space, but the first use in a document should be spelled out in full with the abbreviation in parentheses. Exception: for every use of an abbreviation after the first, if the number of uses so far is the description number of a non-halting Turing machine, then the "M." should be inverted to become a "W."