Cloudbleed Technical Analysis

When Tavis Ormandy, working on a corpus distillation project at Google, started analyzing data from publicly available databases on Feb 17th, 2017, little did he expect that the corrupted data he found would trace back to a vulnerability involving three Cloudflare features and would have such a large reach.

The three Cloudflare features involved are email obfuscation, server-side excludes, and automatic HTTPS rewrites. All three used the same HTML parser chain, which leaked memory. The leak caused widespread alarm: not only did the leaked memory contain private information, but search engines worldwide were also caching it.

What caused Cloudbleed?

One of Cloudflare’s key functions is to parse and modify HTML pages as they pass through its edge servers using its NGINX modules, for example rewriting http links to https and inserting Google Analytics tags. The parsers in these modules were initially written with Ragel, a state machine compiler that turns regular-language descriptions into executable finite state machines and emits code in target languages such as C, C++, and ASM. While Ragel is good for parsing data formats, validating user input, and lexical analysis of programming languages, over time the Ragel-based parsers became so complicated and difficult to maintain that Cloudflare decided to replace them with a new parser named cf-html. The migration to cf-html is not yet complete, so Cloudflare has NGINX modules implemented in both Ragel and cf-html.

Although Cloudflare’s Ragel-based parser had contained the underlying bug from the beginning, it was not exposed until cf-html was introduced, because the new parser changed the buffering enough to trigger the memory leak. Once the leak was traced to the activation of the cf-html parser, Cloudflare used feature flags to immediately disable the features that depended on it, such as email obfuscation and automatic HTTPS rewrites. Since the email obfuscation feature was the primary cause of the memory leak and had been modified on Feb 13th, disabling it stopped almost all of the leaks immediately. The third feature, server-side excludes, also triggered the leak and required a patch.

Cloudbleed root cause analysis

The offending memory leak was caused by a buffer overrun: the end of the buffer was checked with an equality operator, so a pointer that had stepped past the end of the buffer was never caught.
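To make the difference between the two kinds of check concrete, here is a minimal sketch in Python that models memory as one flat byte string. Everything in it (the `memory` layout, the `read_until_end` helper, the sample data) is an illustrative assumption, not Cloudflare's code, which is C generated by Ragel.

```python
# Hypothetical memory layout: the HTML buffer, followed directly by
# unrelated data that belongs to some other request.
html = b"<script type="
secret = b"SECRET-COOKIE"
memory = html + secret
end = len(html)  # index one past the last byte of the HTML buffer


def read_until_end(pos, end, use_equality):
    """Read bytes starting at pos until the end-of-buffer check fires.

    use_equality=True  models a check of the form `pos == end`.
    use_equality=False models a check of the form `pos >= end`.
    """
    read = bytearray()
    while pos < len(memory):  # stand-in for the edge of mapped memory
        at_end = (pos == end) if use_equality else (pos >= end)
        if at_end:
            break
        read.append(memory[pos])
        pos += 1
    return bytes(read)


# A pointer that starts inside the buffer hits `end` exactly, so the
# equality check works.
normal = read_until_end(0, end, use_equality=True)

# A pointer that is already one past `end` never equals it again, so the
# equality check lets it run into the neighbouring data.
overrun = read_until_end(end + 1, end, use_equality=True)

# The same starting point is caught immediately by a >= check.
guarded = read_until_end(end + 1, end, use_equality=False)
```

In this toy model `overrun` contains bytes from `secret`, while `guarded` is empty. The equality check is only safe as long as nothing can ever move the pointer past the boundary.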

For a well-formed attribute, the Ragel parser runs the code inside the @{} block. On any failure, it runs the $lerr{} block instead, which is essentially a debug log statement.

For the coders, it’s easy to spot that the @{} block performs an fhold just before its fgoto, while the $lerr{} block does not. This missing fhold resulted in the ensuing memory leak.

To drill down further: if a pointer x points to the current character being examined, fhold is equivalent to x--. It is essential because, when an error occurs, x points to the character that caused “script consume_attr” to fail. Moreover, when the error occurs at the end of the buffer, x ends up beyond the end of the document, so the subsequent equality check for the end of the buffer never matches and x runs outside the buffer.

What happens when the parser runs out of characters while consuming an attribute depends on which buffer is currently being parsed. If it is not the last buffer, $lerr is not executed, because the parser cannot yet know whether an error has occurred: the rest of the attribute may arrive in the next buffer. If it is the last buffer, $lerr is executed.

Had there been an fhold, x would not have ended up past the end of the document, and there would have been neither a buffer overrun nor the resulting memory leak.
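The effect of the missing fhold can be sketched with a toy model. It assumes, as a simplification, that the error action runs while x sits at the end of the buffer, and that the surrounding loop then increments x before comparing it with the end using an equality test. The function name `run_error_path` and the sample data are hypothetical; Python is used only to keep the illustration short.

```python
def run_error_path(memory, xe, with_fhold):
    """Model what happens after the error action at the end of the buffer.

    memory     -- flat byte string standing in for process memory
    xe         -- index one past the last byte of the HTML buffer
    with_fhold -- whether the error action steps x back by one (fhold)
    Returns the final value of x and the indices visited past the check.
    """
    x = xe                  # assumption: parser ran out of input at xe
    if with_fhold:
        x -= 1              # fhold is equivalent to x--

    visited = []
    while True:
        x += 1              # the loop advances first ...
        if x == xe:         # ... then tests for the end with equality
            break
        if x >= len(memory):  # stand-in for the edge of mapped memory
            break
        visited.append(x)   # each index here is a read outside the buffer
    return x, visited


html = b"<script type="
memory = html + b"OTHER-REQUEST-DATA"
xe = len(html)

x_ok, visited_ok = run_error_path(memory, xe, with_fhold=True)
x_bad, visited_bad = run_error_path(memory, xe, with_fhold=False)
leaked = bytes(memory[i] for i in visited_bad)
```

With the fhold, the increment lands exactly on xe and the loop stops without touching anything. Without it, x starts the loop at xe + 1, the equality test never matches, and `leaked` fills up with bytes that lie beyond the HTML buffer.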

According to Cloudflare’s blog, the entry point to the parsing function is ngx_http_email_parse_email. There, x points to the first character of the buffer and xe to the position just past the end of the buffer; eof is set to xe if this is the last buffer in the chain, and to NULL otherwise.

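When only the Ragel parser was present, the final buffer that contained data had its last_buf flag set to 0, which means that eof was NULL. When the parser then tried to consume script consume_attr with an unfinished tag at the end of the buffer, $lerr was not executed, since the parser assumed that more data was incoming. Once cf-html was also active, the final data buffer arrived with last_buf set to 1, so eof was set to xe, $lerr ran, and the missing fhold came into play.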
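The role of the last-buffer flag can be sketched as follows. NGINX buffers carry a last_buf flag marking the final buffer in a chain; the `Buf` class, `setup_pointers`, and `error_action_runs` below are hypothetical stand-ins written for illustration, not the actual module code.

```python
from dataclasses import dataclass


@dataclass
class Buf:
    """Hypothetical stand-in for an NGINX buffer: data plus a last_buf flag."""
    data: bytes
    last_buf: bool


def setup_pointers(buf):
    """Return (x, xe, eof) as described for the parser entry point."""
    x = 0                               # first character of the buffer
    xe = len(buf.data)                  # one past the end of the buffer
    eof = xe if buf.last_buf else None  # None plays the role of NULL
    return x, xe, eof


def error_action_runs(buf):
    """True if an unfinished attribute at the end of buf triggers $lerr.

    Assumes buf ends in the middle of an attribute. The error action only
    runs when the parser knows no more data is coming, i.e. eof is set.
    """
    _, _, eof = setup_pointers(buf)
    return eof is not None


# Same bytes, different flag: only the flag decides whether $lerr runs.
flag_clear = Buf(b"<script type=", last_buf=False)
flag_set = Buf(b"<script type=", last_buf=True)

runs_when_clear = error_action_runs(flag_clear)
runs_when_set = error_action_runs(flag_set)
```

In this model the bytes in the buffer are identical in both cases. The buggy error path is reached only when the data-carrying buffer is also flagged as the last one.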

The buffer overrun occurred infrequently, since memory leaked only if all of the following conditions were true: the final buffer containing data had to end with a malformed script or img tag; the buffer had to be less than 4k in length, since otherwise NGINX would crash; and the customer had to have either email obfuscation enabled, or automatic HTTPS rewrites or server-side excludes enabled in combination with another Cloudflare feature that uses the Ragel parser. Thanks to Tavis’s quick notification of the vulnerability to Cloudflare and its quick resolution, the vulnerability was effectively contained and mitigated within hours, although the leak had potentially been present for five months before resolution.

While the vulnerability was fixed quickly, Enterprise IT security teams should reset affected users’ passwords. Skyhigh customers can audit their own exposure from the Cloud Access Security Broker dashboard. For more information or to schedule a conversation with a technical specialist, please send a note to Resources@skyhighnetworks.com.