Want to know an easy way to get hacked? Leave your sensitive data exposed publicly to the world.

It seems like not a week goes by without another story of a company or government leaking sensitive data on public AWS S3 buckets. S3 is a service that allows anyone to create cloud file storage, and a bucket can be made web accessible with a simple permissions change. Amazon recently updated its administration console to indicate whether a 'bucket' is public, but there are a lot of historical or misconfigured buckets out there set as public when they should not be.

The issue is certainly not isolated to S3. I was recently evaluating a piece of software for a client and noticed that sensitive data was being exposed through cache files. These files sat in an open directory on their web server and were publicly accessible without authentication, if you knew where to look. Lucky for them, there was no evidence that this data had been accessed, but it made me curious about the extent of publicly accessible open directories on the web.

The following analysis is more 'back of the envelope' than scientific, but it demonstrates that server misconfigurations continue to be one of the biggest threats to security and a major source of data leakage. In fact, OWASP recently updated their Top 10 Most Critical Web Application Security Risks, and these types of security misconfigurations come in at #6.

The statistics below were all taken from Google searches. A general search for publicly accessible, indexed Apache directory listings yielded 7,240,000 results. There are a few caveats:

Not every public directory is a misconfiguration. There are legitimate reasons for opening a directory; for example, a file download archive.

Not all public directory listings are harmful, even if unintended; for example, an open directory containing images that already appear publicly on a website.

The results found are only those that were indexed by Google. A site that has not been indexed by a search engine can still have publicly accessible listings containing sensitive information. Checking for open directories is one of the first things I do when evaluating a web application.

The 7.2 million number is specific to Apache. Nginx web server directory listings, for example, have fewer telltale signs, which makes them a little more difficult to search for, although some of the charts below also include Nginx results.
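To make the "telltale signs" concrete, here is a minimal sketch of how a fetched page could be checked for the look of an auto-generated directory listing. It relies on one heuristic only: these pages typically carry a title beginning with "Index of /". The function names `looks_like_directory_listing` and `listed_entries` are hypothetical, and a real check would need more signals than this.

```python
import re

# Heuristic only: auto-generated listings typically have a <title>
# that begins with "Index of /". A hand-written page could match too.
_TITLE_RE = re.compile(r"<title>\s*Index of /", re.IGNORECASE)
_HREF_RE = re.compile(r'<a\s+href="([^"]+)"', re.IGNORECASE)


def looks_like_directory_listing(html: str) -> bool:
    """Return True if the page title matches the 'Index of /' pattern."""
    return bool(_TITLE_RE.search(html))


def listed_entries(html: str) -> list:
    """Return the file and directory links found in a listing page."""
    entries = []
    for href in _HREF_RE.findall(html):
        # Skip column-sort links (?C=N;O=D), absolute links such as the
        # parent directory, and relative parent links.
        if href.startswith("?") or href.startswith("/") or href == "../":
            continue
        entries.append(href)
    return entries
```

Run against the HTML of a page you are authorized to test, the first function gives a quick yes or no, and the second shows what a visitor could download from it.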

Among the most troubling of the above results are those that expose SQL or other backups. All an attacker needs to do is download the file. There is no need to even compromise the site in such a case, as the data is already exposed. Exposed data is not always technical, either; there are many documents exposed and indexed on Google that are likely confidential.

How to deal with this? If you are a website administrator or web application developer running Apache, it's likely a good idea to add the directive Options -Indexes to your .htaccess, which turns off automatic directory listings. Furthermore, a blank index.html or equivalent in sensitive directories will provide a second layer of safety. It's also critical to store sensitive application data (like backups) outside the webroot.
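As a rough way to check your own server against the advice above, the sketch below walks a webroot and reports directories with no index file and files that look like backups. Everything named here is an assumption for illustration: the function `audit_webroot`, the index filenames, and the list of risky extensions should all be adjusted to your own setup.

```python
import os

# Assumed names for illustration; adjust to match your server's configuration.
INDEX_NAMES = {"index.html", "index.htm", "index.php"}
RISKY_SUFFIXES = (".sql", ".sql.gz", ".bak", ".dump", ".tar.gz", ".zip")


def audit_webroot(webroot):
    """Return (directories without an index file, backup-like files)."""
    missing_index = []
    risky_files = []
    for dirpath, _dirnames, filenames in os.walk(webroot):
        names = {name.lower() for name in filenames}
        # A directory with no index file would be listed if indexes are enabled.
        if not names & INDEX_NAMES:
            missing_index.append(dirpath)
        # Backup-like files are better stored outside the webroot entirely.
        for name in filenames:
            if name.lower().endswith(RISKY_SUFFIXES):
                risky_files.append(os.path.join(dirpath, name))
    return sorted(missing_index), sorted(risky_files)
```

Calling `audit_webroot` with the path to your document root returns two sorted lists. Neither list proves a leak on its own, since a directory may be protected in other ways, but both are reasonable places to start looking.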

The #1 takeaway, though, is that data can't be hidden on the web. If you place a file in a website directory that's not protected or authenticated in any way, it's public one way or another.