(Technology) Google and the bad guys

Bad guys know about the "intitle" operator, but they know something else that makes it even more powerful. Often Web servers are left configured to list the contents of directories if there is no default Web page in those directories; on top of that, those directories often contain lots of stuff that the website owners don't actually want to be on the Web. That makes such directory lists prime targets for snoopers. The title of these directory listings almost always start with "Index of", so let's try a new query that I guarantee will generate results that should make you sit up and worry: "intitle:"index of" site:edu password". 2,940 results, and many, if not most, would be completely useless to a potential attacker. Many, however, would yield passwords in plain text, while others could be cracked using common tools like Crack and John the Ripper.

There are other operators, but these should be enough to make the picture clear. Once you start to think about it, the potentially troublesome words and phrases that can be searched for and leveraged should begin to multiply in your mind: passwd. htpasswd. accounts. users.pwd. web_store.cgi. finances. admin. secret. fpadmin.htm. credit card. ssn. And so on.

Googledorklists words and phrases that reveal sensitive information and vulnerabilities

We have two seemingly opposite problems at work here: simplicity and complexity. On the one hand, it has become very easy for non-technical users to post content onto Web servers, sometimes without realizing that they're in fact placing that content on a Web server. It has even become easier to Web-enable databases, which has led in one case to the exposure of a database containing the records of a medical college's patients (and by the way, the search terms discussed in that article are still very much active at Google, one year later).

Even when people do understand that their content is about to go onto the Web, many do not fully think through what they're about to post. They don't examine that content in light of a few simple questions: How could this information be used against me? Or my organisation? And should this even go on the Web in the first place?