I have been reading some very interesting academic papers regarding the ability to search encrypted data without needing to actually decrypt the data. Doesn't this defeat the purpose of encryption in the first place? I can imagine language-specific "brute force" attacks in which I re-assemble the plain text from the encrypted text simply by searching for "interesting" combinations of words.

What is the current practical state of the art in searching encrypted storage, and why do we want it? What problem does it actually solve, or are the problems entirely manufactured by other current environmental conditions (e.g. searching sensitive information in the "cloud")?

3 Answers

The "safe" versions of searching in encrypted data assume at least one of the following:

the pattern to search for is also encrypted with the same key (or some kind of related key) as the data itself;

the search result ("pattern was found there") is encrypted with the same key (or some kind of related key) as the data itself.

With either of these properties, the search engine does not leak information on the data to someone who would not be able to decrypt it in the first place.
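To make the first property concrete, here is a minimal sketch in Python. It is not taken from any particular paper: the names `token`, `build_index` and `server_search` are made up for illustration, and HMAC-SHA256 stands in for "encrypting the pattern with a key related to the data key". The data owner builds an index of keyed tokens; the server only ever sees tokens and can match them without learning the words.

```python
import hashlib
import hmac


def token(key: bytes, word: str) -> str:
    """Keyed, deterministic search token for one word (HMAC-SHA256)."""
    return hmac.new(key, word.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def build_index(key: bytes, docs: dict[str, str]) -> dict[str, set[str]]:
    """Client side: map each word token to the ids of the documents containing it."""
    index: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for word in set(text.split()):
            index.setdefault(token(key, word), set()).add(doc_id)
    return index


def server_search(index: dict[str, set[str]], query_token: str) -> set[str]:
    """Server side: match an opaque token; the key and the word are never seen."""
    return index.get(query_token, set())


# Hypothetical usage: the key stays with the data owner.
key = b"k" * 32
docs = {"d1": "attack at dawn", "d2": "retreat at dusk"}
index = build_index(key, docs)
hits = server_search(index, token(key, "dawn"))
```

Without the key, nobody can compute the token for a chosen word, which is what blocks the "search for interesting words" attack from the question. This sketch still leaks something, though: tokens are deterministic, so the server can see which queries repeat and which documents share a token. The documents themselves would be encrypted separately; that part is left out here.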

The ultimate goal is to be able to offload the grunt work of searching to a big cloud system, while not needing to trust that system. Fully homomorphic encryption is the generic full-blown solution for offloading any kind of work, and the best currently known solutions for that are utterly unreasonable to apply because the overhead is tremendous (it can be implemented, but there is little use in offloading work to a cloud if the result is something slower than a pocket calculator). Encrypted data searching is a specialization: by restricting ourselves to a specific kind of work to offload (i.e. searching), we hope to find algorithms which are sufficiently lightweight to have a practical application.

To my knowledge, the field has not produced anything practical yet (but there is no intrinsic reason why it could not).

An important thing to remember about searching encrypted data is who the data owner is. In a number of the proposals and papers I've seen, the data is owned by the person doing the searching, who is simply using cloud resources to do the search. This is not true of all schemes, however; SADS is one example where it is not.

Another important point to look at when reading these sorts of papers is whether the scheme requires a trusted third party (TTP). A TTP could mitigate many attacks that would otherwise be possible (such as brute-forcing documents as you describe).

For the state of the art, you'll need to be more specific about your requirements (who owns the data, who can search the data, whether a TTP is acceptable, etc.), as schemes can differ considerably depending on those requirements. In addition to SADS, I'd recommend looking at CryptDB.

You may be interested in looking at encrypted search techniques based on Bloom filters. In these papers you can find some valuable ideas that could be a practical approach in quite a few scenarios.
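As a rough illustration of the general idea (a sketch under my own assumptions, not the construction from any specific paper), each document can get a Bloom filter whose bit positions are derived with a keyed hash. The constants `M` and `K` and the function names below are hypothetical. The server stores only the filters; to search, the key holder sends the bit positions for a word, and the server checks them.

```python
import hashlib
import hmac

M = 1024  # number of bits per filter (assumed size)
K = 4     # number of keyed hash functions (assumed count)


def positions(key: bytes, word: str) -> list[int]:
    """Key holder: derive the K bit positions (the 'trapdoor') for a word."""
    data = word.lower().encode("utf-8")
    result = []
    for i in range(K):
        digest = hmac.new(key, bytes([i]) + data, hashlib.sha256).digest()
        result.append(int.from_bytes(digest[:8], "big") % M)
    return result


def build_filter(key: bytes, text: str) -> int:
    """Key holder: build one document's filter, stored as an integer bitmask."""
    bits = 0
    for word in text.split():
        for p in positions(key, word):
            bits |= 1 << p
    return bits


def might_contain(bits: int, trapdoor: list[int]) -> bool:
    """Server: test the positions; no false negatives, rare false positives."""
    return all((bits >> p) & 1 for p in trapdoor)


# Hypothetical usage
key = b"k" * 32
doc_filter = build_filter(key, "attack at dawn")
found = might_contain(doc_filter, positions(key, "dawn"))
```

A Bloom filter can report a word that is not in the document (a false positive) but never misses one that is, so the searcher has to be prepared to discard a few spurious hits after decrypting. In this simplified form the same word sets the same bits in every document, which lets the server correlate documents; published schemes typically add a per-document step to avoid that.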