CipherCloud claims to support, among other things, searchable encryption. A bunch of speculation seems to suggest they did this via some breathtakingly incompetent means (unfortunately, such speculation "appears" to be copyrighted).

Regardless of their actual methodology, if we assume their encrypted data is searchable by the cloud providers they work with without changes to the providers (e.g. Salesforce), then doesn't this result for order-preserving encryption constitute a lower bound on how insecure their system could be (and a pretty poor one at that)?

The result deals with the security of the ideal functionality of an order-preserving encryption (OPE) scheme. This is the ideal model of a scheme where you can sort $c=enc_k(m)$ by numerical comparison of $c$. Any scheme where you search encrypted ciphertexts with existing queries in a database must meet this requirement. This appears to suggest that the absolute best case for CipherCloud is that their encryption leaks roughly the upper half of the bits of a given message (about $l/2$ high-order bits for an $l$-bit message)*. This seems drastically unsafe for low-entropy messages such as social security numbers, credit card numbers, earnings reports, and most other data you might want to search on in, say, a sales application.
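To get a feel for what the ideal-functionality result means, here is a small experiment. It is a sketch under stated assumptions, not the construction or proof from the paper: a toy random order-preserving function is built by sorting a random set of $M$ values drawn from a larger range, with a non-cryptographic seed standing in for the key. An adversary who sees only a ciphertext, and knows $M$ and the range size, simply rescales it. The names `make_ropf` and `guess_plaintext` and the bit sizes are illustrative choices.

```python
import math
import random


def make_ropf(message_bits: int, cipher_bits: int, seed: int) -> list[int]:
    """Toy random order-preserving function (illustration only, not secure).

    Sorting a uniformly random M-subset of [0, N) yields a uniformly random
    strictly increasing map [0, M) -> [0, N); table[m] is the ciphertext of m.
    The seed stands in for the secret key.
    """
    rng = random.Random(seed)
    M, N = 2 ** message_bits, 2 ** cipher_bits
    return sorted(rng.sample(range(N), M))


def guess_plaintext(c: int, message_bits: int, cipher_bits: int) -> int:
    """Ciphertext-only guess: linearly rescale c from [0, N) back to [0, M)."""
    return (c << message_bits) >> cipher_bits


L_BITS, C_BITS = 16, 32
M = 2 ** L_BITS
enc = make_ropf(L_BITS, C_BITS, seed=1)

# Largest gap between the key-less guess and the true plaintext, over all m
worst = max(abs(guess_plaintext(enc[m], L_BITS, C_BITS) - m) for m in range(M))
print(f"worst-case error {worst} vs sqrt(M) = {math.isqrt(M)}")
```

One would expect the worst-case error to be on the order of $\sqrt{M}$, so the key-less guess gets roughly the top half of the bits right, which is the intuition behind the theorem quoted in the footnote.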

*Per the paper, where $M$ is the size of the message space (i.e. $M=2^l$ for $l$-bit messages): "Intuitively, Theorem 4.2 implies that for $r\approx b \sqrt{M}$, where $b$ is a large enough constant (say $b \ge 8$), there exists an adversary $A$ whose $r$-window one-wayness is very close to 1."
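Turning the footnote's window size into a bit count (back-of-the-envelope arithmetic, treating $b$ as a constant): an adversary that places $m$ inside a window of $r$ consecutive values has effectively learned roughly $\log_2(M/r)$ high-order bits.

$$
\log_2\frac{M}{r} \approx \log_2\frac{2^{l}}{b\,2^{l/2}} = \frac{l}{2} - \log_2 b,
\qquad\text{e.g. } l = 32,\ b = 8:\ \ 16 - 3 = 13 \text{ bits.}
$$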

They claim that their encryption is not deterministic and defeats frequency analysis. That is a really weird claim, and it doesn't fit the encryption shown in Sid's screenshot.
–
CodesInChaos ♦ Apr 20 '13 at 17:14

@CodesInChaos That is a strange claim. It also seems incompatible with drop-in usage with third parties.
–
imichaelmiers Apr 20 '13 at 17:19

Which is, of course, my contention: what they can possibly do with known techniques is very limited.
–
imichaelmiers Apr 20 '13 at 17:46

@RickyDemer I believe that is actually correct. I was quickly going through the paper to confirm what I got from a talk on it, and thought I must have misheard when the author said it was $.5m$.
–
imichaelmiers Apr 20 '13 at 20:13

2 Answers

That paper refers to the numerical ordering, while what would be relevant for searchable encryption without changes to the providers is the substring ordering. (Thus, their bound does not apply.)
What do you mean by "existing queries in a database"?

If your assumption holds, then:

- Encryption must be deterministic (given the key), since different encryptions of the same message (with the same key) must be substrings of each other (and thus equal).
- There must be a noticeable (though not necessarily $\Theta(1)$) chance of breaking authenticity via a single ciphertext only (by submitting a random substring of it). The scheme must be malleable by almost the same amount, again with just a single ciphertext only and for basically the same reason.
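A deliberately extreme toy makes both points concrete. The sketch assumes the provider answers searches with a plain substring match over ciphertext strings, and it uses a hypothetical keyed letter substitution (`keygen`, `encrypt`, and `decrypt` are illustrative names, not anything CipherCloud is known to use). Because substring queries must keep working unchanged, encryption is deterministic, and any slice of a ciphertext is itself a valid ciphertext.

```python
import random
import string

ALPHABET = string.ascii_lowercase


def keygen(seed: int) -> dict[str, str]:
    """Hypothetical toy key: a secret permutation of the lowercase alphabet."""
    rng = random.Random(seed)
    shuffled = list(ALPHABET)
    rng.shuffle(shuffled)
    return dict(zip(ALPHABET, shuffled))


def encrypt(key: dict[str, str], msg: str) -> str:
    # Letter-by-letter substitution: enc(sub) is always a substring of enc(msg)
    return "".join(key[ch] for ch in msg)


def decrypt(key: dict[str, str], ct: str) -> str:
    inverse = {v: k for k, v in key.items()}
    return "".join(inverse[ch] for ch in ct)


key = keygen(seed=42)
ct = encrypt(key, "smithson")

# An unmodified provider can answer the query "mith" with an ordinary substring test
print(encrypt(key, "mith") in ct)  # True
# Forgery from one ciphertext: a slice of ct decrypts cleanly to a new message
forged = ct[2:6]
print(decrypt(key, forged))  # iths
```

A real scheme need not be this bad, but under the no-changes assumption it inherits the same two properties to a noticeable degree.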

"without changes to the providers" is a rather stiff assumption. For example, one could easily shut down active attacks by splitting the ciphertexts into two columns and having the cloud just search the left column, while the right column has message authentication codes which are only relevant to the user.
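A minimal sketch of that two-column layout, assuming the left column already holds some searchable ciphertext string (the value below is a placeholder) and using HMAC-SHA256 from Python's standard library for the right column. `MAC_KEY`, `make_row`, and `verify_row` are hypothetical names; the key would stay with the user.

```python
import hashlib
import hmac

MAC_KEY = b"hypothetical-user-held-mac-key"  # stays on the client, never sent to the cloud


def make_row(searchable_ct: str) -> tuple[str, str]:
    """Left column: searchable ciphertext (the cloud searches only this).
    Right column: HMAC-SHA256 tag over it, meaningful only to the key holder."""
    tag = hmac.new(MAC_KEY, searchable_ct.encode(), hashlib.sha256).hexdigest()
    return searchable_ct, tag


def verify_row(row: tuple[str, str]) -> bool:
    """Recompute the tag on retrieval and compare in constant time."""
    searchable_ct, tag = row
    expected = hmac.new(MAC_KEY, searchable_ct.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)


row = make_row("qwxzpqlr")  # placeholder left-column ciphertext
print(verify_row(row))  # True
forged = (row[0][2:6], row[1])  # attacker slices the left column, keeps the old tag
print(verify_row(forged))  # False
```

The cloud never needs `MAC_KEY` and still searches the left column exactly as before. The tag only lets the user reject tampered or spliced values on retrieval; it does nothing to reduce what the left column itself leaks.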

I was assuming they weren't modifying anything about the third-party providers they worked with. For example, Salesforce.com's database, schema, and queries remain unchanged. That seems like a reasonable assumption from their site.
–
imichaelmiers Apr 20 '13 at 21:19

Also, I was under the impression they did both substring and ordered searching.
–
imichaelmiers Apr 20 '13 at 21:26

Interesting discussion here. Is there really much of a difference between numerical ordering and alphabetic sorting? They would appear very similar, except that the first is base 10 and the other is at a minimum base 26 (a-z), more likely base 36 (a-z and 0-9), and potentially base 62 (a-z, A-Z, and 0-9). Given that previous analysis in this forum provided some evidence that all text is converted to lowercase before encrypting, as well as CipherCloud's (limited) public claims on how they work, the latter option appears to be unlikely.
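The analogy can be made exact for sorting: treating letters as digits of a fixed-width numeral maps dictionary order onto numeric order, so a numeric order-preserving scheme applied to these integers would give sortable string ciphertexts and carry its leakage over to the strings. The sketch assumes lowercase a-z only, with base 27 so that digit 0 can pad short words; `lex_rank` is a hypothetical helper.

```python
def lex_rank(word: str, width: int) -> int:
    """Map a lowercase word (at most `width` letters) to an integer whose
    numeric order matches dictionary order.

    Base 27: digit 0 pads short words and 'a'..'z' become 1..26, so a word
    sorts before any longer word it is a prefix of.
    """
    if len(word) > width or not all("a" <= ch <= "z" for ch in word):
        raise ValueError("expected at most `width` letters from a-z")
    value = 0
    for i in range(width):
        digit = ord(word[i]) - ord("a") + 1 if i < len(word) else 0
        value = value * 27 + digit
    return value


words = ["smithson", "ada", "smith", "zed", "adams", "b"]
print(sorted(words) == sorted(words, key=lambda w: lex_rank(w, 8)))  # True
```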

If we assume that we are talking about Salesforce.com, then most of the plaintext values are names and common words, which would preclude having a lot of numeric characters interspersed with the alphabetic ones. That means we are really dealing with a 26-character set. And it's not quite a full base-26 set: unlike numeric values, where any digit can follow any other (with perhaps the exception of a digit following a leading 0), alphabetic strings are bound by language constraints that make some pairings improbable, such as qt, along with longer sequences such as szyfrowania (which is fine in Polish but not allowed in English!). There are also currency, date, and numeric string values, but we will ignore the sorting of those values for this analysis. (There really should be a separate thread on date encryption, because if encrypted dates are sortable, I have to believe that the solution offers a very low level of security, if any at all.)

So, if you need to provide a sort capability on the data, you'd need to preserve the order of aa, ab, ac, and other allowable sequential character patterns in the ciphertext. You could also have more random ciphertext and just attach something to it (perhaps this is the "metadata"?). Because the SaaS app sorts strings from left to right, the metadata would have to be a prefix of some sort. I'm not sure whether this is how (or IF) CipherCloud uses sortable metadata, but sort ciphers are inherently weak. The bottom line is that if you preserve sorting in such a way that the SaaS application (Salesforce in this case) can properly sort to a depth of something like 6 characters, then you really are just exposing the entire first 6 characters of every string, and of course that exposes the entire string for anything 6 characters or shorter.

Once you have the first 6 characters of every string, it should be a simple matter of pattern matching to derive the data. There would be no need to crack the cipher at all at this point.
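To illustrate why sort ciphers are inherently weak, here is a sketch of one hypothetical way to preserve sort order: a per-character order-preserving substitution, i.e. a random increasing map from a-z into byte values. It is not claimed to be what CipherCloud does. Because the map is increasing, an attacker who has seen every letter at least once can sort the distinct ciphertext symbols and read the alphabet straight off by rank, without ever touching the key.

```python
import random
import string

ALPHABET = string.ascii_lowercase


def keygen(seed: int) -> dict[str, int]:
    """Hypothetical key: a random strictly increasing map from a-z into 0..255."""
    rng = random.Random(seed)
    codes = sorted(rng.sample(range(256), len(ALPHABET)))
    return dict(zip(ALPHABET, codes))


def encrypt(key: dict[str, int], word: str) -> list[int]:
    # Order-preserving per character, so ciphertexts sort like their plaintexts
    return [key[ch] for ch in word]


def recover(ciphertexts: list[list[int]]) -> list[str]:
    """Ciphertext-only attack: once all 26 letters have appeared, the k-th
    smallest distinct code must encode the k-th letter of the alphabet."""
    seen = sorted({c for ct in ciphertexts for c in ct})
    if len(seen) != len(ALPHABET):
        raise ValueError("not every letter has been observed yet")
    table = dict(zip(seen, ALPHABET))
    return ["".join(table[c] for c in ct) for ct in ciphertexts]


key = keygen(seed=7)
words = "the quick brown fox jumps over the lazy dog".split()
cts = [encrypt(key, w) for w in words]
print(recover(cts) == words)  # True: every word recovered without the key
```

With names and common words, the full alphabet turns up after very few records, so the attacker does not need a pangram in practice.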

I'm sure there is a way to quantify the risks of the sort depth, based on a subset of a full base-26 set, but that is for the folks who also know how to get theta and square root characters (am I fated to simply use ^.5?) to come out of their keyboards!
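A rough upper bound on what a sort depth of $d$ characters exposes, assuming a 26-letter alphabet with every letter equally likely in every position (real names and words are far from uniform, so this overstates the uncertainty an attacker faces):

$$
\log_2\!\left(26^{d}\right) = d\,\log_2 26 \approx 4.70\,d \text{ bits},
\qquad d = 6:\ \ 26^{6} = 308{,}915{,}776 \approx 2^{28.2}.
$$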