willyyam has asked for the
wisdom of the Perl Monks concerning the following question:

I have to collect a unique, important number from a number of people as part of a larger study. Think Social Security number, but not that. I need the actual number at a later point in the process, but I would like to avoid ever keeping this number, even encrypted, on the laptops which will be used to record the data.

My thought was to use a salted hash for storing the numbers, and then generate a rainbow table to reverse that hash, but keep the rainbow table under lock and key (of course).

Generating the rainbow table is pretty trivial, but making it useful so that the numbers can be retrieved is a more thorny problem. If I build the table using sqlite it will be around 830 gigabytes. So, I would get a 1 terabyte external drive and build it there. This limits filesystem options, but should be workable.

So, the questions:

Is this worth the effort?

Is sqlite the right DB for the job?

I have ways of keeping the salt secret - are there other major flaws in the plan?

Thanks!

Comment on
Creating a rainbow table for a ten-digit number: Advice?
Replies are listed 'Best First'.

I've two observations. The first is that I don't have a good understanding of your threat model. Either you have not explained it very well or I'm being dim. I understand that you are attempting to protect a collection of 10-digit numbers but I do not understand the nature of the attackers against whom you are attempting to protect your information. Without a good threat model, evaluating security is almost meaningless.

The second is that this scenario seems to cry out for public key crypto. Issue all your data collection laptps with a RSA public key. Extend the data with a random sequence of digits so that exhaustive search with work function of 1e10 can't decrypt the data. Keep the private key secret. Anyone who can break kilobit RSA undoubtedly has the capability to break your security by other, cheaper, means.

Regarding the first observation, the real threat is the appearance of not taking the collection and retention of the data seriously.

Regarding the second observation, I would love to know how to incorporate public key crypto into a database front-end - specifically one made in MS Access. That would be the preferred option to a hash+rainbow table by a long chalk.

You have a secret database that will give anyone the means to reverse the hash to the original. Then why make the fuss with the rainbow table? Just store the relation 'hash number'->'important number' directly into that database. In both cases you are only safe as long as the secret database stays secret

Um, no, the rainbow table is the secret database, which I will keep secret. A flat "table" isn't useful, since I need a way to take an arbitrary hash and return the related number. That's why the table needs to be a DB.

Either can be compromised, no matter how hard you try to "keep secret." In fact, discussions of the rainbow table often include a note that such entities are used to make recovering "secret" data easier:

Wikipedia: "A common application is to make attacks against hashed passwords feasible."

XXX (name deleted): "XXX is a Windows Password cracker based on Rainbow Tables"

Random discussion of web security: "Statistically, half of the key is found on average as soon as half the chain length has been reached" (caveats re complexity omitted)

I should have formulated that better, my previous post can be misunderstood quite easily. Luckily ww said it much better.

But you are right, my answer isn't really what you were looking for. You seem to need to collect the data over a long time on laptops and the secret database should not be connected while this is happening, right?

Your scheme is a really nice idea, but has problems. Let me elaborate:

Lets say you have chosen a salt. Nobody else can construct a rainbow table in a sensible timeframe without knowing the salt.

But an attacker needs to know only one of those unique numbers you want to keep secret and access to your data on the laptop to find out the salt. He just encrypts the number he knows combined with possible salt values until he finds a encrypted number where there is a corresponding data set

So you need to use a really big salt, more like a password

That salt/password could be stored on the laptop, but then an attacker could just look into your script to find out the password

So you and the data collectors have to type in the salt/password every time they want to collect data. If the attacker gets hold of the laptop he can change the script to store the salt and send it to him or he can collect it later. Granted that is difficult but you still need to secure the laptops more than you might want to. And you have to trust the data collectors

So Xilmans idea to use public-key encryption is really the solution you are looking for with none of the above disadvantages

So you're going to rely on the data-collectors' memories to enter and re-enter the salt reliably hundreds or thousands of times (without reminders on yellow stickies "hidden" somewhere accessible)?

And the salt will also be long enough to make deducing it from data available on the laptop(s) difficult enough that it will pass muster with your (hypothetical?) non-naive, non-ignorant ethics committee?

Do you see any conflict among these notions?

Was it the Red Queen or a Philadelphia comedian who "practice(d) believing two mutually contradictory notions before breakfast every day?"Neither?Well, how about Orwell's "doublethink?"