I have the following problem:
There are $n$ users and each users fills in three data points per day, this are three integers between say $-10$ and $10$. I have access to the database and hence I am able to modify this data. However, I want to "sign" the data.

How can this be done in such a way that I am unable to change the data without that this can be detected? The users login, so I do have the md5 hash of their password, could I use this in any way to sign the data?

It should also not be possible preferably that I delete everything, change a couple of values and generate the "checks" again.

The threat: Let me make an analogy to what can be the threat (my case is more abstract). Say that people with a schizophrenia would give every day a number about "how psychotic" they are. On the basis of long-term data there could be made conclusions about "how severe" the case is. The users might claim that there has been messed with the data to make their case look more severe (say because of conflict of interest with pharmaceutical companies).

What is the threat being defended against? For example, does the database need to be treated as malicious from the user's POV (ie, the admin himself may be malicious from day 1), or does the admin want to ensure that a hacker can't plant fake database entries that are signed?
–
B-ConAug 24 '12 at 6:35

As a side-note: A plain md5 hash is not suitable for password hashing. Use an appropriate function, such as scrypt, bcrypt or PBKDF2.
–
CodesInChaos♦Aug 24 '12 at 8:22

@B-Con I have added "the threat". CodesInChaos: Thanks, I will look into it.
–
Jonas TeuwenAug 24 '12 at 13:31

2 Answers
2

If the users retrieve data based on the position of the data within the database, the most straight-forward way would be to have the user sign every entry they upload combined with a unique identifier that they can map to the respective database location. Store the (value,identifier,signature) tuple for every value uploaded.

The user is going to have to perform the signing, because otherwise you could forge a signature with whatever key you used on their behalf. If your code derives the user's key to perform the signing, then your code can do whatever it wants with the signing key and the user has no reason to trust the signature. You could have the clients all generate keys (public/private pairs if others need to verify the signature, or symmetric if only they need to verify their signature), store them locally, and use them to sign every value they upload. (Obviously key management then becomes an issue, but that's a different can of worms.)

You need the unique identifier to prevent the database admin from simply copying one (value,signature) record from position $i$ to position $j$ in the database, which would allow them to change the value in position $j$ to any value previously uploaded. The unique identifier and the user's ability to map the identifier to a database location prevents that. The simplest way would be to make the unique value the database position. Format the message $m$ for day $d$ in position $i$ with a function akin to $f(m,d,i) \rightarrow$ "value $m$ is the $i$th value for day $d$" and sign the output of $f$. ($f$ must be one-to-one, otherwise there will be collisions and possible forgeries.)

This requires that the user knows what position in the database they are writing to and reading from for every query, otherwise they can't create or verify the signature. For retrieving data, this would work for queries like "give me value #2 from day #15", but it would not work for queries like "how many times did I upload value 7" since the user has no reason to believe that you are reporting all the entries (skipping entries would be like forging them and replacing them with 0). (Asking for all uploaded values is the same as asking for each numbered value.)

If the user can over-write previously uploaded values then it becomes more complicated since the unique value for each signature must somehow map not just to a database location, but also the version number of the database location. (In other words, you're basically adding a dimension to the database.) Otherwise the DB admin can swap the current value of a database location with a previously deleted value from the same position. Having the client save a timestamp or counter locally to describe the latest version of each database overwrite and incorporating that value into the formatted signed message would solve the problem, but then the client needs to keep their own records of which values have been overridden. Keeping a local database that describes the remote database may not be practical for the users.

This only applies to future uploaded data, there is no way to verify the data that has been uploaded prior to including signing. You may have already forged the data, the user has no way of knowing. If the users want past data to be signed, they should re-upload it with signatures. If they don't have a local copy, you can provide your copy to them for their review and signature, but if they don't trust the data you give them they should probably leave it unsigned since signed data is sometimes an indication that the signer fully endorses what they are signing, and best practice is to not endorse data you don't know that you should.

Yes. Use append-only logs, or a cryptographic timestamping service. These approaches are generally designed to solve the problem you mentioned, namely: detecting tampering with data or post-dating of old data.

See the following resources on this site and our sister site, Security.SE: