I'd like to encrypt IP addresses in my MySQL database, with the following constraints:

Does not need to be resistant to attackers that can execute queries.

Must be resistant to attackers that have access to files on disk.

Must be able to validate an IP against the encrypted form, to check if it matches.

Checking against /24 wildcards (e.g. 10.20.30.*) would be useful too.

Any form of key or password cannot be entered on system startup, as this will be running on a server in a datacenter, and logging in via SSH every reboot is a pain I can do without.

I'm currently using separate database connections to give better privilege restrictions, with a separate table for passwords. I'm also using mysqli and semi-automated code reviews to ensure there are no SQL injection holes.

With a non-salted (i.e. deterministic) hash mechanism, it would be easy to compute all possible IP hashes. With a salted hash, that problem is reduced, but it's still not much better if an attacker is just trying to break one IP hash. Of course, all of this prevents wildcard searches, too.

Is there a solution to this?

UPDATE:
After some thought, I've come up with the following scheme:

Webapp has a 4096-bit RSA public key embedded in the source, I keep the private key in a TrueCrypt volume on my home machine.

When inserting entries into the logs table, the IP is padded with a random string and encrypted with the public key. This makes decrypting it impossible, and makes the encryption non-deterministic.

The webapp provides an API for exporting the logs, which can only be accessed over SSL by authenticated admin users.

I write an application to use the API to browse logs my home machine, using the private key to decrypt the IP addresses.

When an IP address is banned, the IP is stored in plaintext in the bans table.

4 Answers
4

Public / private key encryption is slow. It's ideal for write it & forget it, but it's not a fast process.

For the sake of speed, consider this: use a randomly generated symmetric key that is chosen when the daemon starts and possibly is rotated out every hourly, daily, weekly, or monthly (depending on your traffic and security needs). Leave that key in memory only and encrypt that symmetric key to a database table with the public key of your keypair. Then you can recover the key, but an attacker can't get it from except by dumping the processes memory. Even then, the attacker can only use the most recent key. You will likely want to use something to link the keys to the data or you'll have to brute-force your list of keys against each record. That recording could be as simple as start / stop time if your logs are timestamped, or sequence numbers, or starting encrypted entries with a key id field, etc.

Regardless of what method you use, pad your IP before encrypting it. I see that since you updated your answer this is mentioned, but it's definitely important for anybody referencing this. That will prevent indexing, but it will also prevent any "pigeonhole" issues and the same IP always having the same encrypted output.

A possible performance optimization. If you need a way to make it more efficient, there are ways to improve the efficiency. For instance, here is one. The application can pick a random symmetric key K, encrypt K under the public key, then encrypt the IP address under K, and store the encrypted IP address and the encrypted value of K in the database.

Later when you export the logs, if you want to ban an IP address, since you have the private key, you can recover the corresponding value of K, then decrypt using K to recover the IP address, and add that IP address in cleartext to the ban list.

How does this improve performance? The key trick is that you can reuse the same value of K for many different IP addresses. When your launches, it can pick K, encrypt it under the public key, and cache the value of K and the encryption of K. Now it can reuse that for many log entries. You can periodically clear the cache and create a new K if you want. (Just make sure to store the cached value of K only in memory, not in the database.) This way, you only have to do a single public-key encryption every so often. Each time you want to log a new IP address, just need to do a symmetric-key encryption of it under K, which should be very efficient.

Why the optimization may be unnecessary. That said, if you choose your public-key encryption algorithm properly, even your basic scheme might already be very efficient. For instance, if you use RSA encryption using public exponent e=3, then encryption is very fast, and the scheme you propose might be plenty fast enough.

Your description is not complete, because you do not tell who/what is going to "validate IP addresses against the encrypted form". From your update, I guess that you want:

to be able to log incoming IP addresses, in a way such that an attacker gaining full read access on the server files would not be able to rebuild the logged IP addresses, but such that an authorized administrator (you) could do it;

to maintain a set of "banned" IP addresses, possibly including ranges, and the server must be able to match an IP address against that set.

Since you (understandably) do not want to type a passphrase or insert a key every time the server restarts, you must assume that an attacker gaining full read access on the complete server files knows everything that the server knows right after a fresh start. In particular, the attacker can then simulate the server on his own machine. Since such a server can decide whether a given IP address is banned or not, the attacker can easily obtain the same information with his simulation (there are only four billions of IPv4 addresses, after all). Corollary: you may as well keep the banned (ranges of) addresses in cleartext, since you cannot make it more secure anyway.

For the logging use, asymmetric encryption looks like the right tool. Proper asymmetric encryption already includes random padding (to avoid exhaustive search on the data itself -- there really are only four billions or so of IPv4 addresses). RSA-4096 is complete overkill, though; 1024 bits would be enough for low-value information such as IP addresses (see this answer for details). Yet again, RSA-1024 means that each encrypted IP address will use 128 bytes, and you might want to optimize things a bit, to save space.

You could, for instance, do hybrid encryption with Elliptic Curve Diffie-Hellman: to encrypt an IPv4 address, do an ECDH with regards to a public key (same principle than your RSA key: the administrator alone keeps the private key), and use the resulting shared secret to encrypt the IPv4 address. Use Format Preserving Encryption to encrypt the IPv4 address into four bytes (e.g. with the Thorp shuffle, or, simpler, with a stream cipher initialized with the DH key). For "128-bit security", you need a 256-bit elliptic curve, and an ECDH public key can then be encoded over 32 bytes. Together with the encrypted IP address, that's 36 bytes per address.

To reduce it further, apply @D.W.'s advice: do the ECDH thing once upon server start, and mutualize the shared key usage. You would have a table of ECDH public keys, which grows slowly (one new entry per reboot), and each encrypted IP address would just reference the key to use (a 32-bit field would be sufficient). Note, though, that in this case, you must take care to randomize the symmetric encryption of the IP addresses. For instance, pad the IP address (4 bytes) with 12 bytes of random, and encrypt the 16 bytes as one block with AES. Together with the public key reference, this lower cost downs to about 20 bytes per IP. Without randomization, an attacker could deduce which IP addresses are repeated through the logs. Since this scheme reuses the symmetric key, the encryption must be done with more care, too (e.g. a stream cipher like RC4 would become a very bad idea).

About CPU performance: as @D.W. points out, this is a non-issue. With ECDH, you are below 0.1ms per IP address. RSA encryption will be even below such costs. Which is why I concentrate on space performance above, not CPU.

There are 256^4 possible IPv4 addresses. If you use an algorithm that do not produce any collisions (one ip produces an unique hash), you can relate this to the Pigeonhole principle where you have the same amount or more pigeon holes (hashes) than pigeons (IPs). You would typically have >256^4 pigeonholes to store the encrypted IP addresses with an hash algorithm that produce no collisions.

An attacker could easily perform a bruteforce search against your database to see which hash who correspond to each ip address, due to the fact that a one-to-one hash would be easy to find with the low entropy of 256^4 possible IP addresses (even with multiple iterations of algorithms). If it is not required to be able to match one ip against the database with 100% accurancy, an approach could be to reduce the amount of pigeon holes. This will also introduce hash collisions to your chosen algorithm.

So, if you use a hash size of i.e. 256^3 (3 bytes), each ip-address will have 256 possible collisions.

You did not mention the possibility of false collisions in your list of constraints, so I do not know if this was useful to you.

Another approach that you should consider is the use of full disk encryption with the use of remote authentication server. I believe you will find a solution using pre-boot disk encryption such as PBConnex sufficient if you use windows server. I do not currently remember the name of the linux solution, but I know there is at least one.

Collisions would be bad, since I want to be able to ban IP addresses. Disk encryption is good for protecting against attackers at the physical level, but not much use for attackers that can get access to files via a vulnerability in a webapp. It's also a pain to deal with when I've only got remote access via SSH.
–
PolynomialApr 20 '12 at 17:46

I've updated with an idea on how I might achieve it, feel free to comment on it or update your answer.
–
PolynomialApr 20 '12 at 17:56