Defining tokenization

By Luther Martin — January 4, 2010

Tokenization and encryption are the two technologies most commonly used to protect sensitive cardholder data as the PCI DSS requires. Encryption is very well defined and understood, but tokenization isn’t. What exactly is tokenization? Here’s my definition. You can read more about this, including why I believe that this definition makes sense and how it compares to the definition of encryption, in "Defining Tokenization and the Security It Provides" in this month’s ISSA Journal.

A tokenization scheme comprises two stateful, deterministic algorithms: tokenize and detokenize. These operate on two strings called a plaintext and a token. The tokenize algorithm produces a token from a plaintext. The detokenize algorithm produces a plaintext from a token that has already been created by the tokenize algorithm.
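To make this definition concrete, here is a minimal sketch of such a scheme in Python. The class and method names are illustrative, not from any product: the state is a pair of lookup tables, and determinism means the same plaintext always maps to the same token.

```python
import secrets

class TokenizationScheme:
    """Sketch of a tokenization scheme: two stateful, deterministic
    algorithms, tokenize and detokenize, operating on strings."""

    def __init__(self):
        self._by_plaintext = {}  # plaintext -> token (the scheme's state)
        self._by_token = {}      # token -> plaintext

    def tokenize(self, plaintext: str) -> str:
        # Deterministic: repeated calls with the same plaintext
        # return the same token.
        if plaintext not in self._by_plaintext:
            token = secrets.token_hex(8)  # random, so unrelated to the plaintext
            self._by_plaintext[plaintext] = token
            self._by_token[token] = plaintext
        return self._by_plaintext[plaintext]

    def detokenize(self, token: str) -> str:
        # Only a token already created by tokenize can be reversed.
        return self._by_token[token]
```

Note that detokenize is only defined on tokens the scheme has previously issued; an arbitrary string has no corresponding plaintext.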

A secure tokenization scheme is one in which the mutual information between a plaintext and the token that the tokenize algorithm creates from it is zero.

By this definition, a one-up counter would be secure, because the mutual information between the token and the plaintext is zero. A random value would also be secure. The tokenization product that Voltage sells uses a FIPS-validated PRNG to create tokens. As to what other people are using, that’s a tough one. Tokenization vendors typically keep all the workings of their systems proprietary, so it’s not at all clear how they create tokens.
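Both of the secure token sources mentioned above can be sketched in a few lines. These are toy illustrations, not any vendor's implementation; in particular, `secrets` is just Python's standard CSPRNG, standing in for a FIPS-validated PRNG.

```python
import itertools
import secrets

counter = itertools.count(1)

def counter_token(plaintext: str) -> str:
    # One-up counter: the token depends only on the order of requests,
    # never on the plaintext, so mutual information is zero.
    return str(next(counter))

def random_token(plaintext: str) -> str:
    # Random value from a CSPRNG; the plaintext argument is ignored,
    # again giving zero mutual information between plaintext and token.
    return secrets.token_hex(8)
```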


Detokenization is typically done by a database lookup. When a token is created, an encrypted copy of the plaintext is archived along with the token. Then, to detokenize, you look up the ciphertext that corresponds to the token, decrypt it, and provide the decrypted plaintext to the requesting application.
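The lookup-based flow can be sketched as follows. This is an assumption-laden toy: the vault is a dictionary standing in for a database, and the XOR "cipher" is a placeholder purely for illustration, where a real system would use proper encryption with managed keys.

```python
import secrets

KEY = secrets.token_bytes(16)  # stand-in key; real systems use managed keys

def encrypt(plaintext: str) -> bytes:
    # Placeholder XOR transform -- NOT real encryption, illustration only.
    return bytes(b ^ KEY[i % len(KEY)] for i, b in enumerate(plaintext.encode()))

def decrypt(ciphertext: bytes) -> str:
    return bytes(b ^ KEY[i % len(KEY)] for i, b in enumerate(ciphertext)).decode()

vault = {}  # token -> encrypted plaintext (the "database")

def tokenize(plaintext: str) -> str:
    token = secrets.token_hex(8)
    vault[token] = encrypt(plaintext)  # archive an encrypted copy with the token
    return token

def detokenize(token: str) -> str:
    # Look up the ciphertext for the token, decrypt it, and return the plaintext.
    return decrypt(vault[token])
```

The design point is that the token itself reveals nothing; recovering the plaintext requires both access to the vault and the decryption key.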