Cryptology is such a broad subject that even experienced coders will almost always make mistakes the first few times around. However, encryption is so important a topic that we often can't afford these mistakes.

The intent of this question is to identify and list what not to do with a given algorithm or API. This way we can learn from others' experiences and prevent the spread of bad practices.

To keep this question constructive, please:

1. Include a "wrong" example.

2. Explain what is wrong with that example.

3. Provide a correct implementation (if applicable).

4. To the best of your ability, provide references regarding #2 and #3 above.



The most common errors are not errors in the code, but rather misconceptions about how to use cryptography. In other words, the developer would probably make the same mistake in any language. Therefore, I recommend broadening the question so that it's not so focused on code; most errors are conceptual errors, not coding flaws.
–
D.W., Feb 20 '11 at 3:46

Even though an answer is accepted, do continue to add lessons learned. At the very least it will be educational.
–
LamonteCristo, Feb 23 '11 at 18:19

21 Answers

Don't invent your own encryption algorithm or protocol; that is extremely error-prone. As Bruce Schneier likes to say,

"Anyone can invent an encryption algorithm they themselves can't break; it's much harder to invent one that no one else can break".

Crypto algorithms are very intricate and need intensive vetting to be sure they are secure; if you invent your own, you won't get that, and it's very easy to end up with something insecure without realizing it.

Instead, use a standard cryptographic algorithm and protocol. Odds are that someone else has encountered your problem before and designed an appropriate algorithm for that purpose.

Your best case is to use a high-level, well-vetted scheme: for communication security, use TLS (or SSL); for data at rest, use GPG (or PGP). If you can't do that, use a high-level crypto library, like cryptlib, GPGME, Keyczar, or NaCl, instead of a low-level one, like OpenSSL, CryptoAPI, or JCE. Thanks to Nate Lawson for this suggestion.

Actually, this should be rule number 1; following it would make most of the other rules unnecessary. There are probably only a few hundred people in the world who should be designing or implementing crypto. The rest of us should just use their (sane) API.
–
Alex Holst, Feb 20 '11 at 17:15

What is the best option for a .NET developer? Any tutorials or examples? It is tough for a non-crypto-expert to tell misinformation from valid advice.
–
LamonteCristo, Feb 21 '11 at 17:28

It is especially hard for the creative engineer who has solved difficult problems in the past to leave the crypto to someone else. There are still plenty of hard and interesting problems to solve outside of crypto. Look at the examples of people who have made critical mistakes in crypto, and go solve something else.
–
this.josh, Jun 17 '11 at 4:26


+1 The Crypto API offers too much flexibility, which can get the layman developer into trouble. There are many samples on the internet (and MSFT's support site) that violate at least one of the lessons learned on this page. Some developers forget to consider things like how the keys are exchanged, validated, and revoked. This is where things get thorny. Where are the keys held? How are keys published? How are keys validated? How is key rotation done? Even if the developer is lucky enough to get the math right, or the right combination of features (CBC, block or stream ciphers, etc.), the protocol may still be broken.
–
LamonteCristo, Apr 18 '12 at 16:07

It is a very common error to encrypt data without also authenticating it.

Example: The developer wants to keep a message secret, so encrypts the message with AES-CBC mode. The error: This is not sufficient for security in the presence of active attacks, replay attacks, reaction attacks, etc. There are known attacks on encryption without message authentication, and the attacks can be quite serious. The fix is to add message authentication.

To avoid these problems, you need to use message authentication every time you apply encryption. You have two choices for how to do that:

Probably the simplest solution is to use an encryption scheme that provides authenticated encryption, e.g., GCM, CWC, EAX, CCM, or OCB. (See also: 1.) The authenticated encryption scheme handles this for you, so you don't have to think about it.

Alternatively, you can apply your own message authentication, as follows. First, encrypt the message using an appropriate symmetric-key encryption scheme (e.g., AES-CBC). Then, take the entire ciphertext (including any IVs, nonces, or other values needed for decryption), apply a message authentication code (e.g., AES-CMAC, SHA1-HMAC, SHA256-HMAC), and append the resulting MAC digest to the ciphertext before transmission. On the receiving side, check that the MAC digest is valid before decrypting. This is known as the encrypt-then-authenticate construction. (See also: 1, 2.) This also works fine, but requires a little more care from you.
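A minimal Python sketch of the encrypt-then-authenticate construction, with the encryption step left abstract: the ciphertext below is just opaque bytes standing in for the output of your cipher (IV included), and the hard-coded MAC key is for illustration only.

```python
import hmac
import hashlib

MAC_KEY = b"\x01" * 32  # in practice: a random key, independent of the encryption key

def seal(ciphertext: bytes) -> bytes:
    # Encrypt-then-MAC: MAC the entire ciphertext (IV included) and append the tag.
    tag = hmac.new(MAC_KEY, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag

def open_checked(blob: bytes) -> bytes:
    # Verify the tag in constant time BEFORE any decryption is attempted.
    ciphertext, tag = blob[:-32], blob[-32:]
    expected = hmac.new(MAC_KEY, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("MAC check failed -- do not decrypt")
    return ciphertext  # safe to hand to the decryption routine

blob = seal(b"iv-and-ciphertext-bytes")
assert open_checked(blob) == b"iv-and-ciphertext-bytes"

# Any tampering with the ciphertext is detected before decryption is attempted.
tampered = bytes([blob[0] ^ 1]) + blob[1:]
try:
    open_checked(tampered)
    assert False, "tampering went undetected"
except ValueError:
    pass
```

Note the use of hmac.compare_digest rather than ==, which avoids leaking the position of the first mismatching byte through timing.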

@makerofthings7: GCM encryption is included in the Oracle provider within Java 7. It is also proposed for TLS (within an RFC) and XML encryption v1.1. The Bouncy implementation is compatible with the one within the Sun provider (except for differences regarding authenticated data and the exact exception thrown).
–
Maarten Bodewes, Mar 2 '12 at 20:48

4

For those who are new to cryptography (hence reading this post): "authentication" here has nothing to do with logging in or using your credentials. It's something like a checksum. In reality it is a combination of math and processes that ultimately does much, much more than simply checksum the data. (@D.W. What do you think of this layman explanation?)
–
LamonteCristo, Apr 18 '12 at 15:50

@makerofthings7, great explanation! Perhaps it'd be clearer if the article referred to "message authentication" rather than just generic authentication. I'll make that change now.
–
D.W., Apr 18 '12 at 17:40

An error I sometimes see: People want a hash of the strings S and T. They concatenate them to get a single string S||T, then hash it to get H(S||T). This is flawed.

The problem: Concatenation leaves the boundary between the two strings ambiguous. Example: builtin||securely = built||insecurely. Put another way, the hash H(S||T) does not uniquely identify the strings S and T. Therefore, an attacker may be able to change the boundary between the two strings without changing the hash. For instance, if Alice wanted to send the two strings builtin and securely, the attacker could change them to the two strings built and insecurely without invalidating the hash.

Similar problems apply when applying a digital signature or message authentication code to a concatenation of strings.

The fix: rather than plain concatenation, use some encoding that is unambiguously decodable. For instance, instead of computing H(S||T), you could compute H(length(S)||S||T), where length(S) is a 32-bit value denoting the length of S in bytes. Or, another possibility is to use H(H(S)||H(T)), or even H(H(S)||T).
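The ambiguity, and the length-prefix fix, can be demonstrated in a few lines of Python (using SHA-256 as H):

```python
import hashlib
import struct

def h_concat(s: bytes, t: bytes) -> str:
    # Naive approach: hash the plain concatenation S||T.
    return hashlib.sha256(s + t).hexdigest()

def h_prefixed(s: bytes, t: bytes) -> str:
    # Fix: prepend the 32-bit big-endian length of S, making the
    # boundary between the two fields unambiguous.
    return hashlib.sha256(struct.pack(">I", len(s)) + s + t).hexdigest()

# The naive hash cannot tell these two pairs of strings apart...
assert h_concat(b"builtin", b"securely") == h_concat(b"built", b"insecurely")
# ...while the length-prefixed encoding distinguishes them.
assert h_prefixed(b"builtin", b"securely") != h_prefixed(b"built", b"insecurely")
```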

I typically throw HMAC at it. Slightly more expensive, but at least I don't need to implement it myself.
–
CodesInChaos, Apr 18 '12 at 21:26


@CodeInChaos, these problems apply equally to HMAC, too. HMAC does nothing to help if you concatenate several strings before feeding them to HMAC.
–
D.W., Apr 19 '12 at 18:07


@CodeInChaos, well, use whatever works for you, if you are confident enough in your cryptographic skills to avert flaws. Personally, I wouldn't recommend that approach to others. (1) That's not what HMAC was designed for, so if it happens to be secure, you "got lucky". (2) That's limited to the case of two fields. If you have three fields, you have to do something more complex. So might as well use a proper defense from the start, such as an unambiguously decodable encoding (e.g., prepending the length before each field to be concatenated).
–
D.W., Apr 19 '12 at 18:49


@Marcin: "hashes are supposed to be slow" - no, they are not supposed to be slow, and they are not slow, either.
–
curiousguy, Jun 25 '12 at 19:48


@Matt, yup, that can work! However, you'll have to escape any instances of the delimiter in these strings (otherwise, inevitably, someone will type in a string containing the delimiter, and then this approach falls apart). That adds complexity not present in the alternatives. And if you recommend this to others, inevitably one of your audience will forget to escape delimiters. If they forget to escape the delimiter, this scheme fails quietly: it's insecure, but they probably won't notice that in normal operation. So, yup, it works -- but it might not be my first choice.
–
D.W., Aug 27 '12 at 15:52

Make sure you use crypto-strength pseudorandom number generators for things like generating keys, choosing IVs/nonces, etc. Don't use rand(), random(), drand48(), etc.

Make sure you seed the pseudorandom number generator with enough entropy. Don't seed it with the time of day; that's guessable.

Examples: srand(time(NULL)) is very bad. A good way to seed your PRNG is to grab 128 bits of true-random data, e.g., from /dev/urandom, CryptGenRandom, or similar. In Java, use SecureRandom, not Random. In .NET, use System.Security.Cryptography.RandomNumberGenerator, not System.Random. In Python, use random.SystemRandom, not random. Thanks to Nate Lawson for some examples.
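In Python, for instance, the right choices look like this (a minimal sketch; the commented-out lines show the mistake to avoid):

```python
import os
import secrets

# Right: draw all key material from the operating system's CSPRNG.
key = secrets.token_bytes(16)   # a fresh 128-bit symmetric key
iv = os.urandom(16)             # fresh IV/nonce material

assert len(key) == 16 and len(iv) == 16

# Wrong (don't do this): `random` is a fast, predictable Mersenne Twister,
# and seeding it with the time of day makes the "key" guessable:
#
#   import random, time
#   random.seed(time.time())
#   bad_key = random.getrandbits(128)   # recoverable by anyone who can guess the seed
```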

I remember learning Basic on my Apple ][e. I was writing a game and needed some random input, so I used RND(1). I had to keep rebooting to debug my game, and I noticed that the random element always went in the same sequence after boot. That was when I learned about pseudorandom number generators. If you need some random seeds, Random.org offers free random number generation based on atmospheric noise.
–
this.josh, Jun 17 '11 at 4:45


Random.org is best for simulation and other non-security purposes. It is not a good basis for a seed for a cryptographic PRNG, because you can't trust that it is unknown to others.
–
D.W., Jun 19 '11 at 6:24

Many modes of operation require an IV (Initialization Vector). You must never re-use the same value for an IV twice; doing so can cancel all the security guarantees and cause a catastrophic breach of security.

For stream cipher modes of operation, like CTR mode or OFB mode, re-using an IV is a security disaster. It can cause the encrypted messages to be trivially recoverable.

For other modes of operation, like CBC mode, re-using an IV can also facilitate plaintext-recovery attacks in some cases.

No matter what mode of operation you use, you shouldn't reuse the IV. If you're wondering how to do it right, the NIST specification provides detailed documentation of how to use block cipher modes of operation properly.

The Tarsnap project provides a good example of this pitfall. Tarsnap encrypts backup data by dividing it into chunks and then encrypting each chunk with AES in CTR mode. In versions 1.0.22 through 1.0.27 of Tarsnap, the same IV was inadvertently re-used, enabling plaintext recovery.

How did this happen? In order to simplify the Tarsnap code (and in the hopes of reducing the potential for bugs), Colin Percival took the opportunity to "refactor" the AES-CTR code into a new file (lib/crypto/crypto_aesctr.c in the Tarsnap source code) and modified the existing places where AES-CTR was used to take advantage of these routines.

During the refactoring, encr_aes->nonce++ inadvertently became encr_aes->nonce, and as a result the same nonce value was used repeatedly: the CTR nonce is not incremented after each chunk is encrypted. (The CTR counter is correctly incremented after every 16 bytes of data, but that counter is reset to zero for each new chunk.) Full details are described by Colin Percival in: http://www.daemonology.net/blog/2011-01-18-tarsnap-critical-security-bug.html
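The consequence of a repeated nonce in a stream-cipher mode can be shown with a toy keystream generator (this is illustrative Python, not Tarsnap's actual AES-CTR code):

```python
import hashlib

def toy_stream_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    # Toy stream cipher: keystream blocks are SHA-256(key || nonce || counter).
    # (Illustrative only -- not AES-CTR, but it reuses keystream the same way.)
    keystream = b""
    counter = 0
    while len(keystream) < len(plaintext):
        keystream += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(p ^ k for p, k in zip(plaintext, keystream))

key, nonce = b"k" * 16, b"\x00" * 8
p1 = b"attack at dawn!!"
p2 = b"retreat at noon!"
c1 = toy_stream_encrypt(key, nonce, p1)   # same nonce used twice...
c2 = toy_stream_encrypt(key, nonce, p2)

# ...so XORing the two ciphertexts cancels the keystream entirely,
# handing the attacker the XOR of the two plaintexts:
xor_of_ciphertexts = bytes(a ^ b for a, b in zip(c1, c2))
xor_of_plaintexts = bytes(a ^ b for a, b in zip(p1, p2))
assert xor_of_ciphertexts == xor_of_plaintexts
```

From the XOR of two plaintexts, crib-dragging and frequency analysis usually recover both messages.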

Since the issue is centered around a nonce, try titling this answer to that effect (+1). Q: is a nonce usually incremented, or should it be random?
–
LamonteCristo, Feb 19 '11 at 21:04


@makerofthings, it depends upon the algorithm. Some algorithms and modes of operation require random nonces (e.g., CBC mode); others only require the nonces to be distinct, and thus a counter suffices (e.g., CTR mode). Hopefully, the specification for algorithm/mode of operation would describe what is required.
–
D.W., Feb 20 '11 at 9:28

I suggest the answer be edited to include IVs as well. Correct IV / nonce usage are very similar ideas.
–
B-Con, Aug 13 '12 at 20:42

Nonce is N once. Yes, use the variable N only once. Singular. Do not repeat. The strength of the associated algorithm is hurt by repeating N.
–
this.josh, Aug 30 '12 at 5:41

Here is a wrong example: WEP implemented RC4 with a 24-bit IV (nonce) that increases after each message. This introduced two issues: (1) after 2^24 packets were sent, nonces were reused. (2) RC4 wasn't designed to be used with closely related nonces, where each subsequent value is known to be the previous one incremented.
–
LamonteCristo, May 20 '13 at 14:47

Don't use the same key for both encryption and authentication. Don't use the same key for both encryption and signing.

A key should not be reused for multiple purposes; that may open up various subtle attacks.

For instance, if you have an RSA private/public key pair, you should not both use it for encryption (encrypt with the public key, decrypt with the private key) and for signing (sign with the private key, verify with the public key): pick a single purpose and use it for just that one purpose. If you need both abilities, generate two keypairs, one for signing and one for encryption/decryption.

Similarly, with symmetric cryptography, you should use one key for encryption and a separate independent key for message authentication. Don't re-use the same key for both purposes.
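If you only have a single shared secret, derive independent subkeys from it rather than using it directly for both purposes. A sketch using HMAC-SHA256 as the PRF (a standardized KDF such as HKDF is the more rigorous choice):

```python
import hmac
import hashlib

def derive_subkey(master_key: bytes, label: bytes) -> bytes:
    # Derive independent subkeys from one master key by keying a PRF
    # (HMAC-SHA256 here) with distinct, fixed labels.
    return hmac.new(master_key, label, hashlib.sha256).digest()

master = b"\x02" * 32  # illustrative; in practice a randomly generated master key
enc_key = derive_subkey(master, b"encryption")
mac_key = derive_subkey(master, b"authentication")

# The two subkeys are independent as far as any attacker is concerned:
# learning one reveals nothing useful about the other.
assert enc_key != mac_key
```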

Does S/MIME operate against this recommendation? AFAIK I only have one key, and I have the ability to sign and encrypt.
–
LamonteCristo, Feb 20 '11 at 16:43


IMHO the biggest issue with using the same keys for both usages comes from law enforcement issues. In many jurisdictions you can now be asked to surrender your encryption keys, and that would in practice mean they can sign in your name.
–
Bruno Rohée, Jun 30 '11 at 8:54

Yeah, a lot of algorithms use the same key pair for signing and encryption, PGP and S/MIME being the obvious examples. It's not necessarily a mathematical problem.
–
ewanm89, Apr 18 '12 at 21:40


PGP does not use the same key pair for signing and encryption. Rather, a PGP private key is composed of a main key, used for signing, and one or more subkeys, used for encryption. The subkeys are hidden from the user, hence the confusion, but you can view them using gpg --list-secret-keys.
–
Flimm, Dec 8 '12 at 20:42

The good folks at Microsoft sent me sample code to correct the KB article linked above. This is referenced in case# 111021973179005.

The sample code uses AES to encrypt data, with the key for the AES encryption being the hash generated by SHA-256. AES, the Advanced Encryption Standard algorithm, is based on permutations and substitutions: permutations are rearrangements of data, and substitutions replace one unit of data with another. AES performs permutations and substitutions using several different techniques. For more details on AES, please refer to the article "Keep Your Data Secure with the New Advanced Encryption Standard" in MSDN Magazine at http://msdn.microsoft.com/en-us/magazine/cc164055.aspx

The default mode of operation of the symmetric algorithm for AesCryptoServiceProvider is CBC, the Cipher Block Chaining mode. It introduces feedback: before each plain text block is encrypted, it is combined with the cipher text of the previous block by a bitwise exclusive-OR operation. This ensures that even if the plain text contains many identical blocks, they will each encrypt to a different cipher text block. The initialization vector is combined with the first plain text block by a bitwise exclusive-OR operation before that block is encrypted. If a single bit of a cipher text block is mangled, the corresponding plain text block will also be mangled; in addition, a bit in the subsequent block, in the same position as the original mangled bit, will be mangled. For more detailed information about CipherMode, please refer to http://msdn.microsoft.com/en-us/library/system.security.cryptography.ciphermode.aspx

I suggest deleting everything after "The right way". Coda Hale's proposal has a number of weaknesses. It makes several errors I've documented in other answers here: it uses encryption without message authentication (a serious flaw), it creates the key as the hash of a password (a serious flaw), it makes no attempt to slow down exhaustive keysearch (another serious flaw). My recommendation for the right way to handle this is described in the last paragraph of my answer titled "Don't roll your own crypto".
–
D.W., Feb 20 '11 at 9:34

I suggest deleting all of the stuff after "Disclaimer" and "some highlights". I think most of them are not relevant to your high-level point about avoiding ECB, and are a distraction. Be concise. Instead, I'd suggest that your advice for the right way should be: Use a secure mode of operation, such as CBC mode or CTR mode. Don't forget to follow the other advice on this page, including using message authentication, generating keys appropriately, etc. If you want to make this a community wiki I'd be happy to edit this answer accordingly.
–
D.W., Feb 20 '11 at 9:36

@D.W. Yes feel free to edit any or all answers as CW. My thought against making the entire question CW is to incentivize posters, but I'll leave that decision to you folks. I just want to learn the right stuff and unlearn the bad practices
–
LamonteCristo, Feb 20 '11 at 15:30


Note that this error is even easier to make in Java, which uses <cipher>/ECB/PKCS5Padding as default (e.g. Cipher.getInstance("AES")), and if you switch to CBC, it uses a zeroed out IV (more or less a NONCE, see the answer about that) by default as well.
–
Maarten Bodewes, Mar 2 '12 at 20:53


@Matt, that image is the result of a successful attack. Given a file named TUX.BMP, assume the attacker will try to look at it but discovers it's encrypted. He then views the encrypted bytes, and upon seeing a non-random pattern he suspects ECB. He then replaces the first couple of blocks with a known-good BMP file header, and tweaks it until the rows and columns line up. I saw a researcher use a tool to do this at Blackhat a few years ago; I think the tool was called rumint.
–
John Deters, Sep 8 '12 at 14:51

The LM hash first converts the password to upper case and pads it to 14 bytes. These values are then used to create two DES keys, one from each 7-byte half

Each of the two keys is used to DES-encrypt the constant ASCII string “KGS!@#$%”, resulting in two 8-byte ciphertext values.

These two ciphertext values are concatenated to form a 16-byte value, which is the LM hash

Because you know the constant plaintext and the algorithm, you can very easily split the result into two independent ciphertexts, each derived from an upper-case 7-character half, leaving a very limited set of characters the password could possibly contain.

A correct example: AES encryption

Known algorithm

Scales with technology. Increase key size when in need of more cryptographic oomph
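The cost of the LM two-halves design above is easy to quantify. In the sketch below, the 69-character charset is an illustrative figure for the upper-cased keyboard characters LM permits:

```python
# LM uppercases the password and attacks the two 7-character halves
# independently, instead of one 14-character whole.
CHARSET = 69  # illustrative count of permitted upper-cased keyboard characters

one_14_char_search = CHARSET ** 14       # brute force of a full 14-char password
two_7_char_searches = 2 * (CHARSET ** 7)  # brute force of two independent halves

# Attacking the halves independently is vastly (roughly 10^12 times) cheaper.
assert two_7_char_searches < one_14_char_search
assert one_14_char_search // two_7_char_searches > 10 ** 12
```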

A common weakness in many systems is to use a password or passphrase, or a hash of a password or passphrase, as the encryption/decryption key. The problem is that this tends to be highly susceptible to offline keysearch attacks. Most users choose passwords that do not have sufficient entropy to resist such attacks.

The best fix is to use a truly random encryption/decryption key, not one deterministically generated from a password/passphrase.

However, if you must use one based upon a password/passphrase, use an appropriate scheme to slow down exhaustive keysearch. I recommend PBKDF2, which uses iterative hashing (along the lines of H(H(H(....H(password)...)))) to slow down dictionary search. Arrange to use sufficiently many iterations to cause this process to take, say, 100ms on the user's machine to generate the key.
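PBKDF2 is available in the Python standard library; a minimal sketch (the iteration count here is an assumption to be tuned against your own hardware until derivation takes on the order of 100ms):

```python
import hashlib
import os

password = b"correct horse battery staple"
salt = os.urandom(16)      # random per-user salt, stored alongside the derived key
iterations = 600_000       # tune so derivation takes roughly 100 ms on the target machine

# Derive a 256-bit key from the password with PBKDF2-HMAC-SHA256.
key = hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)
assert len(key) == 32

# Same password + salt + iterations -> same key (it is deterministic)...
assert key == hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)
# ...while a different salt yields an unrelated key, defeating precomputed tables.
assert key != hashlib.pbkdf2_hmac("sha256", password, os.urandom(16), iterations, dklen=32)
```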

As a beginner, I admit I've done this. If the key is random and therefore impossible to remember, are you recommending that keys be stored somewhere in a physical form? That's the only way I can see to implement a system with purely random keys.
–
Adam Cross, Nov 23 '12 at 20:09

@AdamCross, "If the key is random and therefore impossible to remember, are you recommending that keys be stored somewhere in a physical form?" - Well, stored somewhere, it could be either in electronic or physical form. Non-electronic form doesn't necessarily have to be privileged over electronic form in all situations. To give an example... you can use SSL to connect to a web site securely. The SSL session key you use is not stored in non-electronic form anywhere, and is not derived from a passphrase.
–
D.W., Nov 23 '12 at 22:28

haha, to clarify: by "physical form" I mean a place other than my head, but that didn't really make sense since my head is physical too.
–
Adam Cross, Nov 24 '12 at 9:41

In a cryptographic protocol: Make every authenticated message recognisable: no two messages should look the same

A generalisation/variant of:

Be careful when concatenating multiple strings, before hashing.

Don't reuse keys.

Don't reuse nonces.

During a run of a cryptographic protocol, many messages that cannot be counterfeited without a secret (key or nonce) can be exchanged. These messages can be verified by the receiver because he knows some public (signature) key, or because only he and the sender know some symmetric key, or nonce. This makes sure that these messages have not been modified.

But this does not make sure that these messages have been emitted during the same run of the protocol: an adversary might have captured these messages previously, or during a concurrent run of the protocol. An adversary may start many concurrent runs of a cryptographic protocol to capture valid messages and reuse them unmodified.

By cleverly replaying messages, it might be possible to attack a protocol without compromising any primary key, without attacking any RNG, any cypher, etc.

By making every authenticated message of the protocol obviously distinct for the receiver, opportunities to replay unmodified messages are reduced (not eliminated).

Actually, a nonce does not need to be a secret, it just needs to be used only once (within some time span, e.g. the validity of the corresponding secret key).
–
Paŭlo Ebermann, Sep 28 '11 at 15:09

@PaŭloEbermann Many uses of a nonce do not require secrecy, but some protocol formalisms call the secret used to authenticate messages a "nonce" rather than a key, because it is not used as an encryption key.
–
curiousguy, Sep 28 '11 at 22:39

In network communications, a common mistake is to use the same key for communication in the A->B direction as for the B->A direction. This is a bad idea, because it often enables replay attacks that replay something A sent to B, back to A.

The safest approach is to negotiate two independent keys, one for each direction. Alternatively, you can negotiate a single key K, then use K1 = AES(K,00..0) for one direction and K2 = AES(K,11..1) for the other direction.
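A sketch of deriving per-direction keys from a single negotiated key K. HMAC-SHA256 is used as the PRF here purely so the example is self-contained in Python, where the answer's AES(K, 00..0) construction would require a third-party library:

```python
import hmac
import hashlib

def directional_key(session_key: bytes, direction: bytes) -> bytes:
    # Derive one subkey per traffic direction from the negotiated session key.
    # The answer suggests K1 = AES(K, 00..0) and K2 = AES(K, 11..1);
    # HMAC-SHA256 stands in as the PRF so this sketch needs only the stdlib.
    return hmac.new(session_key, direction, hashlib.sha256).digest()

k = b"\x03" * 32  # illustrative; in practice the negotiated session key
k_a_to_b = directional_key(k, b"A->B")
k_b_to_a = directional_key(k, b"B->A")

# Traffic captured in one direction no longer verifies when replayed in the other.
assert k_a_to_b != k_b_to_a
```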

Or you could have an SSC, a secure session counter, increased on each encryption (within half-duplex communication). I've even seen an example where the last block of ciphertext of the other party was used as the IV for the next block, but that might lead to some peculiar attacks.
–
Maarten Bodewes, Mar 2 '12 at 20:57

@owlstead, Yes, using the last block of ciphertext as the IV for the next block, together with CBC mode, led to the BEAST attack on SSL. P.S. A SSC could work to separate the two channels but you'd have to be careful with it. You'd have to increment it for both sending and receiving (using two SSCs, one for each direction, would defeat the purpose). Also the SSC will require both sides to be synchronized and will not tolerate packet drops, which may be problematic in some settings. It may be easier to just use two independent keys.
–
D.W., Mar 3 '12 at 0:17

Interesting, I know of some memory cards that use that scheme with the last part of the cipher block as the IV for the next. I'll look into it. Thanks D.W, guess my hunch was right about that.
–
Maarten Bodewes, Mar 3 '12 at 0:42

This also opens the door for a two-time pad attack, which bit Microsoft PPTP. The first version of PPTP used the same key in the client and the server.
–
LamonteCristo, May 20 '13 at 15:43

For symmetric-key cryptography, I'd recommend at least an 80-bit key, and if possible, a 128-bit key is a good idea. Don't use 40-bit crypto; it is insecure and easily broken by amateurs, simply by exhaustively trying every possible key. Don't use 56-bit DES; it is not trivial to break, but it is within the reach of dedicated attackers. A 128-bit algorithm, like AES, is not appreciably slower than 40-bit crypto, so you have no excuse for using crummy crypto.

For public-key cryptography, key length recommendations are dependent upon the algorithm and the level of security required. Also, increasing the key size does harm performance, so massive overkill is not economical; thus, this requires a little more thought than selection of symmetric-key key sizes. For RSA, El Gamal, or Diffie-Hellman, I'd recommend that the key be at least 1024 bits, as an absolute minimum; however, 1024-bit keys are on the edge of what might become crackable in the near term and are generally not recommended for modern use, so if at all possible, I would recommend 1536- or even 2048-bit keys. For elliptic-curve cryptography, 160-bit keys appear adequate, and 224-bit keys are better. You can also refer to published guidelines establishing rough equivalences between symmetric- and public-key key sizes.

Only sentence I disagree with: "This is a less common error these days"... Still one of the most common crypto errors I see, after No encryption and Rolling your own crypto.
–
AviD♦, Feb 20 '11 at 8:19

@AviD, @nealmcb, thanks for the feedback. I've edited to reflect @AviD's comment. Note that I've made this a community wiki, so feel free to edit it to improve the recommendations and correct any errors.
–
D.W., Feb 20 '11 at 9:29

Use the correct mode

Equivalently, don't rely on library default settings to be secure. Specifically, many libraries which implement AES expose only the raw block transformation described in FIPS 197; applying that mapping to each block independently is the so-called ECB (Electronic Code Book) mode, which is essentially a straightforward mapping of:

AES(plaintext [16]byte, key [16]byte) -> ciphertext [16]byte

This is very insecure. The reasoning is simple: while the number of possible keys in the keyspace is quite large, the weak link here is the amount of entropy in the message. As always, xkcd describes it better than I can: http://xkcd.com/257/

It's very important to use something like CBC (Cipher Block Chaining) which basically makes ciphertext[i] a mapping:

ciphertext[i] = SomeFunction(ciphertext[i-1], message[i], key)

Just to point out a few language libraries where this sort of mistake is easy to make: http://golang.org/pkg/crypto/aes/ provides an AES implementation which, if used naively, would result in ECB mode.

The pycrypto library defaults to ECB mode when creating a new AES object.

OpenSSL does this right: every AES call is explicit about the mode of operation. Really, the safest thing IMO is to just try not to do low-level crypto like this yourself. If you're forced to, proceed as if you're walking on broken glass (carefully), and try to make sure your users are justified in placing their trust in you to safeguard their data.
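The ECB weakness can be demonstrated without any crypto library at all, using a keyed hash as a stand-in "block cipher" (a toy for illustration only, not a real cipher):

```python
import hashlib

BLOCK = 16

def toy_block_cipher(key: bytes, block: bytes) -> bytes:
    # Stand-in "block cipher": a keyed hash truncated to one block.
    # (Not a real cipher -- it exists only to show the ECB/CBC difference.)
    return hashlib.sha256(key + block).digest()[:BLOCK]

def ecb_encrypt(key: bytes, pt: bytes) -> bytes:
    # ECB: each block is encrypted independently.
    blocks = [pt[i:i + BLOCK] for i in range(0, len(pt), BLOCK)]
    return b"".join(toy_block_cipher(key, b) for b in blocks)

def cbc_encrypt(key: bytes, iv: bytes, pt: bytes) -> bytes:
    # CBC: each plaintext block is XORed with the previous ciphertext block.
    out, prev = [], iv
    for i in range(0, len(pt), BLOCK):
        block = bytes(a ^ b for a, b in zip(pt[i:i + BLOCK], prev))
        prev = toy_block_cipher(key, block)
        out.append(prev)
    return b"".join(out)

key, iv = b"k" * 16, b"\x07" * 16
pt = b"SIXTEEN BYTE BLK" * 2          # two identical plaintext blocks

ecb_ct = ecb_encrypt(key, pt)
cbc_ct = cbc_encrypt(key, iv, pt)

# ECB leaks the repetition: identical plaintext blocks -> identical ciphertext blocks.
assert ecb_ct[:BLOCK] == ecb_ct[BLOCK:]
# CBC chains in the previous ciphertext, so the repetition disappears.
assert cbc_ct[:BLOCK] != cbc_ct[BLOCK:]
```

Scaled up to an image, this per-block repetition is exactly what makes the outline of the Tux penguin visible through ECB encryption.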

The more widely you share a cryptographic key, the less likely you'll be able to keep it secret. Some deployed systems have re-used the same symmetric key onto every device on the system. The problem with this is that sooner or later, someone will extract the key from a single device, and then they'll be able to attack all the other devices. So, don't do that.

See also "Symmetric Encryption Don't #6: Don't share a single key across many devices" in this blog article. Credits to Matthew Green.

A one-time pad is not a one-time pad if the key is stretched by an algorithm

The identifier "one-time pad" (also known as a Vernam cipher) is frequently misapplied to various cryptographic solutions in an attempt to claim unbreakable security. But by definition, a Vernam cipher is secure if and only if all three of these conditions are met:

The key material is truly unpredictable; AND

The key material is the same length as the plaintext; AND

The key material is never reused.

Any violation of those conditions means it is no longer a one-time pad cipher.

The common mistake made is that a short key is stretched with an algorithm. This action violates the unpredictability rule (never mind the key length rule.) Once this is done, the one-time pad is mathematically transformed into the key-stretching algorithm. Combining the short key with random bytes only alters the search space needed to brute force the key-stretching algorithm. Similarly, using "randomly generated" bytes turns the random number generator algorithm into the security algorithm.

Here is a simple example. I have a message that I will encrypt using a "one-time pad" that uses a cryptographically secure function as a key generator. I chose a secret key, then added a random number to it to ensure it will not be reused. As I'm not reusing the key, there is no way to attack the ciphertext by subtracting one message from another.

The key material was securely generated using SHA-1 to hash my secret password (plus the random value) in order to stretch it. But any attacker who knows that the stretching algorithm* used is SHA-1 can attack it by trying various inputs to SHA-1 and XORing the output with the ciphertext. Guessing the "OTP" key is now no harder than guessing the combined inputs to the cryptographic algorithm. This property holds true regardless of which base cryptographic algorithm is chosen, what measures of complexity it holds, or how it is implemented or seeded.

You may have a very good key-stretching algorithm. You may also have a very secure random number generator. However, your algorithm is by definition not a one-time pad, and thus does not have the unbreakable property of a one-time pad.

* Applying Kerckhoffs's principle means that you must assume the attacker can always determine the algorithms used.
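The attack described above is easy to carry out in practice. A toy Python sketch of a stretched-"OTP" falling to a dictionary attack (the password and dictionary are made up for illustration):

```python
import hashlib

def stretched_keystream(password: bytes, length: int) -> bytes:
    # "Stretch" a short password into key material with SHA-1 --
    # exactly the mistake described above: the result is NOT a one-time pad.
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha1(password + counter.to_bytes(4, "big")).digest()
        counter += 1
    return stream[:length]

plaintext = b"meet me at the usual place"
password = b"hunter2"
ciphertext = bytes(p ^ k for p, k in
                   zip(plaintext, stretched_keystream(password, len(plaintext))))

# Since the stretching algorithm is public (Kerckhoffs), the attacker need
# only guess its inputs -- an ordinary dictionary attack:
dictionary = [b"letmein", b"hunter2", b"password", b"qwerty"]
recovered = None
for guess in dictionary:
    candidate = bytes(c ^ k for c, k in
                      zip(ciphertext, stretched_keystream(guess, len(ciphertext))))
    if candidate.startswith(b"meet me"):   # any plaintext-recognition test works
        recovered = candidate
assert recovered == plaintext
```

Guessing the "pad" is exactly as hard as guessing the password, which is nothing like the unbreakability of a true Vernam cipher.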

Would you edit "As I'm not reusing the key, there is no way to attack the ciphertext by subtracting one message from another." and add text saying that other attacks are possible? EXAMPLE: a two-time pad, bad protocol, or other bias (PPTP, WEP, RC4 respectively). An unknowledgeable layman may misread what you wrote and think that an OTP offers "perfect secrecy" in another sense of the word. Also, since you're broaching this topic, some coverage of what a valid PRNG/PRG key stretcher is would be helpful.
–
LamonteCristo, May 20 '13 at 13:49

Note: there is no message authentication in an OTP; modifications to the ciphertext will go undetected.
–
LamonteCristo, May 20 '13 at 16:58

Note: A secure PRG is similar to an OTP. A secure PRG is one that passes all efficient statistical tests with only negligible advantage; it is impossible for a PRG to satisfy every theoretical statistical test. This "relaxing" of security is required for efficiency, since "perfect secrecy" requires secure transmission of an OTP big enough to match the size of the message. EXAMPLE: all OTP schemes require that the secret be transmitted securely (by some unspecified means). It would be more efficient to use that same secure method to send the data in the first place.
–
LamonteCristoMay 20 '13 at 17:10

It's the "similarity" which leads people to make the outlandish claims of unbreakability, and it's that "efficiency" that breaks the unpredictability of the Vernam cypher. Nobody said key generation, key management, or key distribution with an OTP is easy or practical - it's none of the above. It's so hard that people still use other cyphers, despite the promise of mathematically perfect secrecy. No "key stretching" can alter that truth.
–
John DetersMay 24 '13 at 16:54

A "secure PRNG" could be used to generate the bits, but if it's truly secure, you still have all the distribution problems because you cannot duplicate their generation on the recipient's computer - if you could, that state would be the key, not the bits.
–
John DetersMay 24 '13 at 16:56

Many standards exist in cryptography, and sometimes you have to use them. But don't assume that the people writing the standards adequately understood the cryptography they needed. For example, EAX was reworked (as EAX′) for a networking standard. EAX has a proof of security; the reworked version does not.

MD5 is a standard. It is now broken. Chip and PIN has been broken repeatedly, thanks to an abundance of dangerous features. GPG still supports DSA keys that are too short for comfort. SSL has options that should not be used, and requires care to avoid them.

What can be done about this? Being careful, understanding the known risks, and keeping up with the research into new ones.

MD5 is a standard, true. But it has been replaced by more current standards, the SHA family. For most scenarios, standards SHOULD be followed, for many reasons; interoperability is a very large factor.
–
Terry ChiaSep 22 '12 at 16:05

This is a very misleading statement. The wording implies the reader should "trust NON-standards", which is clearly not true. Most security standards come into existence only after extensive real-world field testing. That testing is far more thorough than any single organization can generate to "prove" their non-standard system is secure.
–
John DetersMay 24 '13 at 17:02

Suppose two files are saved using a stream cipher / OTP with the same keystream. If the file is resaved after a minor edit, an attacker can compare the two ciphertexts, see that only certain bits changed, and infer information about the document. (Imagine changing the salutation "Dear Bob" to "Dear Alice".)

Example 2

There is no integrity protection in the output: an attacker can modify the plaintext by simply XORing chosen bits into the ciphertext.

Take away: Modifications to ciphertext are undetected and have predictable impact on the plaintext.
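A short sketch of this malleability (the keystream, message, and byte position are invented for illustration): the attacker never learns the key, yet rewrites the plaintext by XORing a chosen difference into the ciphertext.

```python
def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

keystream = bytes(range(64))        # stands in for any stream-cipher/OTP output
plaintext = b"PAY $100 TO ALICE"
ciphertext = xor(plaintext, keystream)

# The attacker knows the message format, not the key.
# XORing ('1' ^ '9') into the right ciphertext byte rewrites the amount.
tampered = bytearray(ciphertext)
tampered[5] ^= ord("1") ^ ord("9")  # byte 5 holds the '1' of "$100"

print(xor(bytes(tampered), keystream))  # b'PAY $900 TO ALICE'
```

The change is invisible to the recipient unless a separate integrity check (a MAC) is in place.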

Solution

Use a block cipher mode that includes message integrity checks (authenticated encryption) for these situations.

An OTP keystream applied twice cancels itself out, because the data is XOR'ed twice: data encrypted with "perfect secrecy" ends up back in the clear. The same cancellation lets an attacker XOR two ciphertexts encrypted under the same key, removing the key entirely and leaving the XOR of the two plaintexts.

Example

Assume an OTP key (or a stream cipher key) is being reused.

An attacker collects a lot of data sent from a client to a server, and XORs pairs of packets together until two packets encrypted with the same keystream cancel each other out (in whole or in part).

ASCII encoding has enough redundancy that, given enough ciphertext, the original messages can be recovered (along with the secret OTP keystream).
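The two-time pad cancellation can be sketched in a few lines (the key derivation and messages are invented for illustration): XORing two ciphertexts encrypted under the same keystream eliminates the key, and a crib for one message reveals the other.

```python
import hashlib

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key = hashlib.sha256(b"do not reuse me").digest()  # 32-byte keystream
m1 = b"meet me at noon"
m2 = b"bring the money"                            # same length as m1
c1, c2 = xor(m1, key), xor(m2, key)

# The key cancels: c1 XOR c2 == m1 XOR m2, with no key required.
assert xor(c1, c2) == xor(m1, m2)

# With a crib (guessed plaintext for m1), m2 falls out immediately.
print(xor(xor(c1, c2), m1))  # b'bring the money'
```

In practice attackers "crib-drag" common words across the XOR of the ciphertexts until readable text appears on both sides.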

Real world examples

See Project VENONA (intercepts from the 1940s) for an example of one-time pads that were reused by the Soviets and subsequently decrypted by US intelligence agencies.

In Microsoft's PPTPv1, both the client and the server encrypt data using the same RC4 key, creating a two-time pad.

WEP reuses the same key once 2^24 packets have been sent, or when a NIC is reset. The first issue arises because the IV is only 24 bits long, so after about 16 million frames are transmitted a two-time pad is created. The second issue occurs in hardware implementations where, after a power cycle, the IV resets to zero, again producing a two-time pad. Both issues are easy to spot since the IV is sent in the clear.

Recommendations

A new key should be created for every session (e.g. TLS).

The client should use one OTP (or stream cipher w/ PRG) when sending to the server, and the server should use a different key when encrypting data to the client.

Rather than generating many keys, it's possible to expand a single key into a long stream using a PRG (assuming you trust the PRG) and use each segment of that expansion as a key.

Know that not all PRGs are safe to key with related keys (e.g. a key that is simply incremented or concatenated with a counter per message); random, independent inputs may be required. (RC4 is notably weak when used this way.)
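A minimal sketch of expanding one master key into several independent segment keys, using SHAKE-256 as the PRG (a stand-in chosen for illustration; a real protocol would use a vetted KDF such as HKDF, and the master secret here is invented):

```python
import hashlib

def expand_keys(master: bytes, n_keys: int, key_len: int = 32) -> list:
    # SHAKE-256 produces an extendable output; each key_len slice of the
    # stream becomes one session key, avoiding any key reuse.
    stream = hashlib.shake_256(master).digest(n_keys * key_len)
    return [stream[i * key_len:(i + 1) * key_len] for i in range(n_keys)]

keys = expand_keys(b"single master secret", 4)
print(len(keys), len(keys[0]), len(set(keys)))  # 4 32 4
```

Each segment is computationally independent of the others, so compromising one message's key does not hand the attacker the rest of the stream.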

Use modern stream ciphers that are designed to work appropriately in hardware or software.

Not all stream ciphers are designed to be implemented in both hardware and software. The linear feedback shift register (LFSR) is a building block of many widely deployed hardware ciphers, and ciphers built from LFSRs alone are easily broken.

LFSR is used in:

DVD encryption (also known as CSS): 2 LFSRs

GSM encryption (A5/1 and A5/2): 3 LFSRs

Bluetooth (E0): 4 LFSRs

The hardware for the above is widely deployed and therefore hard to update, or bring up to modern standards. All of the above are badly broken and should not be trusted for secure communications.
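To see why a bare LFSR is worthless as a cipher, here is a toy 16-bit Fibonacci LFSR (the seed and tap mask are arbitrary values chosen for illustration): its output bits directly leak its internal state, after which every future keystream bit is predictable.

```python
from itertools import islice

def lfsr(state: int, taps: int, nbits: int):
    # Fibonacci LFSR: emit the low bit, feed the parity of the
    # tapped bits back into the top bit.
    while True:
        yield state & 1
        fb = bin(state & taps).count("1") & 1
        state = (state >> 1) | (fb << (nbits - 1))

seed = 0xACE1
stream = lfsr(seed, taps=0b1011010000000000, nbits=16)
observed = list(islice(stream, 16))

# The first 16 output bits ARE the 16-bit seed (LSB first), so an
# eavesdropper recovers the full state from the keystream alone.
recovered = sum(bit << i for i, bit in enumerate(observed))
print(hex(recovered))  # 0xace1
```

More generally, the Berlekamp-Massey algorithm recovers any LFSR of length n from just 2n output bits, which is why real designs that use LFSRs must combine them nonlinearly (and even those listed above failed).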

Attack:

CSS splits its 40-bit key into two sections that seed a 17-bit and a 25-bit LFSR, whose outputs are combined to encrypt the data. Using known plaintext from the MPEG format, an attacker can brute-force the 17-bit register and solve for the 25-bit register's state, recovering the key far faster than a full 40-bit search.

This is hardly new, and free and open-source software demonstrating this attack is easy to find.

Solution:

The eSTREAM project (finalized in 2008) selected a portfolio of stream ciphers recommended for use, split between software-oriented and hardware-oriented profiles. A notable difference is that instead of using a key with an IV, several of these ciphers use a key, a nonce, and a counter. Salsa20 operates this way and is designed to be implemented easily in both hardware and software; in particular, fast software implementations exist that use the x86 SSE2 instructions.

Aside

The modern ciphers are not only more secure, but they are faster as well.

A MAC (message authentication code) is a keyed tag that ensures the integrity of a given plaintext (no modifications, etc.). Many implementations and published standards fail to protect the MAC from an attacker who appends additional data to the message (a length-extension attack).

The solution for this is for the MAC construction to use a second (different) key and encrypt the final output with it.

ECBC-MAC and NMAC are examples of MAC constructions that correctly prevent the message-extension attack.
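For most code, the practical takeaway is simply to use a vetted construction rather than rolling H(key || message). A minimal sketch using Python's standard-library HMAC (the key and message are invented for illustration):

```python
import hmac
import hashlib

key = b"shared secret key"
msg = b"amount=100&to=alice"

# HMAC mixes the key in twice internally (inner and outer hash),
# which is what defeats the length-extension trick that breaks
# the naive H(key || message) construction.
tag = hmac.new(key, msg, hashlib.sha256).digest()

def verify(key: bytes, msg: bytes, tag: bytes) -> bool:
    expected = hmac.new(key, msg, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)  # constant-time compare

print(verify(key, msg, tag))                   # True
print(verify(key, msg + b"&to=mallory", tag))  # False: append detected
```

Note the use of `hmac.compare_digest` rather than `==`, which avoids leaking the tag through a timing side channel during verification.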