From this excellent answer I learned (correct me if I am wrong) that when writing a block cipher with, say, a 128-bit key, one has to pad the (variable-size) password so that it becomes exactly the 128 bits needed. I had thought that one would just add zeros or repeat the password until the length reaches 128 bits. However, after implementing these options, I can see that the key doesn't look quite random.

I understand that one might use PBKDF2. From what I understand, one has to choose a hash function, and SHA might be a good choice.

I would like to write my own block cipher from scratch, and having to write all the code for SHA seems a bit much for me right now.

(1) Is there a nicer, simpler hash function that could semi-securely be used in PBKDF2? Or alternatively a simpler, but still nice, way of padding a password that would be "easy" to do "by hand"? At least, I would like something simple so that I can finish an implementation of TEA.

[(2) Bonus question: I noticed that the hash functions SHA, MD5,... seem to produce a string that is made up of characters with ASCII codes in printable range. Is that a problem when hashing passwords that are used to encrypt plaintext?]

EDIT: I realized that I was wrong about what I had "noticed" in question 2.

The problem with simple padding (rather than PBKDF2) is not so much that the key does not look random; that is easily fixed with a hash. The issue is that testing whether a password is right, given known plaintext, is fast, which weakens the password greatly. PBKDF2 (and, better, Scrypt) fix that.
– fgrieu, Jun 11 '12 at 15:36

PBKDF2 is a big step forward compared to practices such as MD5(Password||Salt) → Key (where || stands for concatenation), but it is significantly inferior to what the state of the art allows. See the table in the paper Stronger Key Derivation via Sequential Memory-Hard Functions, which defines the Scrypt Password-Based Key Derivation Function.

In the following I discuss the PBKDF2 avenue envisioned by the OP; I might propose a simpler one, illustrating the concept of Sequential Memory-Hard Key Derivation Function, in a later post under preparation.

PBKDF2 requires a Pseudo-Random Function PRF(Key,Message) → Output with variable-length key and message inputs, and fixed-size output. The PRF commonly used is HMAC. HMAC in turn requires a hash function. In order for HMAC's original design rationale to hold, that hash must use the Merkle–Damgård construction, with a round function that is a One-Way Compression Function OWCF(Block,State) → NewState with fixed-size block input and state input/output. The simplest construction for an OWCF from a block cipher is the Davies-Meyer construction. In that context, one uses a block cipher whose block size is the hash's output size, and usually a key input at least twice that. In the usual PBKDF2 with HMAC-SHA-1, the OWCF is built from a block cipher with a 160-bit block (the state in the OWCF) and a 512-bit key (the block in the OWCF). TEA, with its 64-bit block and 128-bit key, is not a good direct fit for that block cipher.

Further, in that use, the block cipher should be resistant to related-key attacks, and in particular must not have equivalent keys, which trivially turn into collisions; these propagate up the chain into hash collisions, HMAC key and message collisions, and PBKDF2 equivalent passwords and equivalent salts. TEA has equivalent keys (and lesser related-key vulnerabilities). In this context, with TEA as the block cipher in the Davies-Meyer construction, it would be easy to exhibit a class of equivalent passwords and equivalent salts (by toggling bit 7 of some bytes so as to reach equivalent keys in TEA), which shows that a sizable portion of the entropy in the input is lost (1/64 for random input). This would not be a total disaster in practice, but for a recommendable scheme along the above lines, we want to strengthen the cipher w.r.t. related-key attacks, and (as a lesser need) double its block and key size.
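The equivalent-key problem can be checked directly. The following is a minimal Python sketch, assuming the standard 32-cycle TEA on 32-bit words; the names `tea_encrypt`, `davies_meyer`, and `equivalent_key`, and the sample key and state, are illustrative choices and not part of the original post. Toggling the top bit of the first two key words together leaves the ciphertext unchanged, so the two "message blocks" collide under Davies-Meyer.

```python
MASK32 = 0xFFFFFFFF
DELTA = 0x9E3779B9  # TEA's additive round constant


def tea_encrypt(block, key):
    """Encrypt one 64-bit block (v0, v1) under a 128-bit key (k0, k1, k2, k3)."""
    v0, v1 = block
    k0, k1, k2, k3 = key
    total = 0
    for _ in range(32):  # 32 cycles
        total = (total + DELTA) & MASK32
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK32
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK32
    return v0, v1


def davies_meyer(state, message_block):
    """One-way compression: the message block is used as the cipher key."""
    e0, e1 = tea_encrypt(state, message_block)
    return e0 ^ state[0], e1 ^ state[1]


def equivalent_key(key):
    """Toggle the top bit of k0 and k1 together; the two flips cancel in the XOR."""
    k0, k1, k2, k3 = key
    return k0 ^ 0x80000000, k1 ^ 0x80000000, k2, k3


# Arbitrary sample values, chosen only for the demonstration.
key = (0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210)
twin = equivalent_key(key)
state = (0xDEADBEEF, 0x0BADF00D)

print(tea_encrypt(state, key) == tea_encrypt(state, twin))    # same ciphertext
print(davies_meyer(state, key) == davies_meyer(state, twin))  # compression collision
```

The same cancellation applies to the pair (k2, k3), so each TEA key has three distinct equivalents.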

One sound scheme to construct that 128-bit block, 256-bit key cipher DTEA is a 4-round symmetric Feistel cipher with TEA as the round function, using at each round a 128-bit sub-key derived as two 64-bit halves, each obtained by a CBC-MAC of DTEA's 256-bit key, where the CBC-MAC uses TEA keyed by an arbitrary 128-bit constant. We need 8 such constants (2 per round), and a total of 36 TEA invocations per DTEA encryption. This is slow, but in this context slowness that can't be optimized out is no issue. The paranoid could even use D2TEA, derived from DTEA the way DTEA is derived from TEA; that would raise the difficulty of exhibiting even two equivalent passwords or salts from about $2^{64}$ DTEA (hard but feasible) to $2^{128}$ D2TEA (infeasible in the predictable future).
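A rough Python sketch of one way to read this DTEA description follows. Several details are assumptions not fixed by the text: "symmetric" is taken to mean a balanced Feistel network with 64-bit halves, the CBC-MAC uses a zero initial value, the eight 128-bit constants in `CONSTANTS` are arbitrary placeholders, and all function names are hypothetical. It is meant to show the structure, not to serve as a vetted design.

```python
MASK32 = 0xFFFFFFFF
DELTA = 0x9E3779B9


def tea_encrypt(block, key):
    """Standard 32-cycle TEA: 64-bit block (v0, v1), 128-bit key (k0..k3)."""
    v0, v1 = block
    k0, k1, k2, k3 = key
    total = 0
    for _ in range(32):
        total = (total + DELTA) & MASK32
        v0 = (v0 + (((v1 << 4) + k0) ^ (v1 + total) ^ ((v1 >> 5) + k1))) & MASK32
        v1 = (v1 + (((v0 << 4) + k2) ^ (v0 + total) ^ ((v0 >> 5) + k3))) & MASK32
    return v0, v1


# Placeholder constants (assumption): eight arbitrary, distinct 128-bit TEA keys.
CONSTANTS = [
    tuple((0x01010101 * (4 * i + j + 1)) & MASK32 for j in range(4))
    for i in range(8)
]


def cbc_mac(words, const_key):
    """CBC-MAC over an even number of 32-bit words, zero IV, TEA as the cipher."""
    state = (0, 0)
    for i in range(0, len(words), 2):
        state = tea_encrypt((state[0] ^ words[i], state[1] ^ words[i + 1]), const_key)
    return state


def dtea_subkeys(key256):
    """Derive four 128-bit round keys from a 256-bit key (eight 32-bit words)."""
    return [
        cbc_mac(key256, CONSTANTS[2 * r]) + cbc_mac(key256, CONSTANTS[2 * r + 1])
        for r in range(4)
    ]


def dtea_encrypt(block128, key256):
    """Four balanced Feistel rounds on a 128-bit block (four 32-bit words)."""
    left, right = tuple(block128[:2]), tuple(block128[2:])
    for subkey in dtea_subkeys(key256):
        f = tea_encrypt(right, subkey)
        left, right = right, (left[0] ^ f[0], left[1] ^ f[1])
    return left + right


def dtea_decrypt(block128, key256):
    """Inverse of dtea_encrypt: same rounds with sub-keys in reverse order."""
    left, right = tuple(block128[:2]), tuple(block128[2:])
    for subkey in reversed(dtea_subkeys(key256)):
        f = tea_encrypt(left, subkey)
        left, right = (right[0] ^ f[0], right[1] ^ f[1]), left
    return left + right
```

In this reading, each of the 8 sub-key halves costs 4 TEA calls (a 256-bit key is four 64-bit CBC-MAC blocks) and the 4 rounds cost one call each, which matches the count of 36.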

Thus my answer to the question is: If you want to use PBKDF2 with TEA as the core function then

use TEA to construct a wider cipher DTEA better protected from related-key attacks, using the Feistel construction with sub-keys derived by CBC-MAC, as above;

use DTEA in the Davies-Meyer construction to build an OWCF;

use that OWCF in a Merkle–Damgård construction to build a hash function;

use HMAC to turn that hash into a PRF;

use that PRF in PBKDF2;

build the salt as the concatenation of context information (user name, file name, ...) and, if feasible, 256 bits of true randomness (else use the date and time to the best accuracy available);

generate the desired key as PBKDF2(Password, Salt, c, dkLen=16), with iteration count c set as high as practical.
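The last two steps can be sketched in Python with the standard library's `hashlib.pbkdf2_hmac`. This is only an illustration under stated assumptions: it substitutes HMAC-SHA-256 for the TEA-based PRF built in the earlier steps, and the function name `derive_key`, the context string, and the iteration count are hypothetical choices.

```python
import hashlib
import os


def derive_key(password: str, context: str, salt_random: bytes,
               iterations: int = 200_000) -> bytes:
    """Derive a 128-bit key: salt = context information || random bytes."""
    salt = context.encode("utf-8") + salt_random
    # Assumption: HMAC-SHA-256 stands in for the TEA-based PRF discussed above.
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=16
    )


# 256 bits of randomness from the OS; it must be stored alongside the
# ciphertext so that the same key can be derived again for decryption.
salt_random = os.urandom(32)
key = derive_key("correct horse battery staple", "alice/notes.txt", salt_random)
print(len(key))  # 16 bytes, i.e. a 128-bit TEA key
```

The iteration count is the tunable cost: every password guess by an attacker has to pay for the same number of PRF evaluations.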

If you are not planning on using your code in a deployed system, just for learning/testing (which is what you are doing, right?), then zero padding is fine.

A hash function takes bits as input and outputs bits. Most software will convert those bits to something human readable (hex typically). If you want to use canned software to do the hashing (or key derivation), you would want to find a function or method that outputs an array of bytes. Otherwise, you have to convert the hex output back into the bytes that they represent.
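As a small illustration of the bytes-versus-hex point, here is a Python sketch using the standard `hashlib` module. SHA-256 and the sample password are arbitrary choices for the example, and truncating the digest to 16 bytes is shown only to make the representation issue concrete; it is not a substitute for a proper key derivation function.

```python
import hashlib

h = hashlib.sha256(b"my password")

raw = h.digest()       # 32 raw bytes, each anywhere in 0..255
text = h.hexdigest()   # 64 printable hex characters encoding the same bytes

# If a tool only gives you the hex string, convert it back before use.
recovered = bytes.fromhex(text)

key = raw[:16]         # 16 bytes = 128 bits, the size a TEA key needs
print(recovered == raw, len(key))
```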

It is correct that using mere padding to turn a password into a key works, in the sense that the cryptosystem can encipher and decipher. However, that introduces a serious weakness.
– fgrieu, Jun 11 '12 at 6:19

@fgrieu, agreed. For what the OP wants, though (which appears to be learning block ciphers), padding the password is probably fine.
– mikeazo♦, Jun 11 '12 at 11:21

Thanks for the answer. Yes I am just doing this for fun. I don't actually need to encrypt anything. But I would like something that wasn't trivial and something that doesn't show obvious signs of pattern. I read more about the hash functions, and I see that I was wrong...
– Thomas, Jun 11 '12 at 21:08

Using the Davies-Meyer construction to turn TEA into a hash function is close to the recipe for disaster used in the original XBOX. The problem is that there are equivalent keys in TEA, and they turn into hash collisions in the Davies-Meyer construction.
– fgrieu, Jun 11 '12 at 6:17

You got me there. I didn't know this attack happened. However, from the little I can see, the only consequence (when used for compression) is that the password space may be reduced by a couple of bits. Other constructions will be better, but I don't see it as a dramatic problem in this case.
– SquareRootOfTwentyThree, Jun 11 '12 at 7:07

Another problem with Davies-Meyer when applied to TEA is that the block size is only 64 bits, so the result will be a 64-bit hash function. That is much too small for collision resistance, and approaching too small to resist preimage attacks.
– mikeazo♦, Jun 11 '12 at 11:47