I'd like to preface this question by stating that I fully understand the dangers of writing your own encryption algorithms, and I would never, ever, use homemade encryption to secure the data of anyone except myself.

Today I was assigned a computer science semester project that brings together everything we've learned into one program. Part of the functionality of this program is that it can encrypt and decrypt Strings. We have to write these encryption methods ourselves, so we cannot use anything built into the language we are using (Java). Finally, we need to avoid anything that uses a key for encryption.

Now, after talking to some of my classmates, it seems like almost everyone is using ROT13 or another similar method. Because I'm an overachiever, and because I don't want to be like everyone else, I want to design my own method of encryption. However, I'm a bit lost as to where to start. So, what basic or advanced techniques are there for encryption?

You are probably aware of this, but ROT13 is not any form of encryption. It is only barely obfuscation :-)
–
Rory Alsop♦ Dec 9 '11 at 9:05

@Rory Technically, the Caesar cipher (of which ROT13 is a special case) is encryption. It's just pathetically easy to break. And with ROT13, the key (13) is already known.
–
Jonathan Dec 9 '11 at 13:18

@Jonathan - you know, you are right. But it is very poor :-)
–
Rory Alsop♦ Dec 9 '11 at 13:26

I was sort-of surprised to see that nobody has mentioned "the first rule of cryptography". But then, it seems the first paragraph of the OP shows that the asker here is fairly aware of it.
–
Iszi Dec 11 '11 at 5:44

If you start a random number generator at a known seed and do the ROT based on a new pseudorandom number for each character, you can later decode the string by starting the random number generator with the same seed and reversing the process. The weakness is that if you try all the seeds, you can eventually find which one works. Once you have got it working you can try to make it more difficult to decrypt. The important lesson is: how long would it take to decrypt it if you have the source?
–
Stuart Woodward Dec 15 '11 at 21:53
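
A minimal sketch of the seeded-ROT idea from the comment above, assuming printable ASCII input and using java.util.Random as the generator (the class name SeededRotSketch and the 32..126 character range are choices made for illustration, not part of the comment). It is exactly as weak as the comment says: anyone who can try every seed will get the text back.

```java
import java.util.Random;

final class SeededRotSketch {
    // Assumption: only printable ASCII (codes 32..126) is rotated; other chars pass through.
    private static final int FIRST = 32;
    private static final int RANGE = 95;

    static String transform(String text, long seed, boolean decrypt) {
        Random rng = new Random(seed); // same seed => same sequence of shifts
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            // Draw a shift for every character so both directions stay in sync.
            int shift = rng.nextInt(RANGE);
            if (c < FIRST || c >= FIRST + RANGE) {
                out.append(c);
                continue;
            }
            int offset = c - FIRST;
            int moved = decrypt
                    ? (offset - shift + RANGE) % RANGE
                    : (offset + shift) % RANGE;
            out.append((char) (FIRST + moved));
        }
        return out.toString();
    }

    static String encrypt(String plain, long seed) {
        return transform(plain, seed, false);
    }

    static String decrypt(String cipher, long seed) {
        return transform(cipher, seed, true);
    }
}
```

Calling SeededRotSketch.decrypt(SeededRotSketch.encrypt(text, seed), seed) should give back the original text. Note that the seed acts as a key here, which may or may not be allowed by the assignment.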

10 Answers

It depends what type of encryption you wish to do. Huge caveat: this answer is only about pointing you in the right theoretical direction. I highly recommend doing a lot of reading before jumping in - the more you read the more you will understand how previous ciphers were broken and not make the same mistakes.

Public key

To operate a public key system you need a trapdoor function. Unfortunately, the advice on Wikipedia is pretty accurate:

Several function classes have been proposed, and it soon became obvious that trapdoor functions are harder to find than was initially thought

Trapdoor functions are pretty hard; trapdoor permutations (where the output and input sets of the functions are the same and as such the function "permutes" the input inside the set) are even harder. Roughly speaking, the prime factorisation problem and the discrete logarithm problem are two "big ones". Chances are in this field, using an existing one is going to be by far the easiest approach.

By reading - as much as you can. I dislike the standard advice "don't design your own crypto". I think people should try if they want to. But I cannot emphasize enough how hard it is to get right. As you've a limited amount of time for your project, one technique might be to use a simple example of an existing cipher, so:

For your project

As an educational exercise RC4 is very easy to implement. Once upon a time (not so long ago) this was used to protect SSL/WEP traffic - sometimes it is still used, so you'd be using a real cipher. It does have some security issues - understanding these will also help you for your general crypto-education. However, as your requirement is less absolute security and more learning, I would have thought it would be ideal.
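
To give a feel for how small RC4 is, here is a minimal sketch in plain Java with no crypto libraries. The class and method names (Rc4Sketch, rc4, toHex) are made up for this example, and it is meant for learning only. Note that RC4 does take a key, so check whether that fits the "no key" constraint in your assignment.

```java
final class Rc4Sketch {
    // Encrypts or decrypts: RC4 XORs a keystream onto the input, so both directions are identical.
    static byte[] rc4(byte[] key, byte[] input) {
        // Key-scheduling algorithm (KSA): permute 0..255 under control of the key.
        int[] s = new int[256];
        for (int i = 0; i < 256; i++) {
            s[i] = i;
        }
        int j = 0;
        for (int i = 0; i < 256; i++) {
            j = (j + s[i] + (key[i % key.length] & 0xFF)) & 0xFF;
            int t = s[i]; s[i] = s[j]; s[j] = t;
        }

        // Pseudo-random generation algorithm (PRGA): one keystream byte per input byte.
        byte[] out = new byte[input.length];
        int i = 0;
        j = 0;
        for (int n = 0; n < input.length; n++) {
            i = (i + 1) & 0xFF;
            j = (j + s[i]) & 0xFF;
            int t = s[i]; s[i] = s[j]; s[j] = t;
            int k = s[(s[i] + s[j]) & 0xFF];
            out[n] = (byte) (input[n] ^ k);
        }
        return out;
    }

    // Helper for printing results as upper-case hex.
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }
}
```

A handy check is the widely published test vector: the ASCII key "Key" with the ASCII plaintext "Plaintext" should produce BBF316E8D940AF0AD3, and running rc4 again with the same key on that output should return the plaintext.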

If you're feeling quite ambitious and know your language well, AES is also not that hard to implement in ECB mode. FIPS-197 is quite readable and generally explains the algorithm in a fairly accessible way.

You are right to consider ROT13 a poor example. Even without knowing that the offset of each character was 13 places, assuming you use ASCII you just try each of the 127 (or 255 for extended ASCII) offsets on your cipher-text until the right one drops out. Decrypting it is therefore pretty trivial, even without the key.
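
As a rough illustration of how little work that brute force is, the sketch below simply undoes every possible offset over 7-bit ASCII and collects the candidates. The class name ShiftBruteForceSketch and the assumption that every character code is below 128 are mine; a human (or a simple dictionary check) then picks the candidate that reads as text.

```java
import java.util.ArrayList;
import java.util.List;

final class ShiftBruteForceSketch {
    // Shift every character by the given offset, wrapping within 7-bit ASCII (0..127).
    static String shift(String text, int offset) {
        StringBuilder out = new StringBuilder(text.length());
        for (char c : text.toCharArray()) {
            out.append((char) Math.floorMod(c + offset, 128));
        }
        return out.toString();
    }

    // Candidate number n is the cipher-text with an offset of n undone.
    static List<String> allCandidates(String cipherText) {
        List<String> candidates = new ArrayList<>();
        for (int offset = 0; offset < 128; offset++) {
            candidates.add(shift(cipherText, -offset));
        }
        return candidates;
    }
}
```

For a message shifted by 13, the plaintext shows up as candidate number 13 in the list; the attacker never needed to be told the offset.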

"Don't design your own crypto" doesn't mean never do it; it means never do it and actually think it's secure or of any use, and never ever use it for anything that you don't want to assume is plaintext anyway.
–
ewanm89 Oct 7 '12 at 16:59

You have to avoid anything that uses a key? Personally, I can't see how you can call an algorithm "encryption" if it doesn't use a key.

You might consider writing your own implementation of Simplified DES. As the name implies, Simplified DES (or S-DES) is a greatly simplified version of DES. It uses a 10-bit key, and it's simple enough to work out with pencil and paper.

I don't want to spoil your fun, but you want to think about the following:

What, intrinsically, is encryption, anyway? What are the properties of things that encrypt and decrypt and why do we do it as a society? You want to think about the characteristics as well as the process.

What is a key? Based on your research you may want to request clarification of this point from your instructor.

Create a classification system of all families of encryption techniques. By doing this research you may find an interesting answer or two.

This is a semester-based project, so it's not something you can (or should) answer overnight. The code itself may only take a day or two. The real learning is in finding solutions based on the given constraints.

You should read the Handbook of Applied Cryptography. This book is also known as "The Handbook". It's free and well written. However, Chapter 2, "Mathematical Background", is quite stiff; most of these concepts are not taught at my local public university (I looked).

If you want to see a simplified version of complex "confusion" and "diffusion" William Stallings wrote an excellent Simplified DES implementation.

It's easy enough that I drew it out (and did the transpositions) on graph paper. But it will take you through all of the basic functions DES uses, and walks you through a single round of the encipher-decipher process.

For two-way encryption, most algorithms use an XOR operator, comparing the binary code of a key with the binary data of the input. This might not be right for you, as you can't use a key... however, this is how it works:

Input data: 10011101101001
Key: 123 = 1111011

The key is smaller than the input, so it needs to be repeated:

Input data: 10011101101001
Key: 123 = 11110111111011

(In Java, use one variable to count in a for-each or while loop through all the bits of the data input...) Now use the XOR principle to generate the encrypted result (a two-way hash): loop through each bit in the input data and compare it to the corresponding bit in the key; if they are identical, add 0 to the result, if not, add 1 to the result... The result will then be:

Result: 01101010010010
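
A minimal sketch of that loop in Java, working directly on strings of '0' and '1' so it lines up with the example above (the class name XorBitsSketch is just for illustration; real code would XOR bytes rather than characters):

```java
final class XorBitsSketch {
    // XOR a bit string with a key bit string, repeating the key as often as needed.
    static String xorBits(String dataBits, String keyBits) {
        StringBuilder out = new StringBuilder(dataBits.length());
        for (int i = 0; i < dataBits.length(); i++) {
            char d = dataBits.charAt(i);
            char k = keyBits.charAt(i % keyBits.length()); // wrap around the key
            out.append(d == k ? '0' : '1');                // identical -> 0, different -> 1
        }
        return out.toString();
    }
}
```

With the input 10011101101001 and the key 1111011 (123 in binary), xorBits returns 01101010010010. Running the same call on that output with the same key gives the original input back, which is what makes this "two way".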

Ideally you would use a hash function like SHA, MD5, RIPEMD etc... to generate the key, then turn it into binary... If you can't use a premade algorithm, you could make your own algorithm to generate the key to be compared... just make sure all the bits in the input are dependent upon each other to generate the result... example:

password: abcdefghi
abc= 123456789(a=1,b=2,c=3 etc...)

now loop over every bit (digit) and add them together with a counter, example:
count = 0
result = ""
foreach digit in password do {
    result = result & ((digit + result[count-1]) * count)
    count = count + 1
}

key result= 16152845669120153
Binary: 111001011000101110110101110100001110000011100010011001
(This is a very poor example though... you should think through a good algorithm... one where the two starting inputs combine and form the third, and then the third and fourth go together with the result of the first combining to generate the fifth result...)

What you show here is the Vigenere cipher. It was what most algorithms did... in the 16th century. Science has made a few advances since. More than 150 years ago, Edgar Allan Poe had made a specialty of breaking these... by hand! (Computers had not been invented yet.) It is a bit weak...
–
Thomas Pornin Oct 7 '12 at 17:01

The explanation could use some work, but technically an okay answer (at least compared to rot13...). Maybe you could also point out the huge weaknesses of xor encryption. Edit: @ThomasPornin "A bit" weak indeed, but well if the rest of the class is using rot13, this is an improvement already.
–
Luc Oct 7 '12 at 17:01

Depending on the constraints placed on you, you can actually create an extremely difficult to crack encryption reasonably easily. This encryption has practical flaws that make it fundamentally unusable in the real world, but you should beat the ROT13 and Caesar users quite handily. Basically you'll be creating an entropy encoding system, which nets you a one-time pad.

Write yourself something to raw read all the files on your disk drive - this is pretty easy, google about for a hierarchical recursive directory scan, open all the files raw/binary, and suck in their contents

As you begin streaming in each byte stream, make yourself a master file where you look for a recurrence of subsequences (I will refer to these as strings from now on, as that's what they are, they're just not text strings) in the input - you need to create an algorithm that, over time, prefers the longest possible matching subsequences but can recursively section up the input into smaller strings - if you look at http://en.wikipedia.org/wiki/Huffman_coding you'll see a particular algorithm for accomplishing this but you don't need to go all that far - but implementations will probably yield code fragments that will simplify your life.

Now, to encode something, take the input string and apply the same operation, finding the longest length matching substrings in the master file and replacing the input string with the offset and length of the matching substring in the master file - note this will match any string, because at the end of the day you'll recurse down looking for single bits
One guard you will need to use is that you have to cycle through the set of all matching strings before you start reusing the same indexes - imagine a master file where you had alternating 1's and 0's and you could only match inputs at the bit level (technically impossible but bear with me) - if you received a string of 5 1's, you'd encode it as 1:1,3:1,5:1,7:1,9:1 (yes, one flaw is this encoding can become gruesomely inefficient in certain cases) (nb - if you encode bits, you'll weaken the code - extra points if you only move the offset in the message, but that's a nasty multidimensional mapping strategy outside the scope of this post)

Keep track of the count of reused indices - your goal is to have a master table big enough that reuse never happens. If that holds and you were to encode only one message, it's pretty certain the universe would die of heat death before the code could be cracked. The more messages you encode WHERE INDICES ARE REUSED, the more your code becomes compromised (language analysis, pattern analysis, etc).
Now here's the catch - in order to use this code with another party, you need to get them a copy of the master table. You should only do this in person, you should always keep the transfer media under your control, and you should destroy it when the transfer is complete. If any machine that master table is on gets compromised, your code is toast - until then, it's pretty damn tough.

Check out the Crypto I class from Stanford University on Coursera. It breaks down stream and block ciphers as well as public key encryption. You'd be a lot more informed even if you only watched the first few lectures. Plus, the course also covers vulnerabilities and methods of breaking crypto implementations.

a) Create a home-grown pseudo-random number generator (PRNG), with a large period. To get greater periods you can have multiple generators. Because your PRNG is home-grown, you must test it thoroughly to make sure it is reasonably random.

b) For each encryption generate a seed for your home-grown PRNG. This must not be generated using your home-grown PRNG! I used the Mersenne Twister, seeded by various things like time in microseconds & process-id.

c) XOR the output from your home-grown PRNG with the plaintext to produce the cyphertext, and append the home-grown seed used in step b.

d) The decryption algorithm simply extracts the seed from the cyphertext, then reverses the encryption using your home-grown PRNG.
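
Steps a) to d) can be sketched roughly as follows. The generator here is a plain xorshift64 step used purely as a stand-in (it is a published algorithm, not a home-grown one, and its output is easy to predict from the seed), and the class name, the 8-byte seed layout, and the caller-supplied seed are all assumptions made for the example.

```java
final class HomeGrownStreamSketch {
    // Stand-in PRNG step (xorshift64). A real attempt would substitute its own generator here.
    private static long next(long state) {
        state ^= state << 13;
        state ^= state >>> 7;
        state ^= state << 17;
        return state;
    }

    // XOR the first n bytes of 'in' with the keystream derived from 'seed'.
    private static void applyKeystream(byte[] in, byte[] out, int n, long seed) {
        long state = seed;
        for (int i = 0; i < n; i++) {
            state = next(state);
            out[i] = (byte) (in[i] ^ (state >>> 56)); // use the top byte of each output
        }
    }

    // Step c: XOR with the keystream, then append the 8-byte seed (little-endian).
    static byte[] encrypt(byte[] plain, long seed) {
        if (seed == 0) {
            throw new IllegalArgumentException("xorshift needs a non-zero seed");
        }
        byte[] out = new byte[plain.length + 8];
        applyKeystream(plain, out, plain.length, seed);
        for (int i = 0; i < 8; i++) {
            out[plain.length + i] = (byte) (seed >>> (8 * i));
        }
        return out;
    }

    // Step d: read the seed back from the last 8 bytes and reverse the XOR.
    static byte[] decrypt(byte[] cipher) {
        int n = cipher.length - 8;
        long seed = 0;
        for (int i = 0; i < 8; i++) {
            seed |= (cipher[n + i] & 0xFFL) << (8 * i);
        }
        byte[] out = new byte[n];
        applyKeystream(cipher, out, n, seed);
        return out;
    }
}
```

The seed for step b is passed in by the caller in this sketch; something like System.nanoTime() mixed with other values would be one way to produce it, as long as it does not come from the generator itself.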

No "key" or "password" is used. The key is essentially your home-grown PRNG.

My PRNG had a large enough period that no PRNG sequence would ever get repeated/re-used within the expected lifetime of either the data or the system itself (i.e. 10+ years), and I tested this to make sure. I made sure the period was very large by having multiple PRNGs (with multiple seeds) and XORing the multiple sequences together. The very large period meant that every single call to my encryption library code was using something like a one-time pad. The only difference was that each "one-time pad" was only pseudo-random and not truly random. A big benefit for me was that there was no need for key-sharing or key management.

The security of this algorithm depends on the difficulty of predicting the home-grown PRNG sequence from the seed. This is why a home-grown PRNG must be used... if you used a "standard" PRNG, then it would be easy to guess the PRNG sequence from the seed embedded in the cyphertext.

The thing is, don't use your own encryption, or the one some genius thought up last week. Use encryption that has been around for a while and still hasn't been cracked. You're better off, because that genius last week might have overlooked something, whereas the encryption that's been around has had time to prove itself.