Why and How Writing Crypto is Hard

From:
andrew cooke <andrew@...>

Date:
Tue, 25 Dec 2012 18:56:30 -0300

Over the last few days I wrote a simple library to encrypt data in Python.
This blog post describes my experience writing that code. I focus on the
various mistakes I made, and try to understand the underlying causes.
But first a little context. I'm aware of the phrase (exhortation? slogan?)
"Typing The Letters A-E-S Into Your Code? You’re Doing It Wrong"
http://news.ycombinator.com/item?id=639647
but I couldn't find a Python 3 library that let me encrypt a string using a
simple password.
So I decided to go ahead, write the code, and then solicit feedback. If I
had made any mistakes then perhaps someone else would correct me, and the
result would be something other people could use.
To be honest, when I started, I thought I could do a pretty good job. I've
worked with security-related code several times (a JNI wrapper for OpenSSL
back in the day; more recently, for example, making OpenSSH talk to hardware
key stores) and I thought a fair amount of crypto knowledge had "rubbed off" -
I can explain what CTR mode is, for example, and why you should never use the
same key+IV twice. And also, I am not so dumb; how hard can this stuff be?
Even so, I searched around for some guidance on best practice. And I was
lucky enough to stumble across
http://www.daemonology.net/blog/2009-06-11-cryptographic-right-answers.html
which I decided to follow.
My first attempt was broken (although I eventually found the mistake myself).
It had exactly the vulnerability I said I could explain above: messages with
the same key used the same counter sequence. This was because the "iv"
parameter in the pycrypto Cipher API is ignored in CTR mode. Instead, you
need to provide the data to the Counter object.
I don't know if I am being muddle-headed in thinking of the initial counter
value as an IV, but I was a little annoyed with pycrypto. Couldn't it throw
an error if it's given an IV in CTR mode, instead of simply ignoring it? On
the other hand it doesn't seem fair to expect a library of crypto primitives
to educate users - it's intended for experts, who should know what they are
doing.
Anyway, that was my first mistake. The root cause being, I think, that crypto
APIs are complex because they provide access to powerful primitives that can
be combined in many ways, but which, at the same time, must also be efficient
(the need for efficiency affects the design of Counter, for example, which is
why the IV is ignored). A box of sharp tools.
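To make the danger of a repeated counter sequence concrete, here is a minimal sketch. It does not use pycrypto or AES: the keystream is a toy built from HMAC-SHA256 over (nonce, counter), and the names toy_keystream and toy_encrypt are hypothetical. The leak it shows is a property of any stream construction where ciphertext = plaintext XOR keystream.

```python
import hashlib
import hmac


def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy CTR-style keystream (illustration only, not AES-CTR)."""
    out = b""
    counter = 0
    while len(out) < length:
        # One keystream block per (nonce, counter) pair.
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"),
                        hashlib.sha256).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    # Encryption and decryption are the same operation.
    return xor(plaintext, toy_keystream(key, nonce, len(plaintext)))


key = b"k" * 32
p1 = b"attack at dawn!!"
p2 = b"retreat at noon!"

# Same key and same nonce: both messages use the same keystream.
c1 = toy_encrypt(key, b"\x00" * 8, p1)
c2 = toy_encrypt(key, b"\x00" * 8, p2)
leak = xor(c1, c2)  # keystream cancels, leaving p1 XOR p2

# Same key, fresh nonce: the keystreams differ and nothing cancels.
c3 = toy_encrypt(key, b"\x01" * 8, p2)
```

An attacker who sees c1 and c2 never needs the key: XORing the two ciphertexts gives the XOR of the two plaintexts, which is often enough to recover both.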
Next, I started to worry about the API for *my* users. I couldn't really
expect them to provide a 256 bit key; this was a library for "anyone". So
it had to take something more like a password.
Unfortunately, although I knew about key derivation functions, which is what
you need to go from password to key, I thought they were used only for
storing passwords. I have no idea why I thought this, but as a consequence
I started to cobble together my own hand-rolled attempt at key strengthening.
Thankfully, as my code got more complex, I realised I must be reinventing an
already-existing technique. Once I was convinced of that, finding PBKDF2 (it
was mentioned in the link I said I would follow - although nowhere near the
paragraph on symmetric ciphers) was easy.
So mistake 2 (which I eventually avoided) was not knowing about an existing
solution to a common problem. Or rather, not knowing that it could be
applied in a more general sense than I had understood.
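As a rough illustration of going from password to key with PBKDF2, the sketch below uses hashlib.pbkdf2_hmac from the Python standard library rather than pycrypto. The helper name derive_key, the iteration count, and the salt length are assumptions chosen for the example, not values taken from the library discussed here.

```python
import hashlib
import os


def derive_key(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Stretch a password into a 256-bit key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256",                  # hash used inside the HMAC
        password.encode("utf-8"),  # the user's password
        salt,                      # random, stored alongside the message
        iterations,                # work factor (assumed value)
        dklen=32,                  # 32 bytes = 256 bits
    )


salt = os.urandom(16)
key = derive_key("correct horse battery staple", salt)
```

The same password and salt always give the same key, so the salt has to travel with the encrypted message for decryption to work.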
At this point I believed my code was pretty solid so I posted it to HN at
http://news.ycombinator.com/item?id=4962983
It took a while to get useful feedback, but when I did, it was awesome. So
awesome it identified FIVE more problems. Ouch.
1. Don't expose salt in the API.
2. Use separate keys for cipher and HMAC.
3. Avoid a possible timing attack when comparing HMACs.
4. Manage the counter in a standard (NIST) way.
5. PBKDF was using a weaker hash than expected.
The first (user gives salt) is plain embarrassing - it's just bad API design.
If I can blame anything other than incompetence, salt appeared in the original
API because it "seemed odd" to generate data and then append it to the
message. I felt that way even though that is exactly how you handle the IV
(and, in fact, the final code uses the same data for both salt and IV). So
it's not a particularly logical explanation for my mistake, but it's all I have.
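One way to keep the salt out of the API is sketched below: the library generates it, and the caller only ever supplies a password. The names new_key and recover_key, the salt length, and the iteration count are hypothetical, and the bytes b"ciphertext" merely stand in for real encrypted output.

```python
import hashlib
import os

SALT_LEN = 16         # assumed length, in bytes
ITERATIONS = 100_000  # assumed work factor


def _kdf(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                               salt, ITERATIONS, dklen=32)


def new_key(password: str):
    """Encrypting side: make a fresh salt internally, return (salt, key)."""
    salt = os.urandom(SALT_LEN)
    return salt, _kdf(password, salt)


def recover_key(password: str, message: bytes):
    """Decrypting side: read the salt from the message, return (key, body)."""
    if len(message) < SALT_LEN:
        raise ValueError("message too short to contain a salt")
    salt, body = message[:SALT_LEN], message[SALT_LEN:]
    return _kdf(password, salt), body


salt, key = new_key("my password")
message = salt + b"ciphertext"  # the salt is prepended, like an IV
key_again, body = recover_key("my password", message)
```

The user never sees or chooses the salt, so they cannot reuse one by accident.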
The second (separate keys) was an open question - I just didn't know what
best practice was. So lack of experience there.
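One common way to obtain separate keys from a single password is to ask the KDF for twice as much output and split it. The sketch below assumes that approach; derive_keys, the iteration count, and the key sizes are illustrative choices, and other schemes (for example a second derivation step with distinct labels) would serve the same purpose.

```python
import hashlib
import os


def derive_keys(password: str, salt: bytes, iterations: int = 100_000):
    """Return (cipher_key, hmac_key), each 32 bytes, from one password."""
    material = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"),
                                   salt, iterations, dklen=64)
    # First half encrypts, second half authenticates.
    return material[:32], material[32:]


salt = os.urandom(16)
cipher_key, hmac_key = derive_keys("my password", salt)
```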
The third (timing attack) was a subtle implementation detail I would never
have noticed. A lack of knowledge of the current literature.
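A minimal sketch of the fix, using the standard library: an ordinary == on bytes can return as soon as it finds a differing byte, so the time taken may reveal how much of a forged tag was correct. hmac.compare_digest is designed to avoid that. The function names tag and verify are hypothetical.

```python
import hashlib
import hmac


def tag(mac_key: bytes, data: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag over the data."""
    return hmac.new(mac_key, data, hashlib.sha256).digest()


def verify(mac_key: bytes, data: bytes, received_tag: bytes) -> bool:
    """Check a tag without leaking where the first mismatch occurs."""
    expected = tag(mac_key, data)
    return hmac.compare_digest(expected, received_tag)


mac_key = b"m" * 32
good = tag(mac_key, b"ciphertext bytes")
forged = bytes(len(good))  # an all-zero guess of the right length
```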
Fourth (counter management) was more damning. I already knew the
normal way to handle counters, from using CTR mode to generate a stream
of random data in another project. I thought I was being smart and improving
things by using a different approach (yes, I know that sounds like the kind
of thing a newbie would say, but I thought it *despite knowing that*).
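The post does not say which layout was adopted, so the following is only a sketch of one arrangement described in NIST SP 800-38A (Appendix B): a per-message nonce in the leading half of each counter block and an incrementing block counter in the rest. The 8/8 byte split and the name counter_block are assumptions for illustration.

```python
import os

NONCE_LEN = 8    # assumed: leading 64 bits identify the message
COUNTER_LEN = 8  # assumed: trailing 64 bits count blocks within it


def counter_block(nonce: bytes, index: int) -> bytes:
    """Build the 16-byte counter block for block number `index`."""
    if len(nonce) != NONCE_LEN:
        raise ValueError("nonce must be %d bytes" % NONCE_LEN)
    if not 0 <= index < 2 ** (8 * COUNTER_LEN):
        raise ValueError("block index out of range")
    return nonce + index.to_bytes(COUNTER_LEN, "big")


nonce = os.urandom(NONCE_LEN)  # fresh for every message under a given key
blocks = [counter_block(nonce, i) for i in range(3)]
```

Because the counter occupies its own field, incrementing it can never spill into the nonce, and two messages with different nonces can never share a counter block.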
Fifth (weaker hash) I blame partly on the pycrypto API (again) (the way that
the hash is exposed is rather obscure), but also on a lack of familiarity
with key derivation standards - I didn't know that the MAC was a likely
parameter.
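To show that the hash is a parameter of PBKDF2 rather than a fixed part of it, the sketch below derives a key twice with the standard library's hashlib.pbkdf2_hmac, changing only the hash name. This is not the pycrypto API; the password, the fixed salt, and the iteration count are demonstration values only.

```python
import hashlib

password = b"my password"
salt = bytes(16)  # fixed salt, for demonstration only
iterations = 10_000

# Same password, salt and iteration count; only the underlying hash differs.
key_sha1 = hashlib.pbkdf2_hmac("sha1", password, salt, iterations, dklen=32)
key_sha256 = hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)

# Native output sizes of the two hashes, in bytes.
sha1_size = hashlib.sha1().digest_size
sha256_size = hashlib.sha256().digest_size
```

Naming the hash explicitly at the call site, instead of relying on a library default, makes this choice visible to anyone reviewing the code.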
So, in one simple piece of crypto code I had a total of seven errors (so far).
The sources of error were:
* Being unaware of existing solutions to common problems.
* Being unaware of existing best practices.
* Misunderstanding the complex API of a crypto toolkit.
* Bad API design.
* Ignoring existing solutions and "improving" things.
The last of these I can't do much about. In theory I should be smart enough
to not do that. I guess the lesson there is that sometimes you make even
dumber mistakes than you expect.
The rest divide nicely into two groups: experience and API design.
I was surprised how important experience was. Despite having some experience
with security-related code. Despite having a good set of guidelines on what
to do. Despite being able to search the Internet. Despite all that, I still
made mistakes that only experience could spot.
As for API design. Well, I think that just confirms how important (and hard,
and overlooked) API design is.
So, what are the conclusions? Experience and API design matter. And even
when you are aware of the kind of pitfalls that face people that write crypto
code, you can still make dumb mistakes.
Andrew
PS The current library is at https://github.com/andrewcooke/simple-crypt

I can relate to that ...

From:
Michiel Buddingh' <michiel@...>

Date:
Thu, 27 Dec 2012 07:16:34 +0100

. . . I recently wrote some cryptographic code that encrypted some
very short (10-20 byte) messages. There was a requirement that we'd
be able to decrypt any of these messages individually, without having
access to the other messages.
And so, I recycled the iv, and I didn't even bother with key
strengthening, knowing well that whoever reads this code in ten years
is going to think me an idiot. But of course, 1) I really couldn't
justify the time to do it properly 2) we were just trying to
discourage onlookers, not thwart the NSA.
What still bothers me about that situation, though, is that, for all I
know, recycling the iv is the worst compromise to make; there might be
cleverer ways to accomplish what I was trying to do.
. . . the thing is, the cryptography sector doesn't "do" trade-offs;
your security is either resilient to a government agency running a
chosen-plaintext attack on their FPGA cluster, or it's considered
embarrassingly broken.
The very people who do have the capability to write high-level APIs,
to make sensible trade offs in designing algorithms and approaches to
security problems also have a, seemingly cultural, inhibition against
simplification.
--
Michiel

Re: I can relate to that ...

From:
andrew cooke <andrew@...>

Date:
Thu, 27 Dec 2012 08:55:14 -0300

Space constraints are difficult. At work they were trying to encrypt the body
of SMS. I am not sure what happened in the end, but it wasn't looking good.
When it comes to "make it hard, but don't worry if it's not impossible" I feel
like there should be some kind of standard. Perhaps there is, and it is
ROT13. And maybe just suggesting that can help, because when people start to
object to ROT13 the same arguments typically apply to anything else that isn't
"proof against government".
Anyway, I just want to emphasise that I fixed all the bugs I discussed, and
simple-crypt, which is now on PyPI http://pypi.python.org/pypi/simple-crypt is
supposed to be able to "thwart the NSA". Of course, it may still contain bugs
(which is why it is (1) in beta and (2) includes a header in the encrypted
data that will allow a fixed version to be deployed and work even when people
have used a previous, buggy version, should it be needed).
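The idea of a header that lets a fixed version coexist with old data can be sketched as below. The byte layout, the MAGIC value, and the function names are hypothetical and are not the format simple-crypt actually uses.

```python
MAGIC = b"ex"  # hypothetical two-byte marker identifying the format


def add_header(version: int, payload: bytes) -> bytes:
    """Prefix the payload with the marker and a two-byte version number."""
    if not 0 <= version < 2 ** 16:
        raise ValueError("version out of range")
    return MAGIC + version.to_bytes(2, "big") + payload


def parse_header(message: bytes):
    """Return (version, payload), rejecting data without the marker."""
    if len(message) < 4 or message[:2] != MAGIC:
        raise ValueError("not a recognised message")
    version = int.from_bytes(message[2:4], "big")
    return version, message[4:]


message = add_header(1, b"salt+ciphertext+hmac")
version, payload = parse_header(message)
```

A decrypting routine can then dispatch on the version number, so data written by an older, buggy release remains readable after the fix ships.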
Andrew

Fixing this

From:
Laurens Van Houtven <_@...>

Date:
Sun, 11 Aug 2013 10:32:51 +0200

Hi Andrew,
Excellent points, and I agree wholeheartedly.
For the library situation, I've joined some people in writing a library:
https://github.com/alex/cryptography
Right now, it's mostly just primitives, but the end goal is an API that you
simply couldn't get wrong, which sounds to me like what you wanted in the
first place.
Additionally, I agree that education is lacking. Hence, I'm busy turning my
talk from last year, Crypto 101 (http://pyvideo.org/video/1778/crypto-101)
into a book. Hopefully this will make the journey for future programmers a
little easier :)
I elucidated further in a HN comment:
https://news.ycombinator.com/item?id=6194332
HTH,
lvh