The target hardware is a rather low-powered MCU (ARM Cortex-M3 @ 72 MHz, with just about 64 KB SRAM and 256 KB flash), so I am walking a thin line here. My board does have Ethernet, and I will eventually get lwIP (a lightweight FOSS TCP/IP suite) running on it (currently struggling). However, I also need some kind of super-lightweight alternative to SSL/TLS. I am aware of the multiple GPL'd SSL/TLS implementations for such MCUs, but their footprint is still fairly significant. While they do fit, given everything else, they don't leave much room for anything else.

My traffic is not HTTP, so I don't have to worry about HTTPS, and my client/server communication can be completely proprietary, so a non-standard solution is okay. I am looking for suggestions on what might be a minimalistic yet robust alternative (weak security is worthless, after all) that helps me --

I won't be able to optimize the library at the ARMv7 assembly level, and thus will bank entirely on my programming skills and the GNU ARM compiler's optimization. Given the above, any pointers on what might be the best options?

3 Answers

It is possible to implement an SSL/TLS client (or server) in about 21 kB of ARM code (thumb), requiring less than 20 kB of RAM when running(*). I know it can be done because I did it (sorry, not open source). Most of the complexity of TLS comes from its support of many kinds of cryptographic algorithms, which are negotiated during the initial handshake; if you concentrate on only one set of cryptographic algorithms, then you can strip the code down to something which is quite small. I recommend using TLS 1.2 with the TLS_RSA_WITH_AES_128_CBC_SHA256 cipher suite: for that one, you will only need implementations for RSA, AES and SHA-256 (for TLS 1.1 and previous, you would also need implementations for both MD5 and SHA-1, which is not hard but will cost a few extra kBytes of code). Also, you can make it synchronous (in plain TLS, client and server may speak simultaneously, but nothing forces them to do so) and omit the "handshake renegotiation" part (client and server perform an initial handshake, but they can redo it later on during the connection).
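To illustrate the "one cipher suite only" idea, here is a minimal sketch in C of how a stripped-down client might offer exactly one suite and then insist that the server picked it. The function names are hypothetical; the two-byte identifier 0x00,0x3C is the value assigned to TLS_RSA_WITH_AES_128_CBC_SHA256 in the TLS 1.2 specification. This is not taken from the implementation described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* TLS_RSA_WITH_AES_128_CBC_SHA256, as assigned in the TLS 1.2 spec. */
#define SUITE_HI 0x00u
#define SUITE_LO 0x3Cu

/* Hypothetical helper: writes the cipher_suites field of a ClientHello
 * (2-byte length in bytes, then the list), offering a single suite.
 * Returns the number of bytes written, or 0 if the buffer is too small. */
size_t write_cipher_suites(uint8_t *out, size_t cap)
{
    assert(out != NULL);
    if (cap < 4) {
        return 0;
    }
    out[0] = 0x00;      /* list length, high byte */
    out[1] = 0x02;      /* list length, low byte: one 2-byte suite */
    out[2] = SUITE_HI;
    out[3] = SUITE_LO;
    return 4;
}

/* Hypothetical helper: checks the suite echoed back in the ServerHello.
 * Anything other than the single offered suite is a fatal error. */
int server_suite_ok(const uint8_t chosen[2])
{
    assert(chosen != NULL);
    return chosen[0] == SUITE_HI && chosen[1] == SUITE_LO;
}
```

With no negotiation logic, the rest of the code can call the AES, SHA-256 and RSA routines directly instead of dispatching through tables of algorithms.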

The trickiest part of the protocol implementation is the certificates. The server and the client authenticate each other by using their respective private keys -- with RSA, the server performs an RSA decryption, while the client computes an RSA signature. This provides authentication as long as client and server know each other's public keys; therefore, they send their public keys to each other, wrapped in certificates, which are signed blobs. A certificate must be validated before usage, i.e. its signature must be verified with regard to an a priori known public key (often called "root CA" or "trust anchor"). The client cannot blindly use the public key that the server just sent, because that would allow man-in-the-middle attacks.

X.509 certificate parsing and validation is a bit complex (in my implementation, it was 6 kB of code, out of the 21 kB). Depending on your setup, you may have lighter options; for instance, if you can hardcode the server public key in the client, then the client can simply use that key and throw away the server certificate, which is "just a blob": no need for parsing, no certification, very robust protocol. You could also define your own "certificate" format. Another possibility is to use SRP, which is a key exchange mechanism where both parties authenticate each other with regards to the knowledge of a shared secret value (the magic of SRP is that it is robust even if the shared secret has relatively low entropy, e.g. is a password); use TLS_SRP_SHA_WITH_AES_128_CBC_SHA.
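As a rough sketch of the hardcoded-key option, the client only has to step over the server's Certificate handshake message without looking inside it. The helper below is hypothetical (not from the implementation described above) and assumes the handshake message has already been reassembled in a buffer; the message type value 11 and the 3-byte length field follow the TLS handshake framing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HS_CERTIFICATE 11u  /* handshake message type for Certificate */

/* Hypothetical helper: returns the total size of the Certificate handshake
 * message at msg (4-byte header plus body) so the caller can skip it,
 * or 0 if the message has the wrong type or is truncated.
 * The certificate contents are deliberately never parsed. */
size_t skip_certificate_msg(const uint8_t *msg, size_t avail)
{
    assert(msg != NULL);
    if (avail < 4 || msg[0] != HS_CERTIFICATE) {
        return 0;
    }
    size_t body = ((size_t)msg[1] << 16) | ((size_t)msg[2] << 8) | (size_t)msg[3];
    if (body > avail - 4) {
        return 0;  /* declared length exceeds what was received */
    }
    return 4 + body;
}
```

After skipping the blob, the client would encrypt the premaster secret with the RSA public key compiled into its firmware. A server holding a different private key then cannot complete the handshake, which is what makes ignoring the certificate safe in this setup.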

The point here is that even with a custom protocol, you will not get something really lighter than a stripped-down TLS, at least if you want to keep it robust. And designing a robust protocol is not easy at all; TLS got to the point of being considered adequately secure through years of blood and tears. So it is really better to reuse TLS than to invent your own protocol. Also, this makes the code much easier to test (you can interoperate with existing SSL/TLS implementations).

(*) Out of the 20 kB of RAM, there is a 16.5 kB buffer for incoming "records", because TLS states that records may reach that size. If you control both client and server code, you can arrange for a smaller maximum record size, thus saving on the RAM requirements. Per-record overhead is not much -- less than 50 bytes on average -- so you could use 4 kB records and still have efficient communication.
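The buffer sizing can be worked out with a few lines of arithmetic. The sketch below is an illustration under stated assumptions, not the accounting of the implementation above: it assumes TLS 1.2 with an AES-CBC suite and a SHA-256 MAC (5-byte record header, 16-byte explicit IV, 32-byte MAC), and a sender that adds only the minimum padding, which you can only rely on when you control both ends. The exact overhead depends on the MAC in use and on how much padding the sender chooses.

```c
#include <assert.h>
#include <stddef.h>

#define REC_HEADER_LEN 5u   /* type (1) + version (2) + length (2) */
#define CBC_IV_LEN     16u  /* explicit per-record IV, AES block size */
#define MAC_LEN        32u  /* HMAC-SHA-256 output */
#define AES_BLOCK_LEN  16u

/* Size on the wire of one record carrying plaintext_len bytes, assuming
 * the sender pads only up to the next AES block boundary. */
size_t tls_cbc_record_size(size_t plaintext_len)
{
    assert(plaintext_len <= 16384u);  /* TLS limit on plaintext per record */
    /* data + MAC + the mandatory padding-length byte */
    size_t body = plaintext_len + MAC_LEN + 1u;
    size_t padded = (body + AES_BLOCK_LEN - 1u) / AES_BLOCK_LEN * AES_BLOCK_LEN;
    return REC_HEADER_LEN + CBC_IV_LEN + padded;
}
```

Under these assumptions a 4 kB plaintext limit needs a receive buffer of 4165 bytes, against 16453 bytes for the full 16384-byte limit, so the per-record overhead stays well under 2% in both cases.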

Thanks a ton @thomas-pornin. I've got a lot to write, so I will add comments in multiple parts.
– mike.dinnone, Apr 21 '11 at 12:48

@mike: My code was written in C (no assembly). Code size was measured after compilation (thumb mode), with symbols stripped. The architecture was an ARM7TDMI, i.e. ARMv4. The code size does not include OS or network code (my code received two callback functions for reading and writing bytes; it has been used over a serial link, and over a TCP connection). My code performs no dynamic memory allocation (it works within a single caller-allocated 20 kB structure). Performance was low (about 11 kBytes/s with a 33 MHz ARM7TDMI).
– Thomas Pornin, Apr 21 '11 at 13:15
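The design described in this comment (two I/O callbacks plus one caller-allocated structure, no dynamic allocation) could look roughly like the sketch below. All type and function names are hypothetical, the buffer size is an arbitrary illustrative value, and the loopback transport exists only to exercise the interface; none of this is the actual code being discussed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback types: return bytes transferred, or -1 on error. */
typedef int (*io_read_fn)(void *io_ctx, uint8_t *buf, size_t len);
typedef int (*io_write_fn)(void *io_ctx, const uint8_t *buf, size_t len);

#define REC_BUF_SIZE 4200u  /* illustrative; sized by the caller's record limit */

/* Everything the channel needs lives in this one caller-owned structure. */
typedef struct {
    io_read_fn  read;
    io_write_fn write;
    void       *io_ctx;              /* opaque handle: UART, TCP socket, ... */
    uint8_t     record[REC_BUF_SIZE];
} secure_chan;

void secure_chan_init(secure_chan *sc, io_read_fn rd, io_write_fn wr, void *io_ctx)
{
    assert(sc != NULL && rd != NULL && wr != NULL);
    memset(sc, 0, sizeof *sc);
    sc->read = rd;
    sc->write = wr;
    sc->io_ctx = io_ctx;
}

/* Reads exactly len bytes, looping because transports may return less. */
int secure_chan_read_exact(secure_chan *sc, uint8_t *dst, size_t len)
{
    size_t got = 0;
    while (got < len) {
        int n = sc->read(sc->io_ctx, dst + got, len - got);
        if (n <= 0) {
            return -1;
        }
        got += (size_t)n;
    }
    return 0;
}

/* Demo transport: an in-memory pipe that returns at most 2 bytes per read. */
typedef struct { uint8_t data[64]; size_t head, tail; } loopback;

int loop_write(void *ctx, const uint8_t *buf, size_t len)
{
    loopback *lb = ctx;
    if (len > sizeof lb->data - lb->tail) {
        return -1;
    }
    memcpy(lb->data + lb->tail, buf, len);
    lb->tail += len;
    return (int)len;
}

int loop_read(void *ctx, uint8_t *buf, size_t len)
{
    loopback *lb = ctx;
    size_t avail = lb->tail - lb->head;
    if (avail == 0) {
        return -1;
    }
    if (len > avail) len = avail;
    if (len > 2) len = 2;  /* deliberately short, to exercise the loop */
    memcpy(buf, lb->data + lb->head, len);
    lb->head += len;
    return (int)len;
}
```

Because the protocol code only ever sees the two function pointers, the same build can run over a serial link or over a TCP connection, as the comment describes.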

@mike: handshake renegotiation is an additional security feature which is not used often, but which requires more support code (which my implementation did not have). It is used, for instance, to request client certificate authentication over an initial connection where the client did not, initially, send a certificate.
– Thomas Pornin, Apr 21 '11 at 13:18

@mike: My code was a specific development for a customer; it is not for sale (legally speaking, it is not mine anymore). I am talking about it to show that it is feasible.
– Thomas Pornin, Apr 21 '11 at 13:19

@mike: Running your own CA is certainly feasible, but it is not very simple, and thus can prove expensive in the long term. Look up EJBCA for an open-source CA. SRP is a key exchange mechanism which replaces RSA and avoids the use of certificates; instead, it performs mutual authentication through a secret value shared between client and server (depending on your situation, maintaining such a secret may or may not be easy).
– Thomas Pornin, Apr 21 '11 at 13:21

Whatever you do, I would go with an existing protocol that has been around for a while and strip away or hardcode some options to make it light, rather than invent your own protocol.

If your needs are simple enough that you could get by with preshared symmetric keys, then adding a subset of IPsec to the TCP/IP stack seems like it might be doable without a huge code footprint. It might even be possible to maintain interoperability.

Similarly a stripped down SSL or Kerberos is probably an option, as you won't need most of the fancier authentication and key management aspects, nor most of the ciphers.

TLS still seems like the tool for the job; just pick the ciphers wisely, and then profile which cipher implementations work well on your platform. Also, the official implementations in crypto libraries tend to be horribly slow compared to hacking stuff, like the algorithms from John the Ripper, so maybe you could use those. Have you tried MatrixSSL yet?

That's a terrific idea, i.e. using algorithms from JTR. However, I see that most of the commercial (dual-licensed) crypto stacks claim heavy use of hand-written, uC/uP-native assembly optimizations, so I will have to compare them the hard way! Thanks for the suggestion, definitely worth checking out.
– mike.dinnone, Apr 21 '11 at 3:50

Note that libraries which are used to secure communications must be resistant to side channels, such as timing information. Hacking stuff doesn't care. In particular, the asymmetric crypto part must be written carefully. But from what I have heard, those parts are already optimized pretty well in libraries like OpenSSL.
– CodesInChaos, Apr 24 '12 at 21:50
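One small, concrete instance of the timing concern is comparing a received MAC against the computed one. A plain memcmp typically stops at the first differing byte, so its running time can reveal how many leading bytes matched. The sketch below shows the usual constant-time alternative; the function name is hypothetical, and it covers only this one comparison, not the harder problem of side-channel-safe asymmetric code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the two n-byte buffers are equal, 0 otherwise.
 * Every byte is always examined, so the running time does not depend on
 * where (or whether) the buffers differ. */
int ct_equal(const uint8_t *a, const uint8_t *b, size_t n)
{
    assert(a != NULL && b != NULL);
    volatile uint8_t diff = 0;  /* volatile discourages early-exit optimization */
    for (size_t i = 0; i < n; i++) {
        diff |= (uint8_t)(a[i] ^ b[i]);
    }
    return diff == 0;
}
```

Code borrowed from speed-oriented tools would need this kind of review before being used on secret data.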