Lover of ideas

Interesting design problem with serialization and deserialization

I have been working on a serialization framework I'm happy with for Python.
I want to be able to describe CAKE protocol messages clearly and succinctly.
This will make it easier to tweak the messages without having to rip apart
hard-to-understand code.
It will also make it easier to understand if I drop the project again and
then come back to it years later, or if (by some miracle) someone else
decides to help me with it.

Here is what I've come up with as the interface, along with one
implementation of that interface for a simple type:

There is also a CompoundNumbered type for representing tuples. This allows
you to represent structured messages with multiple fields. Here is an example
of how you might represent CAKE new session messages:

There is a problem though. The signature and header HMAC are supposed to be
encrypted, but the deserializer can't know the key to use until it's
decrypted the encryption header. This means that later parts of the
deserialization process need to know about things from previous parts.

I have a way for the deserialization process to save state. If
deserialization throws a NotEnoughDataError because not enough data is
available, the exception may carry a memo field. This memo can then be
passed back in to resume close to where deserialization stopped. (Though now
I'm sort of wondering if I shouldn't do something generator based instead...)
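To make the memo idea concrete, here is a minimal sketch of how such a resume could work. It is not the framework's actual code: the length-prefixed wire format, the deserialize_blob function, the needed attribute, and the layout of the memo dictionary are all assumptions made up for illustration. Only the NotEnoughDataError name and its memo field come from the description above.

```python
import struct


class NotEnoughDataError(Exception):
    """Raised when the buffer ends before a value is complete."""

    def __init__(self, needed, memo=None):
        super().__init__(f"need at least {needed} more byte(s)")
        self.needed = needed
        self.memo = memo  # opaque resume state; this layout is hypothetical


def deserialize_blob(data, memo=None):
    """Parse a 2-byte big-endian length followed by that many bytes.

    Returns (value, bytes_consumed). The memo caches the parsed length so
    that a retry with more data does not have to re-read the prefix.
    """
    length = memo["length"] if memo else None
    if length is None:
        if len(data) < 2:
            # Not even the length prefix is complete; nothing worth saving.
            raise NotEnoughDataError(2 - len(data))
        (length,) = struct.unpack_from(">H", data, 0)
    end = 2 + length
    if len(data) < end:
        # Remember the length so the next attempt can skip straight ahead.
        raise NotEnoughDataError(end - len(data), memo={"length": length})
    return bytes(data[2:end]), end
```

The caller catches the exception, waits for more bytes, and calls deserialize_blob again with the full buffer and memo=err.memo. In a real compound deserializer the memo would presumably also record which fields were already parsed.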

But this mechanism does not allow state to be passed forward from a previous
deserializer to a new one. And this applies the other way around too. When
serializing there is stuff that's not really a part of the data being
serialized (like the current HMAC or encryption state) that needs to be known
by the serializer in order to serialize properly.

I'm thinking of adding an optional context parameter to the
serialization and deserialization functions that's just an empty dictionary
into which this sort of state can be stuffed. But this seems really messy.
Can anybody think of any better ways to do this that are fairly general?
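For reference, this is roughly what the context-dictionary approach might look like. It is a sketch under assumptions, not the real interface: the reader functions, the (value, bytes_consumed) return convention, and the "key" entry are hypothetical, and the XOR mask is only a toy stand-in for real decryption.

```python
def read_key_header(data, context):
    """First field: a 1-byte key. Stores it in the shared context."""
    context["key"] = data[0]
    return data[0], 1


def read_masked_byte(data, context):
    """Later field: one byte XOR-masked with the key read earlier.

    XOR is a toy stand-in for real decryption, used only for illustration.
    """
    return data[0] ^ context["key"], 1


def deserialize_fields(readers, data, context=None):
    """Run each reader in order, threading one context dict through all."""
    context = {} if context is None else context
    values, offset = [], 0
    for reader in readers:
        value, used = reader(data[offset:], context)
        values.append(value)
        offset += used
    return values, offset
```

The second reader only works because the first one left the key in the context, which is exactly the kind of forward state passing described above. The messiness is visible too: nothing in the signatures says which keys a reader expects to find.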

The CAKE protocol looks interesting but it mixes two concepts: structured data encoding, and security encapsulation of message passing.

IMHO, if you reused an off-the-shelf encoding like Thrift or protobuf, the design and the code would be much simpler to read.

The following will probably look naive, as I haven't looked at the CAKE implementation code; maybe I haven't scoped the whole problem.

Once you've done that, it would merely be:

def read_messages(pipe):
    # Wrapped in a generator function so that the yield below is valid.
    raw_data = pipe.read()
    # unpack_header is a thrift/protobuf raw reader.
    hdr = unpack_header(raw_data)
    # hdr.firsthmac and hdr.messages are still encrypted at this point.
    # hdr.messages is a list of messages as defined at
    # http://www.cakem.net/v2/sessions.html#repeated
    # Grab valid key, verify signature. Throws on invalid signature, etc.
    process(hdr)
    # hdr.firsthmac is now decrypted and verified.
    if hdr.message_type == 1:  # session continuation
        for message in hdr.messages:
            # unpack_data is whatever decoding the user wants to use.
            # In practice, it's probably better to not even have this
            # function here and just yield the raw buffer.
            data = unpack_data(decrypt_and_verify(hdr.key, message.padding))
            yield data

Both Thrift and protobuf handle a lot of the problems for you, like: future-proofing the protocol message format, efficient encoding and decoding of native types like int and string, multi-language support, a golden message definition in a single file, etc.