Monday, January 29, 2007

Why Not Binary Blobs?

I am working on the data model for WorldEditor, the X-Plane graphical scenery editor. At the risk of "life blogging" the development, I think the design decisions for WED illustrate some general design ideas.

WED uses C++ objects to represent the user's data in memory (the "internal data model"). I'll comment at another time on why I made this decision. On disk, however, WED uses an SQLite database file. That's another blog post too.

So one must ask: why do we need an on-disk data model at all? Why not just dump the C++ object contents to a file?

One might say "because you can't write STL classes to disk verbatim; their internal pointers and private structure get in the way." But WED uses an object-based undo system that requires each object to know how to serialize itself to a buffer, which means we've already written serialization code for all of those STL structures.
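To make that concrete, here is a minimal sketch of the kind of self-serialization an object-based undo system might use. `Runway`, `Buffer`, `WriteTo`, `ReadFrom`, `write_raw`, and `read_raw` are hypothetical names chosen for illustration, not WED's actual classes. The point is that the resulting bytes are just the members in declaration order (with a length prefix for the `std::string`, since its characters live behind an internal pointer); nobody chose that layout as a file format.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Append the raw in-memory bytes of a trivially copyable value
// (host byte order, host type sizes).
template <typename T>
void write_raw(Buffer& buf, const T& value) {
    const auto* p = reinterpret_cast<const std::uint8_t*>(&value);
    buf.insert(buf.end(), p, p + sizeof(T));
}

// Read a trivially copyable value back, advancing pos.
template <typename T>
T read_raw(const Buffer& buf, std::size_t& pos) {
    if (pos + sizeof(T) > buf.size()) throw std::out_of_range("truncated buffer");
    T value;
    std::memcpy(&value, buf.data() + pos, sizeof(T));
    pos += sizeof(T);
    return value;
}

// Hypothetical scenery object that can save and restore itself,
// as an object-based undo system would require.
struct Runway {
    std::string name;
    double heading = 0.0;
    double width_m = 0.0;

    void WriteTo(Buffer& buf) const {
        // std::string owns a heap pointer, so write a length prefix
        // followed by the characters instead of the object itself.
        write_raw(buf, static_cast<std::uint32_t>(name.size()));
        buf.insert(buf.end(), name.begin(), name.end());
        write_raw(buf, heading);
        write_raw(buf, width_m);
    }

    void ReadFrom(const Buffer& buf, std::size_t& pos) {
        const auto n = read_raw<std::uint32_t>(buf, pos);
        if (pos + n > buf.size()) throw std::out_of_range("truncated buffer");
        name.assign(buf.begin() + pos, buf.begin() + pos + n);
        pos += n;
        heading = read_raw<double>(buf, pos);
        width_m = read_raw<double>(buf, pos);
    }
};
```

Swapping `heading` and `width_m` during a refactor, or building on a machine with a different byte order, silently changes this byte layout. That is fine for an in-memory undo buffer, but it is exactly what makes such a buffer a poor long-lived file format.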

It would make development faster to just reuse the object serialization code, but the result would be a file format that is a side-effect of the implementation code. This isn't good if:

You want to edit the data from another application without having code interdependencies or

You want to refactor the code (which would cause object layout to change) or

You want to read a subset of the data. (The in-memory structure is, well, in-memory, so it assumes you have access to everything.)

The problem is that using object serialization code as your file format is designing a file format, but without doing any of the usual work that goes into designing one.

In the short term you save time on writing file I/O code, but as soon as you change the object layout you must write new code to read the old files. So you pay the "cost" of that code eventually, and you have to write it against a file format that was never really "designed" at all.
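Here is a minimal sketch of what that deferred cost looks like, assuming a hypothetical blob that starts with a one-byte version tag. `Runway`, `load_runway`, and the two layouts are invented for illustration: version 1 dumped `{width_m, heading}`, and after a refactor version 2 dumps `{heading, width_m, surface}`. Every layout that ever shipped needs its own branch in the loader, and the default for the missing `surface` field is a guess the loader has to make on the old file's behalf.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

using Buffer = std::vector<std::uint8_t>;

// Read a trivially copyable value in host byte order, advancing pos.
template <typename T>
T read_raw(const Buffer& buf, std::size_t& pos) {
    if (pos + sizeof(T) > buf.size()) throw std::out_of_range("truncated blob");
    T value;
    std::memcpy(&value, buf.data() + pos, sizeof(T));
    pos += sizeof(T);
    return value;
}

// Hypothetical object as it exists in the current code.
struct Runway {
    double heading = 0.0;
    double width_m = 0.0;
    std::int32_t surface = 0;  // field added in version 2
};

// Each historical object layout needs its own branch, forever.
Runway load_runway(const Buffer& buf) {
    std::size_t pos = 0;
    const auto version = read_raw<std::uint8_t>(buf, pos);
    Runway r;
    switch (version) {
    case 1:  // old layout: width first, no surface field
        r.width_m = read_raw<double>(buf, pos);
        r.heading = read_raw<double>(buf, pos);
        r.surface = 1;  // hypothetical default for files that predate the field
        break;
    case 2:  // current layout after the refactor
        r.heading = read_raw<double>(buf, pos);
        r.width_m = read_raw<double>(buf, pos);
        r.surface = read_raw<std::int32_t>(buf, pos);
        break;
    default:
        throw std::runtime_error("unknown blob version");
    }
    return r;
}
```

None of these branches describe a format anyone designed; each one just records what some past build of the code happened to write.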

In particular with WED, we want the file format to be stable and low-change over a long period of time, because the data in a WED file can stay useful for a relatively long time.

Given this, I am designing an explicit file format up front rather than reusing the object serialization mechanism.