Saving files correctly

Recently, someone asked whether it was OK to use the ANSI C standard library's fread and fwrite functions to write stuff to disk. The question came up in the context of a Cocoa Objective-C application, but I'll keep it a bit more general because the problem is not unique to Cocoa. The answer is:

It depends.

If you account for some differences, fread() or any other read/write method that works with raw bytes is OK. However, most object-oriented frameworks, and even procedural libraries, provide higher-level file access that serializes whole object graphs into streams of data.

For this discussion, objects in an object-oriented language are effectively structs. However, in many languages an object's instance variables are private, so you can't simply read them out to save them, and you also need to re-create the right subclass when loading an object back in. Framework classes like NSKeyedArchiver and Co. take care of that for you. So, unless you need partial loading or random access into a file, an archiver is usually less hassle.

Why? Well, the endianness (the order in which the individual bytes of a multi-byte type like an int are stored) differs between some platforms, and if you save on one platform and read on the other (e.g. PowerPC Mac -> Intel Mac), your numbers come out 'reversed' and you get nonsense.

The usual solution is either to save everything in one agreed-upon endianness and have all other platforms convert, or to store a byte-order marker at the start of the file: e.g. the number 1 as a short, which gets written as the bytes 01 00 on little-endian Intel and 00 01 on big-endian PPC. If the marker reads back as 1, the file matches your byte order; if it reads back as 256 (0x0100), you need to swap everything.
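As a sketch of that marker check in C (the marker value and helper names here are my own, not from any particular framework):

```c
#include <stdint.h>

/* Marker value written as the first two bytes of the file. */
#define BYTE_ORDER_MARKER ((uint16_t)1)

/* Swap the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v) {
    return (uint16_t)((v << 8) | (v >> 8));
}

/* After fread()-ing the first two bytes of the file into 'marker',
   decide whether everything else in the file needs swapping. A file
   written on the "other" endianness makes the marker read as 256. */
static int needs_swapping(uint16_t marker) {
    return marker != BYTE_ORDER_MARKER;
}
```

The same swap would then be applied to every multi-byte field you read, not just the marker.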

Of course, if you save a whole struct, the padding bytes that compilers insert into structs to align fields on their natural boundaries (for better performance) can screw you over, because the padding can differ depending on CPU and compiler. I believe the layouts mostly match across the OS X architectures, but if you want your files to be portable, don't rely on that.
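You can see the padding at work with a toy struct like the one below; the exact amount of padding is compiler- and ABI-dependent, which is precisely the problem:

```c
#include <stddef.h>
#include <stdint.h>

/* The fields total 5 bytes of real data... */
struct record {
    char    tag;    /* 1 byte */
    int32_t value;  /* 4 bytes, usually aligned to a 4-byte boundary */
};

/* ...but on most compilers, 3 padding bytes get inserted after 'tag',
   making the struct 8 bytes. fwrite()-ing the whole struct would put
   those junk bytes into your file. */
static size_t record_padding(void) {
    return sizeof(struct record) - (sizeof(char) + sizeof(int32_t));
}
```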

Also, some data types are not defined to have a fixed size. E.g. in C and its descendants, 'int' can be anything from 2 bytes to 8 bytes (2 bytes was common on 680x0 Macs, 4 bytes is common on 32-bit Macs, 8 bytes is what they are on 64-bit Macs when compiling a 64-bit application). This means the same code always gets the native, 'fastest' data type to do its work with, but it also means saving a struct to disk will not be cross-platform safe.
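One hedge against this, assuming a C99-era compiler is available, is to use the fixed-width types from <stdint.h> for anything that goes to disk; the layout below is just an example of mine:

```c
#include <stdint.h>

/* 'int' may be 2, 4 or 8 bytes depending on the platform, but these
   types have guaranteed widths on every conforming C99 compiler: */
struct on_disk_header {
    uint8_t  version;   /* always 1 byte  */
    uint16_t flags;     /* always 2 bytes */
    uint32_t length;    /* always 4 bytes */
};
```

Note that the struct as a whole may still contain padding, so you'd still write the fields one at a time rather than fwrite()-ing the struct.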

So, your saving code will have to take care of bridging those differences: I generally don't save whole structs, but rather write each field separately, as you'd do with NSArchiver. While doing that, you can also perform endian-swapping as needed and expand smaller ints to bigger ones, all transparently, using preprocessor macros or other compile-time facilities if you design things right.
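A minimal sketch of that field-by-field approach, picking big-endian as the on-disk byte order regardless of host (the struct and function names are hypothetical):

```c
#include <stdint.h>
#include <stdio.h>

struct point { int32_t x, y; };   /* example in-memory type */

/* Serialize a 32-bit value as big-endian bytes. The shifts make this
   independent of the host's endianness -- no #ifdefs needed. */
static int write_be32(FILE *fp, uint32_t v) {
    uint8_t bytes[4] = {
        (uint8_t)(v >> 24), (uint8_t)(v >> 16),
        (uint8_t)(v >> 8),  (uint8_t)v
    };
    return fwrite(bytes, 1, 4, fp) == 4;
}

/* Reassemble a 32-bit value from big-endian bytes. */
static uint32_t read_be32_bytes(const uint8_t bytes[4]) {
    return ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16)
         | ((uint32_t)bytes[2] << 8)  |  (uint32_t)bytes[3];
}

/* Save each field separately: no padding, no endian surprises. */
static int write_point(FILE *fp, const struct point *p) {
    return write_be32(fp, (uint32_t)p->x) && write_be32(fp, (uint32_t)p->y);
}
```

Wrapping the per-field writes in macros or inline helpers like these is what lets the conversion happen transparently.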

Ahruman writes: The 64-bit Mac architectures do not use 64-bit ints, although 64-bit Windows does. While we’re type-lawyering, “byte” doesn’t necessarily mean 8 bits, either in a general context or in the special definition used for C. :-)

Blake C. writes: Regarding 32 to 64-bit data sizes on OS X only: the change is simply that longs and pointers got bumped from 4 to 8 bytes. Legacy code that uses ints to represent pointers will break in 64-bit mode, as will code that assumes void* == 4 bytes, among other obvious issues.