OMM Binary Format Draft

Proposal: OMM Binary Format

The members of the XG have identified two requirements for the binary representation of an OMM:

compactness

random access

The latter is not addressed by existing, generic binary representations for XML, such as EXI.
Therefore, we propose a standard binary representation for the OMM XML alongside the usual XML specification.
The binary representation standard only covers a core set of OMM data.

The binary representation is designed so that it can be easily generated using the Thrift framework.
This facilitates implementation of generators and parsers for the binary representation.

As an example, the binary representation for OMM_ID_BLOCK is defined in Thrift IDL as follows:

With the above definition and the following example data (SemProM ID Block, not in OMM format):

primary id type: RFID

primary id value: myotheruri_is_much_longer

secondary id 1 type: RFID

secondary id 1 value: some_rfid

secondary id 2 type: gid96

secondary id 2 value: one_more_id

this is the generated binary representation:

03 00 01 03 0C 00 02 08 00 01 00 00 00 02 03 00 02 00 0B 00

03 00 00 00 19 6D 79 6F 74 68 65 72 75 72 69 5F 69 73 5F 6D

75 63 68 5F 6C 6F 6E 67 65 72 00 0F 00 03 00 00 00 02 08 00

01 00 00 00 02 03 00 02 00 0B 00 03 00 00 00 09 73 6F 6D 65

5F 72 66 69 64 00 08 00 01 00 00 00 03 03 00 02 00 0B 00 03

00 00 00 0B 6F 6E 65 5F 6D 6F 72 65 5F 69 64 00 00
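To make the byte layout easier to follow, here is a minimal Python sketch that decodes the struct carrying the primary ID (bytes 7 to 50 of the dump, counting from 0). It assumes the standard Thrift binary protocol: each field is a one-byte type code followed by a big-endian 16-bit field number, strings carry a 32-bit length prefix, and a struct ends with a 0x00 stop byte. The field numbers come straight from the bytes; reading field 1 as the ID type and field 3 as the ID value is an assumption based on the example data, and `read_struct` is a hypothetical helper, not part of Thrift.

```python
import struct

# Thrift binary-protocol type codes that occur in this part of the dump.
T_STOP, T_BYTE, T_I32, T_STRING, T_STRUCT = 0x00, 0x03, 0x08, 0x0B, 0x0C


def read_struct(buf, pos=0):
    """Decode fields until the STOP byte; return ({field_number: value}, next_pos)."""
    fields = {}
    while True:
        ftype = buf[pos]
        pos += 1
        if ftype == T_STOP:
            return fields, pos
        (fid,) = struct.unpack_from(">h", buf, pos)  # big-endian field number
        pos += 2
        if ftype == T_BYTE:
            value = buf[pos]
            pos += 1
        elif ftype == T_I32:
            (value,) = struct.unpack_from(">i", buf, pos)
            pos += 4
        elif ftype == T_STRING:
            (length,) = struct.unpack_from(">i", buf, pos)  # length prefix
            pos += 4
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        elif ftype == T_STRUCT:
            value, pos = read_struct(buf, pos)
        else:
            raise ValueError(f"unsupported type code 0x{ftype:02X}")
        fields[fid] = value


# Bytes 7..50 of the dump: the struct that carries the primary ID.
PRIMARY_ID = bytes.fromhex(
    "08 00 01 00 00 00 02 03 00 02 00 0B 00 03 00 00 00 19 "
    "6D 79 6F 74 68 65 72 75 72 69 5F 69 73 5F 6D "
    "75 63 68 5F 6C 6F 6E 67 65 72 00"
)

fields, consumed = read_struct(PRIMARY_ID)
print(fields, consumed)
```

The number of bytes consumed depends on the length of the string in field 3, so the offset of everything that follows the primary ID is only known after decoding it.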

Using Thrift (or Protocol Buffers, or any other binary serialization framework) greatly facilitates a reference implementation and extensibility. However, the flexibility of the generated binary representation is limited to what the framework's type system provides. Fixed-length strings, for example, are not supported, so it is not possible to make the second ID start at a fixed byte address. This can be worked around by defining the string as a fixed number of bytes, e.g., 256. The application then has to map the byte buffer to a string itself.
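The fixed-width workaround could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the 256-byte width is the example value from the text, padding with NUL bytes and UTF-8 encoding are assumptions, and `pack_fixed` and `unpack_fixed` are hypothetical helper names.

```python
FIELD_WIDTH = 256  # example width from the text; not a normative value


def pack_fixed(value: str, width: int = FIELD_WIDTH) -> bytes:
    """Encode a string into exactly `width` bytes, padded with NUL bytes (assumed)."""
    raw = value.encode("utf-8")
    if len(raw) > width:
        raise ValueError(f"value needs {len(raw)} bytes, field holds {width}")
    return raw.ljust(width, b"\x00")


def unpack_fixed(buf: bytes, offset: int = 0, width: int = FIELD_WIDTH) -> str:
    """Map the fixed-width slice at `offset` back to a string, dropping the padding."""
    return buf[offset:offset + width].rstrip(b"\x00").decode("utf-8")


# Two IDs stored back to back: the second always starts at byte FIELD_WIDTH,
# regardless of how long the first one is.
record = pack_fixed("myotheruri_is_much_longer") + pack_fixed("some_rfid")
second_id = unpack_fixed(record, offset=FIELD_WIDTH)
print(second_id)
```

The price of the fixed offset is compactness: each ID occupies the full field width even when the value is only a few bytes long.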

More flexible notations, such as ASN.1, are difficult to implement and use. A completely hand-crafted serializer/deserializer offers all the flexibility one could wish for, but it comes with the known problems of adoption, standardization, and so on.