Preface

Introduction

fastBinaryJSON is based on my fastJSON article (http://www.codeproject.com/Articles/159450/fastJSON) and its code, which is a polymorphic object serializer. The main purpose of fastBinaryJSON is fast serialization and deserialization of data for transfer and for storage to disk. It was created for my upcoming RaptorDB document database engine, for performance reasons.

Features

fastBinaryJSON has the following feature list:

Based on fastJSON code base (very fast, polymorphic)

Supports: Hashtables, Dictionaries, generic Lists, DataSets, ...

Typically 2-10% faster than fastJSON on serialize, 17%+ faster on deserialize.

Why?

Why another serializer, you may ask; why not just use fastJSON? The answer is simple: performance. JSON, while a great format, has the following problem:

JSON is a text format, so you lose type information when serializing, which makes deserializing the data again time consuming.

Why not BSON?

Looking at the BSON specification, you feel overwhelmed, as it is hard to follow.

You feel that the specs have evolved over time and a lot of the coding parts have been deprecated.

BSON encodes lengths into the stream, which inflates the data. This might be fine for the use case the authors envisioned, but for data transfer and storage it just makes things larger than they need to be.

Because of the length prefixes, the encoding of the data object must be done in two passes: once to output the data, and a second time to set the length prefixes.
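The two-pass cost mentioned above can be sketched roughly as follows. This is a simplified illustration, not the actual BSON layout: the class and method names are made up for the example, and the payload is just a UTF8 string. The writer reserves four bytes, emits the payload, then seeks back to patch in the real length.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only (not BSON's real wire format).
public static class LengthPrefixSketch
{
    public static byte[] WriteLengthPrefixed(string text)
    {
        using (var ms = new MemoryStream())
        using (var w = new BinaryWriter(ms))
        {
            long start = ms.Position;
            w.Write(0);                              // pass 1: placeholder int32 length
            w.Write(Encoding.UTF8.GetBytes(text));   // pass 1: the payload itself
            w.Flush();
            long end = ms.Position;

            ms.Position = start;                     // pass 2: go back to the placeholder
            w.Write((int)(end - start));             // patch in total length (prefix included)
            w.Flush();
            return ms.ToArray();
        }
    }

    public static void Main()
    {
        byte[] bytes = WriteLengthPrefixed("abc");
        Console.WriteLine(BitConverter.ToString(bytes)); // 07-00-00-00-61-62-63
    }
}
```

Every nested object needs the same back-patching step, and each prefix adds four bytes to the output.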

I initially started off by doing a BSON conversion on fastJSON but it got too complicated, so it was scrapped.

How is data encoded in fastBinaryJSON?

JSON is an extremely simple format, so fastBinaryJSON takes that simplicity and adds the parts needed to do binary serialization. fastBinaryJSON follows the same rules as the JSON specification (http://json.org), with the following table showing how data is encoded:

As you can see from the above, all the encoding rules are the same as JSON, and primitive data types have been given 1 byte tokens for encoding data. So the general format is:

TOKEN, { DATA } : where DATA can be 0 or more bytes
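A minimal sketch of the TOKEN, { DATA } idea is shown below. The token values here are hypothetical and chosen only for illustration; they are not fastBinaryJSON's actual token table, and the class name is made up.

```csharp
using System;
using System.IO;

public static class TokenSketch
{
    // Hypothetical token values, for illustration only.
    public const byte TokenNull = 0x01;
    public const byte TokenTrue = 0x02;
    public const byte TokenFalse = 0x03;
    public const byte TokenInt32 = 0x04;

    public static byte[] Encode(object value)
    {
        using (var ms = new MemoryStream())
        using (var w = new BinaryWriter(ms))
        {
            if (value == null)
                w.Write(TokenNull);                           // token only, 0 data bytes
            else if (value is bool)
                w.Write((bool)value ? TokenTrue : TokenFalse); // the token carries the value
            else if (value is int)
            {
                w.Write(TokenInt32);                          // token ...
                w.Write((int)value);                          // ... followed by 4 data bytes
            }
            else
                throw new NotSupportedException(value.GetType().Name);
            w.Flush();
            return ms.ToArray();
        }
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(Encode(null))); // 01
        Console.WriteLine(BitConverter.ToString(Encode(true))); // 02
        Console.WriteLine(BitConverter.ToString(Encode(42)));   // 04-2A-00-00-00
    }
}
```

Because the token identifies the type, a reader knows how many data bytes follow without any text parsing.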

Strings can be encoded in 2 ways, as UTF8 or as Unicode (UTF-16, the in-memory format of .NET strings). UTF8 is more space efficient for mostly-ASCII text, and Unicode is faster.
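The size difference between the two string encodings can be checked with the standard System.Text.Encoding classes. This is only a sketch of the trade-off, with a made-up class name; it does not call fastBinaryJSON itself.

```csharp
using System;
using System.Text;

public static class StringSizeSketch
{
    public static int Utf8Size(string s) { return Encoding.UTF8.GetByteCount(s); }

    // Encoding.Unicode is UTF-16 little-endian in .NET.
    public static int Utf16Size(string s) { return Encoding.Unicode.GetByteCount(s); }

    public static void Main()
    {
        string ascii = "hello";
        string accented = "h\u00e9llo"; // contains one non-ASCII character

        Console.WriteLine(Utf8Size(ascii));      // 5
        Console.WriteLine(Utf16Size(ascii));     // 10
        Console.WriteLine(Utf8Size(accented));   // 6
        Console.WriteLine(Utf16Size(accented));  // 10
    }
}
```

Since .NET strings are already UTF-16 in memory, writing them as Unicode avoids a transcoding step, which is a plausible reason for the speed advantage.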

String keys or property names are encoded as a special UTF8 stream which is limited to 255 bytes in length to save space (you should not have a problem with this as most property names are short in length).
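One way a 255 byte limit can save space is by storing the key length in a single byte. The sketch below assumes a layout of one length byte followed by the UTF8 bytes; that layout and the helper names are assumptions for illustration, and the real format may differ.

```csharp
using System;
using System.Text;

public static class KeySketch
{
    // Assumed layout: [1-byte length][UTF8 bytes]. Illustration only.
    public static byte[] EncodeKey(string name)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(name);
        if (utf8.Length > 255)
            throw new ArgumentException("key is longer than 255 UTF8 bytes");

        byte[] result = new byte[utf8.Length + 1];
        result[0] = (byte)utf8.Length;                      // single length byte
        Buffer.BlockCopy(utf8, 0, result, 1, utf8.Length);  // key bytes follow
        return result;
    }

    public static string DecodeKey(byte[] data)
    {
        return Encoding.UTF8.GetString(data, 1, data[0]);
    }

    public static void Main()
    {
        byte[] encoded = EncodeKey("Name");
        Console.WriteLine(BitConverter.ToString(encoded)); // 04-4E-61-6D-65
        Console.WriteLine(DecodeKey(encoded));             // Name
    }
}
```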

Performance tests

To get a sense of the performance differences between fastBinaryJSON and fastJSON, the following tests were performed. Times are in milliseconds; each test was done on 1000 objects and repeated 5 times. The AVG column is the average of the tests excluding the first, which is skewed by initialization times:

As you can see in the DIFF column, which is [ fastJSON / fastBinaryJSON ], the serializer performs at least 2% faster and the deserializer at least 17% faster, with the greatest difference being with DataSet types, which contain many rows of data.

To do this, fastBinaryJSON uses FormatterServices.GetUninitializedObject(type) from the framework, which essentially just allocates a memory region for your type and hands it to you as an object, bypassing all initialization including the constructor. While this is really fast, it has the unfortunate side effect of ignoring all class initialization, such as default values for properties, so you should be aware of this if you are restoring partial data to an object (if all the data is in the JSON and matches the class structure, then you are fine).

To control this you can set the ParametricConstructorOverride to true in the BJSONParameters.
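The side effect described above is easy to reproduce outside the serializer. The following sketch uses a made-up Settings class and calls FormatterServices.GetUninitializedObject directly; it shows that field initializers and the constructor body are both skipped.

```csharp
using System;
using System.Runtime.Serialization;

// Hypothetical class, for illustration only.
public class Settings
{
    public int Retries = 3;                    // field initializer (runs as part of the constructor)
    public string Name { get; set; }

    public Settings() { Name = "default"; }
}

public static class UninitializedSketch
{
    // Allocates a Settings instance without running any constructor code.
    public static Settings CreateRaw()
    {
        return (Settings)FormatterServices.GetUninitializedObject(typeof(Settings));
    }

    public static void Main()
    {
        Settings normal = new Settings();
        Settings raw = CreateRaw();

        Console.WriteLine(normal.Retries);   // 3
        Console.WriteLine(raw.Retries);      // 0
        Console.WriteLine(raw.Name == null); // True
    }
}
```

If the serialized data does not contain Retries or Name, an object restored this way keeps the zero and null values rather than the class defaults.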

Appendix v1.4.0 - Circular References & Breaking changes

This version fixes a design flaw that had been bugging me since the start, by removing the BJSON.Instance singleton. This means you type less to use the library, which is always a good thing; the bad thing is that you need to do a find and replace in your code.

I also found a really simple and fast way to support circular reference object structures, so a complex structure like the following will serialize and deserialize properly (the unit test is CircularReferences()):


About the Author

Mehdi first started programming when he was 8, on a BBC+128k machine in 6512 processor language. After various hardware and software changes he eventually came across .NET and C#, which he has been using since v1.0.
He is formally educated as a system analyst Industrial engineer, but his programming passion continues.

* Mehdi is the 5th person to get 6 out of 7 Platinums on CodeProject (13th Jan'12)

A programmer walks into a bar and asks the bartender for 1.00000000000003123939 root beers. Bartender says, I'll have to charge you extra, that's a root beer float. Programmer says, better make it a double then.

Thanks, Mehdi, I am trying out that code, and left the author of the Tip/Trick what I hope is some constructive feedback.

It was interesting to note that taking the file-output by MiniLZO, and then (from the desktop) creating a zipped file from that reduced the file size another 500k, or so.

Next-up: trying the GZip facility in .NET.

yours, Bill

“Human beings do not live in the objective world alone, nor alone in the world of social activity as ordinarily understood, but are very much at the mercy of the particular language which has become the medium of expression for their society. It is quite an illusion to imagine that one adjusts to reality essentially without the use of language and that language is merely an incidental means of solving specific problems of communication or reflection." Edward Sapir, 1929

MiniLZO was designed to be really fast, so it does not compress nearly as well as the zip method. For even better compression, try 7-zip.


By experiment, in serializing a complex object (a collection, of collections, of an object with a variety of fields): using a test case where the collection contains 1000 collections, each of which contains a collection of 100 objects:

The setting of UseUnicodeStrings will have an effect on the size of the Byte[] returned by serializing:

The other BJSON.Instance.Parameters settings I've tried altering seem to have little impact on the size of the Byte[].

Writing the fastBinaryJSON serialized Byte[] to a file at filePath using:

using System.IO;
File.WriteAllBytes(filePath, BJsonBytes);

Results in a file about 18.4 megabytes in size.

Look forward to seeing some comments/documentation from other users of Mehdi's excellent work, here.

yours, Bill


Unicode (UTF-16) uses 2 bytes per character for most text, while UTF8 uses 1 byte for ASCII characters and 2 or more bytes for other characters.

For a really fast compressor, check minilzo.cs in RaptorDB. You can certainly use the System.IO.Compression classes, or you can go to 7zip wrappers for optimal size.
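As a rough sketch of the System.IO.Compression route, the helper below compresses a serialized Byte[] with GZipStream and restores it again. The class and method names are made up for the example, and the input is a plain byte array rather than real fastBinaryJSON output.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class GZipSketch
{
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            } // disposing the GZipStream flushes the remaining compressed bytes
            return output.ToArray();
        }
    }

    public static byte[] Decompress(byte[] data)
    {
        using (var input = new MemoryStream(data))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            gzip.CopyTo(output);
            return output.ToArray();
        }
    }

    public static void Main()
    {
        byte[] original = new byte[10000];   // highly repetitive sample data
        byte[] packed = Compress(original);
        byte[] restored = Decompress(packed);
        Console.WriteLine(packed.Length < original.Length);   // True
        Console.WriteLine(restored.Length == original.Length); // True
    }
}
```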


For my personal needs, I added an extension system to fastBinaryJSON in order to serialize more types.
Some types do not have a parameterless constructor and therefore cannot be serialized by fastBinaryJSON. But they have attributes that can help us.

. [TypeConverter] lets you convert the object to another type that can be serialized.
. [ValueSerializer] lets you convert the object to and from a string.
. [Serializable] (and ISerializable) allows the use of BinaryFormatter.

Some of the most useful types have these attributes:
. System.Uri ([Serializable]),
. System.Windows.Media.FontFamily ([ValueSerializer]),
. System.Windows.Input.Cursor ([TypeConverter]),
etc..

I wrap (substitute) each problematic object in a class; each class handles one of these attributes, so the modification of fastBinaryJSON is minimal and most of the work resides in the new classes.
These classes implement an interface: IContainer.

Additionally, I modified fastBinaryJSON by letting it detect private constructors when the object class is private. Together, these modifications let me serialize BitmapFrameDecode, for example.

Thank you, I think I will do it (except if you want to add my code to your project). The problem is to stay synchronized with your future releases.

I have a suggestion: add a filter option to fastBinaryJSON, in order to intercept the objects at the right moment during serialization and deserialization. That would avoid having to modify the whole structure of the objects before serialization and after deserialization, in the case where I do not modify fastBinaryJSON.

If Mehdi does not adopt the ideas you are proposing here into a future version of fastBinaryJSON, I hope you might consider writing a Tip/Trick or article here on CP, showing how you added an ability to serialize a Type currently not able to be handled by fastBinaryJSON.

While I can see your proposed definitions of SerializeFilterDelegate and DeserializeFilterDelegate, how you are defining the instances SerializeFilter and DeserializeFilter, used in your sample code modifications, is ... well ... over my head.

thanks, Bill
