Strategy Part 1 - Pick the correct data model

The strategy is to provide behavior in F# that is similar to Python (at least, as much as possible given that only one example was provided). Since val is a keyword in F#, using it as a field name requires wrapping it in double back-ticks, aka ``val``, or renaming it to something like value.

The key here, in my mind, is to pick the appropriate data type to hold the tree. Some choices include a discriminated union (DUNode), a record (RNode), or a class (CNode).
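A minimal sketch of the union and record models follows. It assumes string values and optional children; the notebook's own Model.DUNode and Model.RNode definitions are not shown here, so these shapes are assumptions. It also shows the ``val`` escaping and the structural equality that F# unions and records get without any extra code.

```fsharp
// Hypothetical shapes; the notebook's Model.DUNode / Model.RNode may differ.

// Discriminated union: one case holding the value and two optional children.
type DUNode = Node of string * DUNode option * DUNode option

// Record: `val` is an F# keyword, so the field name is escaped with double back-ticks.
type RNode =
    { ``val``: string
      left: RNode option
      right: RNode option }

let du = Node ("1", Some (Node ("2", None, None)), None)
let r = { ``val`` = "1"; left = Some { ``val`` = "2"; left = None; right = None }; right = None }

// Unions and records compare structurally, so identically built trees are equal.
printfn "%b %b" (du = Node ("1", Some (Node ("2", None, None)), None)) (r = { r with ``val`` = "1" })
```

Because equality comes for free here, only a class-based model needs hand-written Equals and GetHashCode overrides.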

Custom Serialization using FParsec

Finally, we have the functions for Custom Serialization using FParsec. The strategy here is to serialize each node that holds a value as Some("val") followed by its left and right children (Some("val") Child1 Child2), and each missing node as None. We will also be using a pre-order traversal of the tree, i.e., process the parent node, then the left child, and finally the right child.
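The writing side of this format can be sketched in a few lines. This sketch builds the string with sprintf rather than FParsec, and it assumes the union shape Node of string * DUNode option * DUNode option (an assumption, since the notebook's definition is not shown).

```fsharp
// Hypothetical union shape; a missing child is represented by None.
type DUNode = Node of string * DUNode option * DUNode option

// Pre-order traversal: write the parent first, then the left subtree, then the right.
let rec serialize (tree: DUNode option) : string =
    match tree with
    | None -> "None"
    | Some (Node (v, left, right)) ->
        sprintf "Some(\"%s\") %s %s" v (serialize left) (serialize right)

// Example: a root "1" whose only child is a left leaf "2".
let example = Some (Node ("1", Some (Node ("2", None, None)), None))
printfn "%s" (serialize example)
// prints: Some("1") Some("2") None None None
```

Every absent child still produces a None token, which is what lets a reader rebuild the exact shape of the tree from the flat string.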

Equality Tests

Let's make sure that equality works correctly for all three data models, especially for CNode, because we had to override Equals and GetHashCode.
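Records and unions compare structurally by default, but a .NET class compares by reference unless Equals and GetHashCode are overridden. Here is a sketch of such a class, assuming the left/right members and the create factory that the tests call (the notebook's actual CNode is not shown, so this shape is an assumption):

```fsharp
// Hypothetical class-based node mirroring the members the tests use.
type CNode(value: string, left: CNode option, right: CNode option) =
    member this.``val`` = value
    member this.left = left
    member this.right = right
    static member create(v: string, l: CNode option, r: CNode option) = CNode(v, l, r)

    // Structural equality: same value and (recursively) equal children.
    override this.Equals(other: obj) =
        match other with
        | :? CNode as o -> this.``val`` = o.``val`` && this.left = o.left && this.right = o.right
        | _ -> false

    // Must agree with Equals: equal trees have to produce equal hash codes.
    override this.GetHashCode() = hash (this.``val``, this.left, this.right)
```

Overriding only Equals would break hash-based collections such as Set or Dictionary, which is why both overrides go together.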

module EqualityTest =
    open Expecto

    type DU = Model.DUNode
    type R = Model.RNode
    type C = Model.CNode

    let tests =
        testList "Equality tests" <|
            // Reflexivity: every generated tree must equal itself.
            [ for i in 1 .. Test.numTests do
                yield test (sprintf "Confirm a DUNode tree is equal to itself %i" i) {
                    let tree = Test.CustomTreeGeneratorDU.createOpt
                    Expect.equal tree tree (sprintf "%O" tree)
                } ]
            @ [ for i in 1 .. Test.numTests do
                yield test (sprintf "Confirm a RNode tree is equal to itself %i" i) {
                    let tree = Test.CustomTreeGeneratorR.createOpt
                    Expect.equal tree tree (sprintf "%O" tree)
                } ]
            @ [ for i in 1 .. Test.numTests do
                yield test (sprintf "Confirm a CNode tree is equal to itself %i" i) {
                    let tree = Test.CustomTreeGeneratorC.createOpt
                    Expect.equal tree tree (sprintf "%O" tree)
                } ]
            // Two independently generated trees with different root values must differ.
            @ [ for i in 1 .. Test.numTests do
                yield test (sprintf "Confirm 2 different DUNode trees are not equal %i" i) {
                    let tree = Test.CustomTreeGeneratorDU.createOpt |> fun (DU.Node(v, l, r)) -> DU.create("1", l, r)
                    let tree2 = Test.CustomTreeGeneratorDU.createOpt |> fun (DU.Node(v, l, r)) -> DU.create("2", l, r)
                    Expect.notEqual tree tree2 (sprintf "%O %O" tree tree2)
                } ]
            @ [ for i in 1 .. Test.numTests do
                yield test (sprintf "Confirm 2 different RNode trees are not equal %i" i) {
                    let tree = Test.CustomTreeGeneratorR.createOpt |> fun t -> { t with ``val`` = "1" }
                    let tree2 = Test.CustomTreeGeneratorR.createOpt |> fun t -> { t with ``val`` = "2" }
                    Expect.notEqual tree tree2 (sprintf "%O %O" tree tree2)
                } ]
            @ [ for i in 1 .. Test.numTests do
                yield test (sprintf "Confirm 2 different CNode trees are not equal %i" i) {
                    let tree = Test.CustomTreeGeneratorC.createOpt |> fun t -> C.create("1", t.left, t.right)
                    let tree2 = Test.CustomTreeGeneratorC.createOpt |> fun t -> C.create("2", t.left, t.right)
                    Expect.notEqual tree tree2 (sprintf "%O %O" tree tree2)
                } ]
            // The same tree with only its root value changed must differ.
            @ [ for i in 1 .. Test.numTests do
                yield test (sprintf "Confirm 2 similar DUNode trees are not equal %i" i) {
                    let tr = Test.CustomTreeGeneratorDU.createOpt
                    let tree = tr |> fun (DU.Node(v, l, r)) -> DU.create("1", l, r)
                    let tree2 = tr |> fun (DU.Node(v, l, r)) -> DU.create("2", l, r)
                    Expect.notEqual tree tree2 (sprintf "%O %O" tree tree2)
                } ]
            @ [ for i in 1 .. Test.numTests do
                yield test (sprintf "Confirm 2 similar RNode trees are not equal %i" i) {
                    let tr = Test.CustomTreeGeneratorR.createOpt
                    let tree = tr |> fun t -> { t with ``val`` = "1" }
                    let tree2 = tr |> fun t -> { t with ``val`` = "2" }
                    Expect.notEqual tree tree2 (sprintf "%O %O" tree tree2)
                } ]
            @ [ for i in 1 .. Test.numTests do
                yield test (sprintf "Confirm 2 similar CNode trees are not equal %i" i) {
                    let tr = Test.CustomTreeGeneratorC.createOpt
                    let tree = tr |> fun t -> C.create("1", t.left, t.right)
                    let tree2 = tr |> fun t -> C.create("2", t.left, t.right)
                    Expect.notEqual tree tree2 (sprintf "%O %O" tree tree2)
                } ]

    runTests { defaultConfig with ``parallel`` = true } tests

Conclusion

The results are very interesting. From a CPU perspective, FParsec wins hands down regardless of the chosen data structure. There is an order of magnitude difference between the three serialization techniques.

From an SLOC perspective, Json.NET is the clear winner: serialization and deserialization take one line each. FParsec is the most complex, requiring the development of a custom de-/serialization format and then ensuring that all 3 data structures can be transformed to that format.

Chiron stays right in the middle, both in terms of performance and SLOC.

The one advantage that both Chiron and FParsec can provide is that, when the format is designed for it, the serialized string can be deserialized into any of the 3 data structures (discriminated union, record, or class). The FParsec implementation above works that way.
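To illustrate why a structure-independent format allows this, here is a reader for the pre-order Some("val") / None format that takes a node-building function as a parameter. The notebook parses with FParsec; this sketch swaps in a hand-rolled recursive-descent reader so that it runs without extra packages. The names readTree and build, and the DUNode and RNode shapes, are hypothetical. Like the original format, it assumes values never contain the sequence ").

```fsharp
open System

// Hypothetical target shapes; the notebook's Model types may differ.
type DUNode = Node of string * DUNode option * DUNode option
type RNode = { ``val``: string; left: RNode option; right: RNode option }

// Reads the pre-order format, calling `build` once for every present node.
let readTree (build: string -> 'T option -> 'T option -> 'T) (s: string) : 'T option =
    let rec skip i = if i < s.Length && s.[i] = ' ' then skip (i + 1) else i
    let startsAt (lit: string) i =
        i + lit.Length <= s.Length && s.Substring(i, lit.Length) = lit
    // Returns the parsed subtree and the index just past it.
    let rec readNode i : 'T option * int =
        let i = skip i
        if startsAt "None" i then
            None, i + 4
        elif startsAt "Some(\"" i then
            let start = i + 6
            let close = s.IndexOf("\")", start, StringComparison.Ordinal)
            if close < 0 then failwith "Unterminated value"
            // Pre-order: the left subtree follows the value, then the right subtree.
            let left, i = readNode (close + 2)
            let right, i = readNode i
            Some (build (s.Substring(start, close - start)) left right), i
        else
            failwithf "Unexpected input at position %d" i
    let tree, i = readNode 0
    if skip i <> s.Length then failwith "Trailing input"
    tree

// The same string rebuilt as two different data structures.
let input = "Some(\"1\") Some(\"2\") None None None"
let asUnion = readTree (fun v l r -> Node (v, l, r)) input
let asRecord = readTree (fun v l r -> { ``val`` = v; left = l; right = r }) input
```

Only the build function knows about the target type, so supporting a class-based node would need just one more lambda.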

Unfortunately, I do not know how to measure memory usage from within IfSharp / Jupyter, so I can't really speak to that.

From a data structure perspective, records seem to provide very good performance. I was surprised that records generally performed even better than classes. However, it appears that Json.NET had issues with discriminated unions during deserialization and with classes during serialization.

Personally, I am not sure that creating a custom de-/serialization format (i.e., going down the FParsec route) is worth the effort, especially given all the edge cases that must be handled. For example, my FParsec implementation cannot handle double quotes within strings and, I believe (though I didn't test this), cannot handle strings containing newline characters.
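To give a sense of the extra work, here is a sketch of one common way to close the double-quote and newline gap: escape each value before writing it inside Some("...") and unescape it when reading. This is an assumption, not part of the notebook's implementation, and the names escape and unescape are hypothetical. A reader built on it would also have to skip escaped quotes when searching for the closing ").

```fsharp
open System.Text

// Backslashes are escaped first so the escapes added afterwards are not doubled.
let escape (s: string) =
    s.Replace("\\", "\\\\").Replace("\"", "\\\"").Replace("\n", "\\n")

// Reverses `escape`: \n becomes a newline; \\ and \" become the escaped character.
let unescape (s: string) =
    let sb = StringBuilder()
    let mutable i = 0
    while i < s.Length do
        if s.[i] = '\\' && i + 1 < s.Length then
            (if s.[i + 1] = 'n' then sb.Append('\n') else sb.Append(s.[i + 1])) |> ignore
            i <- i + 2
        else
            sb.Append(s.[i]) |> ignore
            i <- i + 1
    sb.ToString()
```

Even this small helper adds ordering rules and a second code path to test, which is the kind of maintenance cost that JSON libraries already absorb.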

Json.NET and Chiron both use JSON, which is an extremely popular format and enjoys widespread support in development tooling. I believe this is a better route, especially because using such a common, human-readable format would make debugging easier. Maintainability is extremely important and something that is often overlooked.

After testing the system with JSON, it can be shifted to a binary format such as MessagePack, BSON, or Protocol Buffers, among others, if size becomes a concern. Most of these formats have well-tested libraries available in multiple languages.