Saturday, March 28, 2015

How to deal with objects

In my coding standards, I state that I strive for a predominantly functional programming model, even when using an imperative, procedural language such as C++. All functions as well as main routines should, in the ideal case, take a set of inputs and return a set of outputs while avoiding any kind of side effect. This leaves open the question of how to deal with objects which I use quite often. Object-oriented programming doesn't quite fit the functional paradigm, so here is my attempt to lay out an approach consistent with my earlier standards.

Class data ("fields") can be divided into two, broad categories: those that define the object and workspace variables. As an example of the first type, consider a matrix class. A matrix object would be defined by its dimensions and the values of each individual element. These can be modified by the class methods such that the method can be considered an operator on the object. So for instance the matrix class might have a "transpose" or "invert" method. More basic operators might include subscript assignment, permuting rows or columns, and in particular, reading from and writing to a file.

The other type of class data are workspace variables. Using the matrix example: internally the inverse method might comprise two steps--the first to decompose the matrix and the second to perform some other operation on the decomposition. The decomposition is stored inside of the object. This approach gives a minor performance advantage: if we know in advance the size of the workspace variable, it can be pre-allocated as soon as the the object has been defined. It also simplifies memory management.

My current view on object use and modification is rather extreme: data fields should be filled completely at or near the moment of creation and from that moment on the object should be considered more-or-less immutable. Methods may take other objects as arguments and return newly-created objects, but modifications to an existing object should be avoided in favour of deleting it and creating a new one in its place.

Workspace variables in particular should be avoided--they are, in fact, global variables in disguise. This is especially so if they are being used in a context that doesn't modify the more essential object data and is meant to keep the re-entrant property and make the code thread-safe. Operators that modify the object, especially if they are simple such as subscript assignment, can always be made atomic. By contrast, workspace variables by their very nature frequently span multiple methods. Computers are now so fast and memory management so efficient that the prior justification of improving efficiency is nullified. Enabling the program to run on multiple processors will dwarf any marginal gains that could be had with the use of workspace variables.

I admit that most of my current code isn't anywhere close to conforming to these standards although I rarely use workspace variables and have started to actively remove them from my O-O code.

Addendum: I've just re-read this post and realized that the example I've used for a workspace variable is not a very good one. The matrix decomposition depends on the data that define the object and in fact is simply another representation of it. Assuming there is space, the decomposition can be computed once and then kept until the death of the object.

A better would be the multi-class classification objects in the libAGF project. These build up a multi-class classifier from a set of binary classifiers. You feed in a test point and the object returns the conditional probabilities of each of the classes. These probabilities are calculated from the probabilities returned by each of the binary classifiers. In earlier versions of the object-classes, the probabilities from the binary classifiers were stored first in a workspace variable before being passed to the next stage of the computation.