Monday, August 4, 2008

Last time I converted the original Bayesian Hierarchical Clustering
code to work in odds space rather than in probability space. This was
the first part of converting the entire program to work in log-odds
space. Converting to log space is somewhat similar to the previous
conversion, but there are a couple of differences. In converting to
odds space, we mostly had a hairball of simple algebra to untangle.
When we convert to log space we'll have to convert more of the program
and pay attention to the operators.

If the difference between left and right is small enough, taking the
inverse log (exponentiating) won't be much of an issue, but if the
difference gets big, this could cause us some problems. For the
moment, we'll just leave it and hope we don't have to revisit it
later.
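
To make the concern concrete, here is a minimal Python sketch. It is not the code from this post: it assumes the log-sum is computed by factoring out one argument and exponentiating the difference. The names `log_add_naive` and `log_add` are hypothetical; `left` and `right` follow the wording above.

```python
import math

def log_add_naive(left, right):
    """log(exp(left) + exp(right)), always factoring out `left`.

    Fine while right - left is small; math.exp overflows once the
    difference exceeds roughly 709.
    """
    return left + math.log1p(math.exp(right - left))

def log_add(left, right):
    """Same quantity, but factor out the larger argument so the
    exponent is never positive and cannot overflow."""
    hi, lo = max(left, right), min(left, right)
    if hi == -math.inf:
        # Both inputs represent zero; avoid computing (-inf) - (-inf).
        return -math.inf
    return hi + math.log1p(math.exp(lo - hi))
```

For nearby arguments the two agree. When the difference is large, the naive version raises `OverflowError`, while the guarded one simply returns the larger argument, since the smaller term underflows to zero.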

We know that * becomes +, / becomes -, + becomes log-sum, but we
should find the log-space equivalent of ‘if’.

(logequiv if) = if

It turns out that if simply stays if, so the transformation above
is legal.
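
As an illustration of the operator mapping, here is a small Python sketch (not the clustering code itself). The function `combine` and its arguments are hypothetical, chosen only to exercise `*`, `/`, `+`, and a branch. The log-space version rewrites the arithmetic and leaves the conditional alone, because the predicate is a plain boolean and not a quantity being transformed.

```python
import math

def log_sum(a, b):
    """log(exp(a) + exp(b)), factoring out the larger argument."""
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))

def combine(use_ratio, w, x, y, z):
    """Ordinary-space version: uses *, /, + and an if."""
    if use_ratio:
        return w * x / z
    return w * x + y

def log_combine(use_ratio, lw, lx, ly, lz):
    """Log-space version; every argument is the log of its counterpart."""
    if use_ratio:                     # if stays if
        return lw + lx - lz           # * becomes +, / becomes -
    return log_sum(lw + lx, ly)       # + becomes log-sum
```

Exponentiating the result of `log_combine` gives the same value as `combine` on either branch, up to floating-point rounding.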

With the change from exact probabilities to floating-point operations
in log space, we get about a factor of 10 improvement in performance.
Now we can cluster hundreds of data points rather than only dozens.
There are two more substantial speedups to apply.

I disagree just a bit. We're using IF to paste together two partial functions: one for the case that X is true, the other for the case that X is false. We'd like to be sure the resulting function is well-defined over the domain of interest, and because our predicate is binary, we're OK.

But what if we had a language with 'fuzzy' predicates that indicate a blend between truth and falsehood? We'd need to figure out how that blend maps into log space. In other words, we do need to consider the log-space version of if, but because it is simply a discontinuity at a single point, the equivalent version is trivially the same.
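
One way to picture this, under the assumption that a fuzzy predicate is a weight t between 0 and 1 and that the blend is linear, is the Python sketch below. The names `fuzzy_if` and `log_fuzzy_if` are hypothetical, and the linear blend is only one possible reading of 'fuzzy'.

```python
import math

def log_sum(a, b):
    """log(exp(a) + exp(b)), factoring out the larger argument."""
    hi, lo = max(a, b), min(a, b)
    if hi == -math.inf:
        return -math.inf
    return hi + math.log1p(math.exp(lo - hi))

def fuzzy_if(t, a, b):
    """Ordinary space: t = 1 selects a, t = 0 selects b, else a linear blend."""
    return t * a + (1.0 - t) * b

def log_fuzzy_if(t, la, lb):
    """Log-space counterpart; la and lb are logs of a and b, t is not a log."""
    if t >= 1.0:
        return la                     # crisp true: ordinary if
    if t <= 0.0:
        return lb                     # crisp false: ordinary if
    return log_sum(math.log(t) + la, math.log1p(-t) + lb)
```

At the crisp endpoints the log-space version reduces to an ordinary if. Only the blended middle needs a log-sum.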