Suppose I have a confusion matrix $A$ for a set of points: entry $i,j$ is the fraction of trials, taken over all $j$ (and similarly over all $i$), in which presenting $i$ results in $j$ being recognized. (This makes $A$ doubly stochastic; I don't think there are any other mathematical restrictions.)

This setup (which comes from psychometric data about stimuli that get miscoded) suggests looking for subsets whose elements are mutually more confusable within the subset than across pairs where one element is inside the subset and the other outside.

This is vaguely reminiscent of a distance (or covariance) matrix in which a subset may have mutually small distances while distances to points outside the subset are larger.

Except, among other substantive mathematical differences, a distance/covariance matrix is symmetric but a confusion matrix is in general not.

Is there a 'meaningful' (coherent, not totally crazy) mapping from a doubly stochastic matrix to a distance matrix, one that somehow preserves at least a vague notion of clustering?

Why is A doubly stochastic? If that's an additional constraint you're imposing, be clear about that. But as-is, you seem to allow one particular point j to be recognized 100% of the time, in which case column j has all 1s and everything else is 0.
– Darsh Ranjan, Feb 3 '10 at 22:33

2 Answers

I'm not sure how the specifics of your confusion matrix can help, but as far as I know, there is no general way of mapping dissimilarity functions (or matrices) to metrics (which is, probably, a bit more general than what you're asking). There are, however, empirically quite useful ways of doing so.

For example, you may wish to retain the dissimilarity ordering, so that objects/points ordered by distance from a reference point will retain their ordering under the new dissimilarity/distance. This is possible.

In your weighted directed graphs question, you've had the symmetry question answered (e.g., take the minimum of both directions; sum or average will also work). Non-negativity can easily be fixed by shifting all distances by the same constant. Positive-definiteness (i.e., $d(x,y)=0$ iff $x=y$), can be fixed (if it matters) as long as there is a minimum distance between non-identical objects. (For a finite matrix, you should be able to fix this anyway, by shifting by a positive $\varepsilon$ and setting the diagonal to zero, or the like.)
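These fixes are simple to express concretely. Here is a minimal NumPy sketch, assuming the minimum-of-both-directions rule from above (the function name and the choice of $\varepsilon$ are mine, purely for illustration):

```python
import numpy as np

def symmetrize(d):
    """Fix symmetry, non-negativity, and d(x, y) = 0 iff x = y
    on a finite dissimilarity matrix (illustrative sketch)."""
    d = np.minimum(d, d.T)       # symmetry: minimum of both directions
    d = d - d.min()              # non-negativity: shift by a constant
    np.fill_diagonal(d, 0.0)     # d(x, x) = 0
    eps = 1e-9                   # arbitrary small positive shift
    off = ~np.eye(len(d), dtype=bool)
    d[off] = np.maximum(d[off], eps)  # keep distinct objects strictly apart
    return d
```

Summing or averaging `d` and `d.T` instead of taking the minimum would work just as well, as noted above.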

The main challenge is enforcing triangularity, and this can be done by composing your function (or, in your case, matrix) with a strictly increasing, concave function $f$ for which $f(0)=0$.

It should be obvious that such a function will not change the dissimilarity ordering (given that your measure of dissimilarity is already non-negative). What it will do, however, is magnify the smaller dissimilarities more than the larger ones, moving the measure in the direction of triangularity. It's only a matter of finding a function that is "concave enough."

An example of such a function would be $f(x) = x^\frac{1}{1+w}$. Now it's just a matter of choosing a large enough $w$, and you can find that by bisection, for example. (For more details on this approach, see the paper on the subject by Tomáš Skopal.)
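A hedged sketch of that bisection, assuming the matrix has already been made symmetric and non-negative with a zero diagonal (the helper names and the `w_max` cap are my own choices, not taken from Skopal's paper):

```python
import numpy as np

def satisfies_triangle(d):
    """Check d[i, k] <= d[i, j] + d[j, k] for all i, j, k."""
    return np.all(d[:, None, :] <= d[:, :, None] + d[None, :, :] + 1e-12)

def metricize(d, w_max=64.0, iters=50):
    """Find a small w such that d ** (1 / (1 + w)) is triangular,
    by bisection on w.  Assumes d is symmetric, non-negative,
    with zero diagonal."""
    if satisfies_triangle(d):
        return d, 0.0
    lo, hi = 0.0, w_max
    # For large w the transformed entries approach 0/1, which is
    # triangular; assert that the cap is big enough for this matrix.
    assert satisfies_triangle(d ** (1.0 / (1.0 + hi)))
    for _ in range(iters):
        mid = (lo + hi) / 2
        if satisfies_triangle(d ** (1.0 / (1.0 + mid))):
            hi = mid
        else:
            lo = mid
    return d ** (1.0 / (1.0 + hi)), hi
```

Since `hi` always stays on the satisfying side of the bisection, the returned matrix obeys the triangle inequality while keeping `w` (and hence the distortion of the original values) as small as the iteration budget allows.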

As I said, this doesn't really address the specific properties of your matrix, but deals with the general problem. Maybe there are better solutions in your case; I don't know.

By the way, a few years ago, I had a student working on the problem of making substitution weight matrices for the string edit distance metric, which is also quite similar to what you're asking. He explored several algorithms, and his Master's thesis ("Making substitution matrices metric") is available online.

A small update: my answer mainly addresses how to transform a general dissimilarity function into a metric. The original question was more related to an even more basic step: turning a similarity function into a dissimilarity function. One way, as used in the thesis above, is $d(u,v) = s(u,u) + s(v,v) - 2s(u,v)$, for example. Or, assuming that similarity decays exponentially with distance (a common assumption in psychology), you'd have the relationship $s(u,v) = e^{-c\cdot d(u,v)}$ for some constant $c$, which you can invert to get $d(u,v) = -\ln s(u,v)/c$. This can, of course, be combined with the symmetry fixing mentioned earlier.
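Both conversions are one-liners in NumPy; here is an illustrative sketch (the function names are mine, and the clipping constant is just a guard against taking $\log 0$):

```python
import numpy as np

def sim_to_dissim(s):
    """d(u, v) = s(u, u) + s(v, v) - 2 s(u, v), applied entrywise."""
    diag = np.diag(s)
    return diag[:, None] + diag[None, :] - 2 * s

def exp_decay_dissim(s, c=1.0):
    """Invert the exponential-decay model s(u, v) = exp(-c * d(u, v))."""
    return -np.log(np.clip(s, 1e-12, None)) / c
```

Either result can then be fed into the symmetrization and triangularization steps described above.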
– Magnus Lie Hetland, Feb 10 '10 at 11:19

Maybe a bit late (but it may help future readers): there is a field of statistics that I believe addresses your question. It is called multidimensional scaling, and one of the references in this field is this book.

It does not precisely correspond to your setup, nor is it a rigorous solution to your problem. Still, it offers useful tools for concretely designing a distance matrix in many cases.
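To illustrate, here is a minimal sketch of classical MDS (one common variant; the function name and dimension parameter are mine). Given a symmetric distance matrix, e.g. one produced by the conversions in the other answer, it recovers point coordinates whose pairwise distances approximate the input:

```python
import numpy as np

def classical_mds(d, k=2):
    """Embed n points in R^k from an n-by-n matrix of pairwise
    distances, via eigendecomposition of the double-centered
    squared-distance matrix (classical MDS)."""
    n = len(d)
    j = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    b = -0.5 * j @ (d ** 2) @ j           # Gram matrix of centered points
    vals, vecs = np.linalg.eigh(b)
    order = np.argsort(vals)[::-1][:k]    # keep the k largest eigenvalues
    vals = np.clip(vals[order], 0.0, None)
    return vecs[:, order] * np.sqrt(vals)
```

When the input distances are exactly Euclidean, the embedding reproduces them (up to rotation and reflection); for merely approximate metrics it gives a least-squares-style low-dimensional picture in which confusable clusters should appear as nearby groups of points.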