Interesting question; I've been thinking along these lines too.
I think the goal of unsupervised learning is to discover better features in an absolute sense, not only with respect to some artificial set of labels. From these intrinsic features one should then be able to train a possibly simpler model in a supervised fashion against any reasonable set of labels for that dataset. It is similar to kernel methods: first map the data into a higher-dimensional space, then implicitly train a linear model in that space.
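To illustrate the two-stage idea, here is a minimal sketch: learn features without labels, then fit a simple supervised model on top of them. PCA stands in for whatever unsupervised feature learner you prefer (an autoencoder, an RBM, etc.); the dataset and the choice of PCA are my assumptions for illustration, not part of the original question.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Unsupervised step: learn a feature map from the data alone (labels unused).
pca = PCA(n_components=32).fit(X_train)

# Supervised step: a simple linear model trained on the learned features.
clf = LogisticRegression(max_iter=1000).fit(pca.transform(X_train), y_train)
print(clf.score(pca.transform(X_test), y_test))
```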
So distances in the intrinsic feature space should be more semantic, and therefore I expect the resulting proximity matrix to be a better candidate for plugging into a kernel SVM. There is no gain in speed, but there may be some gain in quality. Theory and intuition are one thing and practice another, though, so I cannot support this with empirical evidence. Still, the discussion is interesting.
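To make that last point concrete, here is a minimal sketch of plugging a proximity matrix into a kernel SVM, assuming the proximities are derived from pairwise distances in some feature space (here just the raw inputs, turned into similarities with an RBF; that particular choice is my assumption, not something established above).

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import pairwise_distances
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def proximity(A, B, gamma=1e-3):
    # Turn squared distances into similarities; any PSD similarity would do.
    return np.exp(-gamma * pairwise_distances(A, B, metric="sqeuclidean"))

svm = SVC(kernel="precomputed")
svm.fit(proximity(X_train, X_train), y_train)          # train-by-train matrix
print(svm.score(proximity(X_test, X_train), y_test))   # test-by-train matrix
```

If the features were learned in an unsupervised stage first, you would simply compute the proximities on those learned features instead of the raw inputs.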
I think Geoffrey Hinton discusses this here, at around 7:40.