2012-09-25

PCA and mixtures of Gaussians

Way back in the day, my friend and colleague Sam Roweis worked on making principal components analysis (a method I love to bash but occasionally use) into a probabilistic model. He said (very sensibly) in this note:

Finally, the PCA model itself suffers from a critical flaw which is independent of the technique used to compute its parameters: it does not define a proper probability model in the space of inputs. This is because the density is not normalized within the principal subspace. In other words, if we perform PCA on some data and then ask how well new data are fit by the model, the only criterion used is the squared distance of the new data from their projections into the principal subspace. A datapoint far away from the training data but nonetheless near the principal subspace will be assigned a high pseudo-likelihood or low error. Similarly, it is not possible to generate fantasy data from a PCA model.
He proposed fixes for these problems, and they generally look like (a) putting some non-trivial Gaussian or similar distribution down in the low-dimensional PCA subspace, and (b) putting some more trivial Gaussian (perhaps an isotropic one) down in the high-dimensional orthogonal (complementary) subspace. This converts PCA into a constrained maximum-likelihood or MAP fit of a (simple) probabilistic model.
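To make that concrete, here is a minimal numpy sketch (mine, not Sam's code; the toy data, and the names W, V, sigma2 are my own) contrasting plain PCA's squared-residual criterion with the proper likelihood you get from exactly this fix. A non-trivial Gaussian in the principal subspace plus an isotropic Gaussian in the complement gives a model covariance C = W W^T + sigma^2 I, with the closed-form maximum-likelihood solution of probabilistic PCA. The point x_far below sits right on the principal subspace but far from the training data, so PCA's pseudo-likelihood rates it as a near-perfect fit while the proper model correctly rejects it:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data: 2-d structure embedded in 10 dimensions, plus noise.
D, d, N = 10, 2, 500
W_true = rng.normal(size=(D, d))
X = rng.normal(size=(N, d)) @ W_true.T + 0.1 * rng.normal(size=(N, D))
X -= X.mean(axis=0)

# Plain PCA via SVD; keep the top-d principal directions.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt[:d].T                          # (D, d) principal-subspace basis

# Probabilistic PCA: the ML covariance is C = W W^T + sigma^2 I, where
# sigma^2 is the mean of the discarded eigenvalues of the sample covariance.
eigvals = S**2 / N
sigma2 = eigvals[d:].mean()
W = V * np.sqrt(np.maximum(eigvals[:d] - sigma2, 0.0))
C = W @ W.T + sigma2 * np.eye(D)

def pca_residual(x):
    """PCA's 'pseudo-likelihood' criterion: squared distance to the subspace."""
    return np.sum((x - V @ (V.T @ x))**2)

def ppca_loglike(x):
    """Proper Gaussian log-density under the PPCA covariance C."""
    sign, logdet = np.linalg.slogdet(C)
    return -0.5 * (D * np.log(2 * np.pi) + logdet + x @ np.linalg.solve(C, x))

# A point far from the training data but lying exactly in the principal subspace:
x_far = 100.0 * V[:, 0]
x_typical = X[0]
print(pca_residual(x_far), pca_residual(x_typical))   # both residuals are small
print(ppca_loglike(x_far), ppca_loglike(x_typical))   # PPCA heavily penalizes x_far
```

This also resolves the fantasy-data complaint in the quote above: with a proper density in hand, generating samples is just drawing from a Gaussian with covariance C.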

Today, Fergus proposed that we do something strongly related in Fadely's project to fit a probabilistic, data-driven model to a huge collection of SDSS imaging patches: we will use a mixture of Gaussians to model the distribution, but reduce the dimensionality of the fit (not the dimensionality of the mixture but the dimensionality of the parameter space) by making each Gaussian the product of a low-dimensional non-trivial structure and a higher-dimensional trivial structure. These kinds of models can capture much of what a completely free mixture of Gaussians can capture, but with many fewer parameters and much faster optimization and execution; a sketch of the evaluation trick appears below. We also figured out symmetry considerations that massively reduce the diversity of the training data. So life looks very good in this sector.
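For concreteness, here is a rough sketch (my notation and function names, not anything from the actual project code) of why such components are cheap as well as compact. If each component covariance has the low-rank-plus-isotropic form Lam Lam^T + sigma^2 I, the Woodbury identity and the matrix determinant lemma keep all the linear algebra in the small latent dimension d rather than the patch dimension D, and the per-component covariance parameter count drops from D(D+1)/2 to D*d + 1:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.special import logsumexp

def lowrank_gaussian_logpdf(X, mu, Lam, sigma2):
    """Log N(x | mu, Lam Lam^T + sigma2 I) for each row of X.
    X: (N, D), mu: (D,), Lam: (D, d), sigma2: scalar.
    All O(D^3) operations are replaced by O(d^3) ones."""
    N, D = X.shape
    d = Lam.shape[1]
    R = X - mu                                 # (N, D) residuals
    Z = R @ Lam                                # (N, d) latent projections
    M = np.eye(d) + Lam.T @ Lam / sigma2       # small d x d matrix
    cho = cho_factor(M)
    # Matrix determinant lemma: log|C| = D log sigma2 + log|M|
    logdet = D * np.log(sigma2) + 2.0 * np.sum(np.log(np.diag(cho[0])))
    # Woodbury: r^T C^{-1} r = (||r||^2 - z^T M^{-1} z / sigma2) / sigma2
    quad = (np.sum(R**2, axis=1)
            - np.sum(Z * cho_solve(cho, Z.T).T, axis=1) / sigma2) / sigma2
    return -0.5 * (D * np.log(2 * np.pi) + logdet + quad)

def mixture_logpdf(X, weights, mus, Lams, sigma2s):
    """Log-density of the mixture: logsumexp over component log-densities."""
    comp = np.stack([np.log(w) + lowrank_gaussian_logpdf(X, mu, L, s2)
                     for w, mu, L, s2 in zip(weights, mus, Lams, sigma2s)])
    return logsumexp(comp, axis=0)

# Tiny usage example with hypothetical sizes: D = 64-pixel patches, d = 5.
rng = np.random.default_rng(0)
D, d, K, N = 64, 5, 3, 200
X = rng.normal(size=(N, D))
weights = np.full(K, 1.0 / K)
mus = [rng.normal(size=D) for _ in range(K)]
Lams = [rng.normal(size=(D, d)) for _ in range(K)]
sigma2s = [1.0] * K
print(mixture_logpdf(X, weights, mus, Lams, sigma2s).shape)   # (200,)
```

This is essentially a mixture of factor analyzers (or of probabilistic PCAs, if sigma2 is shared within each component), which is the model the tech report linked in the comment below describes.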

1 comment:

  1. A tech report on Mixtures of Factor Analyzers:
    http://www.learning.eng.cam.ac.uk/zoubin/papers/tr-96-1.pdf

    Zoubin provided code that works with minimal tweaking. But I've found the following code (still Octave/Matlab, I'm afraid) to be faster and more stable: http://lear.inrialpes.fr/~verbeek/software.php
