I spent too much time today trying to understand kernel PCA, inspired by Vakili's use of it to build a probabilistic model of galaxy images. Schölkopf would be disappointed with me! I don't see how it can give useful results. But then on further reflection, I realized that all my problems with kPCA are really just re-statements of my problems with PCA, detailed in my HMF paper: PCA delivers results that are not affine invariant. If you change the metric of your space, or the units of your quantities, or shear or scale things, you get different PCA components. That problem becomes even more severe, harder to control, and harder to comprehend once you generalize with the kernel trick.
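To see the units problem concretely, here is a minimal sketch of my own (synthetic data, numpy and scikit-learn): rescale one coordinate, as in a change of units, and the principal components change in a way that no after-the-fact rescaling undoes.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic 2-d data with correlated coordinates (purely illustrative).
X = rng.multivariate_normal(mean=[0.0, 0.0],
                            cov=[[1.0, 0.8], [0.8, 1.0]],
                            size=500)

# PCA in the original units.
pca = PCA(n_components=2).fit(X)
print("components, original units:\n", pca.components_)

# Same data after a units change on the second coordinate
# (say, meters -> millimeters): an affine (diagonal) rescaling.
X_scaled = X * np.array([1.0, 1000.0])
pca_scaled = PCA(n_components=2).fit(X_scaled)
print("components, rescaled units:\n", pca_scaled.components_)

# Mapping the new components back through the rescaling and
# re-normalizing does NOT recover the original components, so the
# answer depends on the metric you (implicitly) chose.
recovered = pca_scaled.components_ / np.array([1.0, 1000.0])
recovered /= np.linalg.norm(recovered, axis=1, keepdims=True)
print("rescaled components mapped back:\n", recovered)
```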
I also don't understand how you go from the results of kPCA back to reconstructions in the original data space. But that is a separate problem, and just represents my weakness.
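For the record, that map back is known as the pre-image problem, and in general it has no exact solution, only approximations. As a hedged sketch of one standard workaround: scikit-learn's KernelPCA can learn an approximate inverse map (by kernel ridge regression) if you ask for it at fit time; the data below are made up.

```python
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # made-up stand-in for image features

# fit_inverse_transform=True asks KernelPCA to learn an approximate
# map from feature space back to data space; exact pre-images
# generally do not exist.
kpca = KernelPCA(n_components=3, kernel="rbf", gamma=0.1,
                 fit_inverse_transform=True)
Z = kpca.fit_transform(X)          # coordinates in kernel feature space
X_hat = kpca.inverse_transform(Z)  # approximate reconstruction

print("mean squared reconstruction error:", np.mean((X - X_hat) ** 2))
```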
I think a lot of the problems are because we don't do enough to distinguish between model and algorithm. PCA as described by Hotelling was both a model and an algorithm for fitting it. We conflate these things. Kernel PCA builds on the algorithm, whereas a lot of the time we want to extend the model. Your HMF paper, and a lot of my work, are extending the model; but when you extend the model, you often have to find a totally new algorithm for fitting it.
I'm becoming increasingly convinced that a lot of confusions in data analysis arise from a failure to distinguish between model and algorithm.
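To make that model-versus-algorithm distinction concrete (a sketch of my own, not anyone's published code): the algorithm view of PCA is an SVD of the centered data matrix; the model view is something like Tipping & Bishop's probabilistic PCA, a latent-variable Gaussian model whose maximum-likelihood solution happens to be computable from that same eigendecomposition. Change the model, say by giving each datum its own noise variance, and the closed-form algorithm evaporates.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))  # toy data
Xc = X - X.mean(axis=0)

# Algorithm view: PCA as an SVD of the centered data matrix.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pca_subspace = Vt[:2]

# Model view: probabilistic PCA (Tipping & Bishop 1999), the latent
# Gaussian model x ~ N(W z + mu, sigma^2 I). Its maximum-likelihood W
# has the closed form W = V_k (Lambda_k - sigma^2 I)^{1/2}, with
# sigma^2 the mean of the discarded eigenvalues.
n, k = Xc.shape[0], 2
evals = (s ** 2) / n
sigma2 = evals[k:].mean()
W = Vt[:k].T * np.sqrt(evals[:k] - sigma2)

# Same subspace: the ML W projects entirely onto the SVD basis.
resid = W - pca_subspace.T @ (pca_subspace @ W)
print("subspace mismatch:", np.linalg.norm(resid))  # ~ 0

# But extend the model -- e.g., give every datum its own noise
# variance -- and this closed form (the "algorithm") no longer
# applies; you need a genuinely new fitting procedure.
```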