jackknife and PCA collide

As my loyal reader knows, I have been bashing PCA and hyping jackknife. As I was responding to some local comments on our paper on faint-source proper motions, I found myself adding the words "principal components" to a discussion of how well jackknife works! This is because if you have d parameters, and you want to measure the d×d covariance matrix, you will rarely have enough jackknife trials to fill in every element of the matrix precisely. To do so, you would need the number N of jackknife trials to be much larger than d, and even then you would only do well if the covariance matrix describes a variance that is close to spherical.
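To make the N-versus-d point concrete, here is a minimal sketch (mine, not from the paper) of a delete-one jackknife applied to a sample mean. The function name `jackknife_covariance`, the toy Gaussian data, and the choice of the mean as the estimator are all assumptions for illustration. With N smaller than d, the estimated matrix cannot have rank above N − 1, so most of its elements are not independently constrained.

```python
import numpy as np

def jackknife_covariance(data, estimator):
    """Delete-one jackknife covariance of estimator(data); rows of data are samples."""
    n = data.shape[0]
    # Recompute the estimate with each data point left out in turn.
    trials = np.array([estimator(np.delete(data, i, axis=0)) for i in range(n)])
    deviations = trials - trials.mean(axis=0)
    # Standard delete-one jackknife normalization: (n - 1) / n.
    return (n - 1) / n * deviations.T @ deviations

rng = np.random.default_rng(0)
n_trials, d = 8, 20  # fewer jackknife trials than parameters (toy numbers)
data = rng.normal(size=(n_trials, d))

cov = jackknife_covariance(data, lambda x: x.mean(axis=0))
rank = np.linalg.matrix_rank(cov)

# The deviations sum to zero, so the d x d matrix has rank at most n_trials - 1.
print(f"matrix shape {cov.shape}, rank {rank}, bound {n_trials - 1}")
```

The rank bound holds whatever the estimator is, because the jackknife deviations always sum to zero.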

However, the jackknife (except in pathological situations) will return a sampling of the covariance matrix that gets the principal components correct. This is because the principal components will dominate the variance (by definition, in some sense). And for error propagation, all you care about are the dominant directions in the space, as defined by the true covariance matrix. This is a rare case where the concept of PCA is good, because here you really do care most about the directions of largest variance.
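A toy check of this claim, under assumptions of my own choosing: Gaussian data in d = 10 dimensions with one strongly dominant direction (variance 100 against 1 elsewhere), N = 30 points, and the sample mean as the estimated quantity. None of these numbers come from the paper. The sketch compares the leading eigenvector of the jackknife covariance with the true dominant direction.

```python
import numpy as np

rng = np.random.default_rng(42)
d, n = 10, 30  # toy numbers, chosen only for illustration

# Hypothetical true covariance: variance 100 along one unit direction, 1 elsewhere.
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
true_cov = np.eye(d) + 99.0 * np.outer(direction, direction)

data = rng.multivariate_normal(np.zeros(d), true_cov, size=n)

# Delete-one jackknife of the sample mean: row i is the mean with point i left out.
total = data.sum(axis=0)
trials = (total - data) / (n - 1)
deviations = trials - trials.mean(axis=0)
# This estimates the covariance of the mean, i.e. roughly true_cov / n.
jack_cov = (n - 1) / n * deviations.T @ deviations

# np.linalg.eigh returns eigenvalues in ascending order, so the last
# eigenvector is the direction of largest estimated variance.
eigvals, eigvecs = np.linalg.eigh(jack_cov)
leading = eigvecs[:, -1]

# Absolute cosine between the estimated and true dominant directions
# (the sign of an eigenvector is arbitrary).
alignment = abs(leading @ direction)
print(f"alignment with true dominant direction: {alignment:.3f}")
```

In a setup like this the alignment should come out close to 1 even though the individual matrix elements are noisy. How close depends on how strongly the leading direction dominates.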

This relates to one of my unwritten polemics: uncertainties should be communicated via samplings, not analytic descriptions of multi-dimensional confidence regions.
