There is a very nice algorithm and set of methods called “Robust PCA,” originating in a paper by Candès and collaborators. The method uses ideas from convex optimization to simultaneously learn a low-rank representation of the data and a sparse representation of the outliers. This situation (data that are mostly low-rank but contaminated by a small number of badly discrepant values) comes up in astronomy all the time.
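For concreteness, the convex program usually associated with Robust PCA (often called principal component pursuit) can be written as below. The symbols are notation introduced here for illustration: $M$ is the data matrix, $L$ the low-rank part, $S$ the sparse outlier part, and $\lambda > 0$ a trade-off parameter.

$$
\min_{L,\,S} \; \|L\|_{*} + \lambda \, \|S\|_{1}
\quad \text{subject to} \quad L + S = M
$$

Here $\|L\|_{*}$ is the nuclear norm (the sum of the singular values of $L$), which acts as a convex stand-in for rank, and $\|S\|_{1}$ is the sum of the absolute values of the entries of $S$, which acts as a convex stand-in for sparsity.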
Way back, Tsalmantza and I made a replacement for PCA called HMF (heteroskedastic matrix factorization) that deals with heteroskedastic data (variable error bars or variable data weights) and also missing data; we used it to build low-rank models of quasar spectra. This weekend I built a robust version of HMF (Robust HMF maybe?) that uses ideas from iteratively reweighted least squares (IRLS) to mimic the algorithm behind Robust PCA. It works! And it works well. Unfortunately, right now the investigator has to tune two things by hand: the rank of the low-rank part and the soft outlier cutoff used in the IRLS. I would love to figure out principled ways to choose both. If you want to follow along, development is happening here for now.
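To make the idea concrete, here is a minimal sketch of an IRLS-style robust low-rank fit with per-pixel weights. It is not the actual Robust HMF code, and nearly everything in it is an assumption made for illustration: the function name `robust_hmf_sketch`, the SVD initialization, the small ridge term, and especially the reweighting rule, which multiplies each inverse variance by `Q**2 / (Q**2 + chi2)` so that `Q` plays the role of the soft outlier cutoff.

```python
import numpy as np


def robust_hmf_sketch(Y, W, rank, Q=3.0, n_iter=50, ridge=1e-9):
    """Fit Y ~ A @ G with weighted alternating least squares plus IRLS.

    Y : (N, M) data; missing entries must hold finite placeholders.
    W : (N, M) inverse variances; zero marks a missing entry.
    Q : soft outlier cutoff, in units of sigma (hypothetical name).
    Returns A (N, rank), G (rank, M), and the final robust weights.
    """
    Y = np.asarray(Y, dtype=float)
    W = np.asarray(W, dtype=float)
    N, M = Y.shape
    # Initialize the basis from an SVD of the zero-filled data (an assumption).
    _, _, Vt = np.linalg.svd(Y * (W > 0), full_matrices=False)
    G = Vt[:rank].copy()
    A = np.zeros((N, rank))
    W_eff = W.copy()
    eye = ridge * np.eye(rank)  # keeps the solves stable for empty rows/columns
    for _ in range(n_iter):
        # a-step: weighted least squares for each row's coefficients.
        for i in range(N):
            GW = G * W_eff[i]
            A[i] = np.linalg.solve(GW @ G.T + eye, GW @ Y[i])
        # g-step: weighted least squares for each column of the basis.
        for j in range(M):
            AW = A.T * W_eff[:, j]
            G[:, j] = np.linalg.solve(AW @ A + eye, AW @ Y[:, j])
        # IRLS step: softly down-weight entries with large chi-squared.
        chi2 = W * (Y - A @ G) ** 2
        W_eff = W * Q**2 / (Q**2 + chi2)
    return A, G, W_eff


if __name__ == "__main__":
    rng = np.random.default_rng(17)
    truth = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))
    Y = truth + 0.05 * rng.normal(size=truth.shape)
    Y[3, 7] += 10.0  # one injected outlier
    W = np.full(Y.shape, 1.0 / 0.05**2)
    A, G, W_eff = robust_hmf_sketch(Y, W, rank=2)
    print("robust weight fraction at the outlier:", W_eff[3, 7] / W[3, 7])
```

In this sketch the two hand-tuned knobs from the text show up as the arguments `rank` and `Q`. Entries with residuals much larger than `Q` sigma end up with nearly zero weight, while well-fit entries keep close to their original inverse variance.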