data mining

I started working this week with Vivi Tsalmantza (MPIA) on data mining in the SDSS spectra. She is starting with dimensionality reduction and classification. The standard tool is PCA, but it ranks the components by their contribution to the data variance. This has two problems: first, in many directions in the data space the variance is probably dominated by your measurement errors, not by anything of scientific interest; second, astronomers don't necessarily care most about the data variance! But we came up with some ways to apply robust estimation techniques to the dimensionality reduction, and I have an evil plan to eventually perform the dimensionality reduction on the error-deconvolved underlying distribution. That may not be possible, though, for all sorts of reasons.
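To make the first problem concrete, here is a minimal toy sketch (not the method we are developing, and not real SDSS data). It assumes three-pixel "spectra" whose true signal varies only in pixel 0, with independent Gaussian noise of known per-pixel variance that is much larger in pixel 1. Plain PCA picks the noisy pixel as the leading component. The deconvolution step shown is only the simplest version of the idea: if the data are signal plus independent noise with known covariance N, the sample covariance estimates Cov(signal) + N, so subtracting N before the eigendecomposition ranks directions by signal variance instead. The names `sorted_eig`, `sample_covariance`, and `noise_sigma` are illustrative.

```python
import numpy as np

def sorted_eig(C):
    """Eigen-decompose a symmetric matrix; return eigenvalues in descending
    order and the matching eigenvectors as columns."""
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

def sample_covariance(X):
    """Unbiased sample covariance of the rows of X (n_samples x n_dims)."""
    Xc = X - X.mean(axis=0)
    return Xc.T @ Xc / (X.shape[0] - 1)

rng = np.random.default_rng(42)
n = 5000

# Toy "spectra" with 3 pixels: the true signal varies only in pixel 0
# (unit variance) ...
signal = np.zeros((n, 3))
signal[:, 0] = rng.normal(0.0, 1.0, n)

# ... but pixel 1 is very noisy. Noise sigmas are assumed known.
noise_sigma = np.array([0.1, 3.0, 0.1])
X = signal + rng.normal(size=(n, 3)) * noise_sigma

S = sample_covariance(X)

# Plain PCA: rank directions by total variance (signal plus noise).
raw_vals, raw_vecs = sorted_eig(S)

# Naive deconvolution: subtract the known noise covariance first.
# S - N need not be positive semidefinite, so some eigenvalues may
# come out slightly negative.
dec_vals, dec_vecs = sorted_eig(S - np.diag(noise_sigma**2))

print("PCA leading direction:        ", np.round(raw_vecs[:, 0], 3))
print("Deconvolved leading direction:", np.round(dec_vecs[:, 0], 3))
```

The subtraction only works this simply for homoscedastic noise with a known covariance; with per-spectrum errors and missing data (as in real SDSS spectra) a proper deconvolution of the underlying distribution is much harder, which is part of why the plan may not be possible.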
