Today Markus Bonse (Darmstadt) showed me (and our group: Eilers, Rix, Schölkopf) his Gaussian-Process latent-variable model for APOGEE spectra. It looks incredible! With only a few latent variable dimensions, it does a great job of explaining the spectra, and its performance (even under validation) improves as the latent dimensionality increases. This is something we have wanted to do to The Cannon for ages: Switch to GP functions and away from polynomials.
The biggest issue with the vanilla GPy GPLVM implementation Bonse is using is that it treats the data as homoskedastic: all data points are considered equally noisy. In fact we have lots of knowledge about the noise levels in different pixels, and we have substantial (and known) missing and bad data. So we encouraged him to figure out how to implement heteroskedasticity. We also discussed how to make a subspace of the latent space interpretable by conditioning on known labels for some sources.
Maybe you could try adding GPy.kern.WhiteHeteroscedastic to the kernel of your choice and then passing the final kernel to GPy.models.GPLVM?
Perfect! Yes!
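For concreteness, here is a minimal sketch of the commenter's suggestion, using toy stand-in data; the array shapes, latent dimensionality, and choice of RBF kernel are illustrative assumptions, not Bonse's actual setup. Note also that GPy's WhiteHeteroscedastic kernel carries one free noise variance per data point (per spectrum), so on its own it is only a first step toward the per-pixel noise model discussed above.

    # Sketch only: toy shapes and random data stand in for real APOGEE spectra.
    import numpy as np
    import GPy

    n_spectra, n_pixels = 100, 500             # toy sizes; real APOGEE spectra have thousands of pixels
    Y = np.random.randn(n_spectra, n_pixels)   # stand-in for continuum-normalized spectra

    latent_dim = 5                             # number of latent dimensions to try

    # Smooth kernel over the latent space plus a heteroskedastic white-noise term;
    # the white term has one free variance per spectrum, not per pixel.
    kernel = (GPy.kern.RBF(latent_dim, ARD=True)
              + GPy.kern.WhiteHeteroscedastic(latent_dim, num_data=n_spectra))

    model = GPy.models.GPLVM(Y, latent_dim, kernel=kernel)
    model.optimize(messages=True)

    X_latent = model.X  # optimized latent position for each spectrum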