2019-08-26

unsupervised spectral modeling

I'm very proud of what we have done over the years with our project called The Cannon, in which we learn a generative model of stellar spectra from stellar labels, entirely data-driven, and then use that generative model to label other stellar spectra. The system has been successful, and it is also robust against certain kinds of over-fitting, because it is formulated as a regression from labels to data (and not the other way around). However, The Cannon has some big drawbacks. One is that (in its current form) the function space is hard-coded to be polynomial, which is both too flexible and not flexible enough, depending on context. Another is that the spectral representation is the pixel basis, which is just about the worst possible representation for stellar spectra, given that they are filled with known absorption lines at fixed resolution. And another is that the model might need latent freedoms that go beyond the known labels: the labels might be noisy, some might be missing, or the full set of labels might be insufficient to predict the full spectrum.
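To make "a regression from labels to data" concrete, here is a minimal sketch of a Cannon-like model under stated assumptions: each pixel's flux is a quadratic polynomial in the labels, the training step is unweighted per-pixel linear least squares, and the test step fits the labels of a new spectrum by Gauss-Newton with a finite-difference Jacobian. The function names (design_matrix, train, infer_labels) and the synthetic noise-free data are hypothetical illustrations, not the project's code; a real implementation would also have to handle flux uncertainties and intrinsic scatter.

```python
import numpy as np

def design_matrix(labels):
    """Quadratic-in-labels features per star: [1, l_i, l_i * l_j for i <= j]."""
    labels = np.atleast_2d(labels)
    n, k = labels.shape
    cols = [np.ones(n)]
    cols += [labels[:, i] for i in range(k)]
    cols += [labels[:, i] * labels[:, j] for i in range(k) for j in range(i, k)]
    return np.column_stack(cols)

def train(labels, fluxes):
    """Training step: regress flux on label features, all pixels at once.
    labels: (n_stars, n_labels); fluxes: (n_stars, n_pixels).
    Returns coefficients of shape (n_features, n_pixels)."""
    A = design_matrix(labels)
    coeffs, *_ = np.linalg.lstsq(A, fluxes, rcond=None)
    return coeffs

def infer_labels(flux, coeffs, init, n_iter=50, eps=1e-6):
    """Test step: Gauss-Newton fit of the labels to one new spectrum."""
    l = np.array(init, dtype=float)
    for _ in range(n_iter):
        model = design_matrix(l)[0] @ coeffs          # model spectrum, (n_pixels,)
        resid = flux - model
        J = np.empty((flux.size, l.size))             # d(model) / d(label)
        for i in range(l.size):
            dl = l.copy()
            dl[i] += eps
            J[:, i] = (design_matrix(dl)[0] @ coeffs - model) / eps
        step, *_ = np.linalg.lstsq(J, resid, rcond=None)
        l += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return l

# Hypothetical usage on synthetic, noise-free data with two labels.
rng = np.random.default_rng(1)
n_stars, n_pix = 300, 50
train_labels = rng.uniform(-1, 1, size=(n_stars, 2))
n_feat = design_matrix(train_labels).shape[1]          # 1 + 2 + 3 = 6 features
true_coeffs = rng.normal(size=(n_feat, n_pix))
true_coeffs[3:] *= 0.1                                 # keep quadratic terms small
train_fluxes = design_matrix(train_labels) @ true_coeffs
coeffs = train(train_labels, train_fluxes)

new_labels = np.array([0.3, -0.2])
new_flux = design_matrix(new_labels)[0] @ true_coeffs
estimate = infer_labels(new_flux, coeffs, init=[0.0, 0.0])
```

The asymmetry is the point: training is linear least squares from labels to fluxes, and only the label inference for a new star is nonlinear. The polynomial design matrix is also where the hard-coded function space mentioned above lives.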

This summer we have been discussing projects to address all three of these issues. Today I worked on one of these directions with Adam Wheeler (Columbia). The idea is to build a purely linear version of The Cannon in which each star is modeled using a generative model built only from its near neighbors. So you get the simplicity and tractability of a linear model but the flexibility of non-parametrics. But we are also thinking about operating in a regime in which we have no labels at all! Can we measure abundance differences between stars without ever knowing the absolute abundances? I feel like it might be possible if we structure the model correctly. As a start, we discussed looking at Eu and Ba lines in APOGEE spectra; outliers in Eu or Ba are potentially very interesting astrophysically.
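A minimal sketch of the near-neighbor linear idea, under assumptions that are mine and not from the discussion: neighbors are chosen by unscaled Euclidean distance in label space, and the local model is linear in labels and fit by unweighted least squares. The function local_linear_predict, the choice k=20, and the synthetic "spectra" are hypothetical. The example compares a local fit against a single global linear fit (k equal to the number of stars) on data whose true dependence on labels is nonlinear.

```python
import numpy as np

def local_linear_predict(target_labels, labels, fluxes, k=20):
    """Predict the spectrum at target_labels from a linear-in-labels model
    fit only to the k training stars nearest to the target in label space."""
    target = np.asarray(target_labels, dtype=float)
    dist = np.linalg.norm(labels - target, axis=1)
    idx = np.argsort(dist)[:k]
    # Linear design centered on the target: columns [1, l - l_target].
    A = np.column_stack([np.ones(len(idx)), labels[idx] - target])
    coeffs, *_ = np.linalg.lstsq(A, fluxes[idx], rcond=None)
    # The centered offsets vanish at the target, so the prediction is the intercept row.
    return coeffs[0]

# Hypothetical usage: synthetic spectra that depend nonlinearly on two labels.
rng = np.random.default_rng(0)
n_stars, n_pix = 500, 40
labels = rng.uniform(-1, 1, size=(n_stars, 2))
a, b = rng.normal(size=n_pix), rng.normal(size=n_pix)

def true_spectrum(l):
    l = np.atleast_2d(l)
    return np.sin(2 * l[:, :1]) * a + l[:, 1:2] ** 2 * b

fluxes = true_spectrum(labels)
target = np.array([0.5, 0.5])
truth = true_spectrum(target)[0]
local = local_linear_predict(target, labels, fluxes, k=20)
global_fit = local_linear_predict(target, labels, fluxes, k=n_stars)
```

Each individual fit is still just linear least squares, but because it uses only a small neighborhood, the collection of fits can follow curvature that one global linear model cannot. Here the local prediction lands much closer to the truth than the global one. The trade-off is that k controls the balance between bias (large neighborhoods) and variance (few stars per fit).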
