I was troubled this morning by the so-many-projects problem! In the subdomain of my life that is about modeling the spectra of stars, and within that the subdomain that is about APOGEE data, there are these projects, which I don't know how to prioritize:
- Fit for velocity widths and velocity offsets (redshifts) simultaneously with the star labels, to remove projections of velocity errors and line-spread-function (or microturbulence) variations onto parameters of interest.
- Fit stars as linear combinations of stars at different velocities to find the double-lined spectroscopic binaries. Combine this with Kepler data to get the full properties of eclipsing binaries. We have many examples, and I expect we will find many more! We might put Adrian Price-Whelan onto parts of this project this week.
- Build (train) models for all parts of the H-R diagram, especially the subgiant and dwarf parts, where we have never produced good models. These are particularly important in the era of Gaia. We might convince Andy Casey to do some of this work this week, and Sven Buder (MPIA) is also doing some of it in GALAH.
- Project residuals onto (theoretically determined) derivatives with respect to element abundances, to get or check element abundances. This might also be used to build an element-abundance measuring system that doesn't require a full training set of abundances that we believe. Yuan-Sen Ting (UCSC) is producing the relevant derivatives right now.
- Marginalize out noisy labels at training time, and marginalize out noisy internal parameters at test time. We have Christina Eilers (MPIA) on that one right now.
- Look at going fully probabilistic, where we get posteriors over all labels and all internal parameters. I owe Jonathan Weare (Chicago) elements for this.
- Include photometry in the training and test data to break the temperature–gravity degeneracies. And maybe also extinction! This is easy to do and ought to have a big impact.
- Include priors on stellar structure and evolution to prevent results from departing from physically reasonable solutions. This is anathema to the stellar spectroscopy world (or most of it), but much desired by the customers of stellar parameter pipelines!
- Add in latent variables to capture variations in stellar spectra not captured by the quadratic-on-labels model. Are the learned latent variables interpretable?
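
To make the residual-projection item above a bit more concrete, here is a minimal sketch in Python of what projecting residuals onto abundance derivatives could look like. It is not the code any of us is actually running. It assumes the residual spectrum responds linearly to small abundance offsets, that the pixel noise is independent and Gaussian with known inverse variances, and that the derivative spectra are simply handed to us. The function name `project_residuals` and all of the synthetic data are made up for illustration.

```python
import numpy as np


def project_residuals(residual, ivar, derivatives):
    """Weighted least-squares amplitudes of derivative spectra in a residual.

    residual:    (n_pix,) data minus best-fit model spectrum
    ivar:        (n_pix,) inverse variance of each pixel
    derivatives: (n_elem, n_pix) derivative of the flux with respect to
                 each element abundance (assumed given)
    Returns the (n_elem,) amplitudes and their (n_elem, n_elem) covariance.
    """
    A = derivatives.T                   # design matrix, shape (n_pix, n_elem)
    ATCinv = A.T * ivar                 # A^T C^-1, shape (n_elem, n_pix)
    fisher = ATCinv @ A                 # A^T C^-1 A
    amps = np.linalg.solve(fisher, ATCinv @ residual)
    cov = np.linalg.inv(fisher)
    return amps, cov


# Synthetic demonstration; every number here is invented.
rng = np.random.default_rng(42)
n_elem, n_pix = 3, 500
derivatives = 0.01 * rng.normal(size=(n_elem, n_pix))
true_offsets = np.array([0.10, -0.05, 0.20])    # hypothetical abundance offsets
sigma = 0.001                                   # per-pixel noise level
ivar = np.full(n_pix, 1.0 / sigma**2)

residual = true_offsets @ derivatives + sigma * rng.normal(size=n_pix)
amps, cov = project_residuals(residual, ivar, derivatives)

print("recovered offsets:", amps)
print("one-sigma uncertainties:", np.sqrt(np.diag(cov)))
```

The recovered amplitudes are abundance offsets relative to whatever abundances went into the best-fit model, so they can serve as a check on those abundances or as corrections to them. If the derivative spectra of two elements are nearly parallel, the covariance matrix will show it as a strong anti-correlation between their amplitudes.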