The GPRV meeting started in Oxford today. The meeting brings together people working on data analysis in extreme-precision radial-velocity projects, united by interests in and uses of Gaussian processes. The first day ended with a very nice tutorial by Foreman-Mackey (Flatiron) on applied-math and computational tools for scalable Gaussian processes. He even live-coded and blew everyone's mind with JAX in Python.
Many talks (including those by Barragán (Oxford), Delisle (Geneva), and Tran (UT Austin), to name a few) are using Gaussian processes and their derivatives, or two Gaussian processes, to model the star's variability, with photometry, radial-velocity measurements, and activity indicators modeled as linear combinations of these latent processes. That's a really interesting theme, and connects somehow to my evil plan (with Bedell, Luger, Zhao, et al.) of modeling the whole stellar surface. It is definitely an exciting time.
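To make the "linear combinations of latent processes" idea concrete, here is a minimal NumPy sketch. It assumes a single latent process G(t) with a squared-exponential kernel, and two observables (say, a radial velocity and an activity indicator) that are each a*G + b*G'. The kernel choice, the length scale, the coefficient values, and the function names are all illustrative assumptions, not anything taken from the talks.

```python
import numpy as np

def latent_gp_joint_cov(t, ell=2.0):
    """Joint covariance of [G(t), G'(t)] for a squared-exponential kernel.

    Assumption: k(t, t') = exp(-(t - t')^2 / (2 ell^2)); the derivative
    blocks follow from differentiating k with respect to its arguments.
    """
    r = t[:, None] - t[None, :]
    k = np.exp(-0.5 * r**2 / ell**2)           # cov(G(t_i), G(t_j))
    k_gd = (r / ell**2) * k                    # cov(G(t_i), G'(t_j))
    k_dd = (1.0 / ell**2 - r**2 / ell**4) * k  # cov(G'(t_i), G'(t_j))
    return np.block([[k, k_gd], [k_gd.T, k_dd]])

def observable_cov(t, coeffs, ell=2.0):
    """Covariance of observables that are each a*G + b*G'.

    coeffs has shape (n_obs, 2); row i holds (a_i, b_i).
    """
    coeffs = np.asarray(coeffs, dtype=float)
    mix = np.kron(coeffs, np.eye(len(t)))      # maps [G, G'] to stacked observables
    return mix @ latent_gp_joint_cov(t, ell) @ mix.T

t = np.linspace(0.0, 10.0, 20)
coeffs = [[1.0, 0.5],   # hypothetical "RV": G + 0.5 G'
          [0.8, 0.0]]   # hypothetical "indicator": 0.8 G
cov = observable_cov(t, coeffs)

# Draw one joint realization; the small jitter keeps the Cholesky factorization stable.
rng = np.random.default_rng(0)
chol = np.linalg.cholesky(cov + 1e-6 * np.eye(cov.shape[0]))
sample = chol @ rng.standard_normal(cov.shape[0])
rv_draw, indicator_draw = sample[:len(t)], sample[len(t):]
```

Because both observables share the same latent G, the off-diagonal blocks of cov are nonzero, which is what lets the indicator help constrain the stellar signal in the radial velocities.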
One issue that came up is how to judge or assess over-fitting. There was no consensus or answer, and most of the GP practitioners are very Bayesian. But Bayesian approaches aren't always sensitive to true statistical violations of the model; I want to see some cross-validation in this house.
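As a rough illustration of what cross-validation could look like for a GP, the sketch below scores a model by the summed log-likelihood of held-out points under the GP predictive distribution conditioned on the remaining points. Everything here is an assumption made for illustration: the synthetic sinusoidal data, the squared-exponential kernel, the fixed (not optimized) hyperparameters, the interleaved folds, and the function names.

```python
import numpy as np

def se_kernel(t1, t2, amp, ell):
    """Squared-exponential kernel matrix between two sets of times."""
    r = t1[:, None] - t2[None, :]
    return amp**2 * np.exp(-0.5 * r**2 / ell**2)

def heldout_log_likelihood(t, y, yerr, amp, ell, n_folds=5):
    """Sum over folds of the Gaussian predictive log-likelihood of held-out data."""
    folds = np.arange(len(t)) % n_folds  # interleaved folds (an illustrative choice)
    total = 0.0
    for f in range(n_folds):
        train, test = folds != f, folds == f
        k_tt = se_kernel(t[train], t[train], amp, ell) + np.diag(yerr[train]**2)
        k_st = se_kernel(t[test], t[train], amp, ell)
        k_ss = se_kernel(t[test], t[test], amp, ell) + np.diag(yerr[test]**2)
        # Standard GP conditioning: predictive mean and covariance at held-out times.
        mu = k_st @ np.linalg.solve(k_tt, y[train])
        cov = k_ss - k_st @ np.linalg.solve(k_tt, k_st.T)
        resid = y[test] - mu
        _, logdet = np.linalg.slogdet(cov)
        total += -0.5 * (resid @ np.linalg.solve(cov, resid)
                         + logdet + len(resid) * np.log(2.0 * np.pi))
    return total

# Synthetic data: a smooth signal plus white noise (purely illustrative).
rng = np.random.default_rng(42)
t = np.sort(rng.uniform(0.0, 20.0, 60))
yerr = np.full_like(t, 0.1)
y = np.sin(t) + yerr * rng.standard_normal(len(t))

score_smooth = heldout_log_likelihood(t, y, yerr, amp=1.0, ell=1.5)
score_wiggly = heldout_log_likelihood(t, y, yerr, amp=1.0, ell=0.05)
print(score_smooth, score_wiggly)
```

A model that is too flexible (here, the very short length scale) fits its training points well but predicts held-out points poorly, so it gets the lower score. For strongly correlated time series, contiguous blocked folds may be a more honest test than the interleaved folds used here.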
In other news, Halverson (JPL) told us about publicly available solar data (and lots of it) from NASA NEID. I might want to play with that when I get home!
I haven't visited HR in a while—glad to learn of the GPRV meeting; thanks for the reports!
"One issue that came up is how to judge or assess over-fitting. There was no consensus or answer, and most of the GP practitioners are very Bayesian." Not sure why the "practitioners are very Bayesian" remark was added; users of Bayesian methods do (and should) worry about overfitting. Regarding overfitting of GPs in an EPRV context, check out:
[1711.01318] Improving Exoplanet Detection Power: Multivariate Gaussian Process Models for Stellar Activity
https://arxiv.org/abs/1711.01318
(It's also on the Ann Appl Stat "to appear" page---it's been there for probably a year; I guess there's a backlog! https://imstat.org/journals-and-publications/annals-of-applied-statistics/annals-of-applied-statistics-next-issues/)
We used latent GPs and their derivatives to fit stellar activity signals using spectrum-based indicators found via a PCA-like procedure (but tailored to RV). We explored hundreds of combinations of latent GPs, using a two-stage procedure to identify the models that can fit stellar activity well without compromising (frequentist) planet detection power ("fitting away the planet"). There's a more thoroughly Bayesian way to do it, but this approximate approach (approximate in the sense of comparing best-fit models, ignoring parameter uncertainties) was already a cluster-level computation.
Though not finalized until 2020, this work came out of the 2016/2017 SAMSI ASTRO program (in which two of the authors were postdocs) and was put on arXiv in 2017. It's been cited in some EPRV reports; hopefully *someone* at GPRV was aware of it! 8-)
Haha two of the coauthors on that paper were at the meeting!
Glad to hear it! 8-)