In re-reading yesterday's post, I found it strange to hear myself say that the model was over-fitting stellar variability and that we then decided to make the model far more flexible! Today we decided that we don't yet have the technology (well, perhaps not the patience, since we want to detect exoplanets ASAP) to fully separate stellar variability from spacecraft-induced issues; at the very least, we would have to do something that pooled much more data to do it, and we wouldn't be able to work on one light-curve at a time. So we de-scoped to exoplanet science and decided that we would try to fit out everything except the exoplanet transits. This is not unlike what others are doing, except that we are trying to be extremely principled about not letting information in the data about any exoplanet transits "leak" into our modeling of the variability of the light-curve. We are doing this with a censoring or train-and-test framework.
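To make the censoring idea concrete, here is a minimal sketch of one way a train-and-test scheme could be set up. It is not the code we are running. The function name `censored_predictions`, the polynomial basis, the window size, and the toy light-curve are all assumptions made for illustration. The point is only that the prediction at each time comes from a model trained with the data near that time held out, so a transit at that time cannot be absorbed into the variability model.

```python
import numpy as np

def censored_predictions(t, flux, design, half_window):
    """Predict each flux value from a linear model trained with nearby points held out.

    `design` is an (n_times, n_features) matrix of predictors (hypothetical here).
    """
    pred = np.empty(len(t))
    for i in range(len(t)):
        # Training set: only points farther than half_window from time t[i].
        train = np.abs(t - t[i]) > half_window
        coeffs, *_ = np.linalg.lstsq(design[train], flux[train], rcond=None)
        # Test: predict the held-out point from the censored fit.
        pred[i] = design[i] @ coeffs
    return pred

# Toy light-curve: smooth variability, white noise, and a box-shaped transit.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 200)
x = (t - 5.0) / 5.0                      # rescale time to [-1, 1] for conditioning
design = np.vander(x, 6)                 # degree-5 polynomial basis (illustrative only)
flux = 1.0 + 0.01 * np.sin(0.5 * t) + 1e-4 * rng.standard_normal(t.size)
in_transit = np.zeros(t.size, dtype=bool)
in_transit[95:105] = True
flux[in_transit] -= 0.002                # toy transit depth

pred = censored_predictions(t, flux, design, half_window=0.5)
residual = flux - pred                   # the transit should survive in the residual
```

In this toy the held-out window is wider than the transit, so the dip stays in the residuals rather than being fit out. With real data the window would have to be chosen against the longest transit duration of interest.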
Because we decided to eradicate all variability, spacecraft and stellar alike, we had Wang work on auto-regressive models, in which the past and future of the star are used to predict the present of the star. The first results are promising. We also had Foreman-Mackey put all the other stars into the Gaussian Process predictions we are making. This means we are doing Gaussian Process regression and prediction with thousands of ambient dimensions (features). That seems insane to me, but Schölkopf insists that it will work: being non-parametric, GPs scale in complexity with the number of data points, not the number or size of the features. I will believe it when I see it. The curse of dimensionality and all that!
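For what it's worth, a two-sided auto-regressive model of this kind can be sketched in a few lines. This is an illustrative guess at the structure, not Wang's implementation. The helper `lag_features`, the number of lags, the `gap` parameter (which skips the samples closest to the present, so that a short transit there does not appear among the predictors), and the toy sinusoidal star are all assumptions.

```python
import numpy as np

def lag_features(flux, n_lags, gap=0):
    """Build rows of [past samples, future samples] for each interior time index.

    Samples within `gap` steps of the present are skipped on both sides.
    """
    pad = n_lags + gap
    idx = np.arange(pad, len(flux) - pad)            # indices with full past and future
    offsets = np.arange(gap + 1, gap + n_lags + 1)   # lags gap+1, ..., gap+n_lags
    past = flux[idx[:, None] - offsets[None, :]]
    future = flux[idx[:, None] + offsets[None, :]]
    return idx, np.hstack([past, future])

# Toy star: a sinusoid plus white noise, sampled on a regular grid.
rng = np.random.default_rng(1)
n = 500
t = np.arange(n)
flux = np.sin(2 * np.pi * t / 50.0) + 0.01 * rng.standard_normal(n)

idx, X = lag_features(flux, n_lags=5, gap=2)
X = np.hstack([X, np.ones((len(idx), 1))])           # constant offset term
coeffs, *_ = np.linalg.lstsq(X, flux[idx], rcond=None)
pred = X @ coeffs                                    # in-sample prediction of the present
rms = np.sqrt(np.mean((flux[idx] - pred) ** 2))
```

Note that this fit is in-sample, which is fine for showing the feature construction but not for an honest test of predictive power. The regular sampling is also an assumption; gaps in a real light-curve would need handling.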
In the afternoon, we had discussions with Krik Muandet (MPI-IS) and David Lopez-Paz (MPI-IS) about false-positive classification for exoplanet search using supervised methods, and a discussion with Michael Hirsch (MPI-IS) about non-parametric models for the imaging PSF. More on the former tomorrow, I very much hope!