After talking to Muandet (MPI-IS) and Lopez-Paz (MPI-IS) about false-positive classification using supervised methods yesterday, Foreman-Mackey sent them some lightcurve information from injected exoplanets and from some random locations, just to give them an idea of what to expect when we start a search for exoplanets in earnest. They classified like the wind. Meanwhile, Schölkopf, Foreman-Mackey, and I discussed the problem of exoplanet search.
The point of our de-trending or modeling of stellar + s/c variability is to make search more effective, and (later) characterization more accurate. Since search is the object, we should optimize with search in mind. We encouraged Foreman-Mackey to start up the search in earnest. One insane thing we realized as we got this started is that the Gaussian-Process-based likelihood function we are using (which takes only one feature, time, as input) can be generalized (in the way described yesterday) to incorporate the lightcurves of thousands of other stars as features, with almost no performance hit. That is, we can do the stellar and exoplanet transit modeling simultaneously, even when the stellar model uses information from thousands of other stars! This realization set Foreman-Mackey coding.
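The trick is that marginalizing out a linear fit to other stars' lightcurves (under a Gaussian prior) just adds a low-rank term to the GP kernel, so the likelihood evaluation stays an N-by-N problem in the number of time samples no matter how many stars you throw in. A minimal sketch of that idea (all function and parameter names, and all kernel settings, are illustrative, not our actual code):

```python
import numpy as np

def gp_transit_loglike(t, y, mean_transit, X, tau=5.0, amp2=0.04,
                       lam2=1.0, sigma2=1e-6):
    """Log-likelihood of flux y under a transit mean model plus a GP
    whose kernel is a squared-exponential in time plus a linear kernel
    on the fluxes of other stars, X (N samples x M stars).
    All hyperparameter values here are made up for illustration."""
    r = y - mean_transit                        # residuals about the transit model
    dt = t[:, None] - t[None, :]
    K = amp2 * np.exp(-0.5 * (dt / tau) ** 2)   # kernel in time
    K += lam2 * (X @ X.T)                       # marginalized linear fit to other stars
    K += sigma2 * np.eye(len(t))                # observational noise
    cho = np.linalg.cholesky(K)                 # cost is O(N^3) in time samples,
    alpha = np.linalg.solve(cho.T, np.linalg.solve(cho, r))
    return (-0.5 * r @ alpha                    # ...independent of M
            - np.sum(np.log(np.diag(cho)))
            - 0.5 * len(t) * np.log(2 * np.pi))
```

Note that `M` (the number of other stars) only enters through the one matrix product `X @ X.T`; the Cholesky factorization that dominates the cost is unchanged, which is why adding thousands of stars is almost free.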
With Wang we investigated the possibility that we could do our pixel-level modeling of the imaging data not one Kepler quarter at a time but one Kepler month at a time (the data break naturally into months at the DSN data-download interruptions). It appears that we can: the full quarter doesn't carry much more information relevant to short-timescale predictions than the month does. This potentially speeds up the code by a large factor, because most aspects of our inferences scale badly with the size of the data. This is super-true for non-parametric inferences (as with GPs), but it also holds for parametric inferences, both because of internal dot products and because the bigger the data set, the bigger (usually) cross-validation says your feature list should be.
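For the GP case the speed-up is easy to quantify: the likelihood is dominated by an O(N^3) Cholesky factorization, so splitting one quarter into three independent months cuts the total work by roughly a factor of nine. A back-of-envelope sketch (the sample count is illustrative, not the true Kepler cadence count):

```python
def cholesky_flops(n):
    """Approximate flop count of an n x n Cholesky factorization, ~n^3/3."""
    return n ** 3 / 3.0

# Illustrative numbers: pretend a quarter has 3000 samples, so each
# month has 1000. Three month-sized solves beat one quarter-sized solve.
N_quarter = 3000
quarter_cost = cholesky_flops(N_quarter)
month_cost = 3 * cholesky_flops(N_quarter // 3)
speedup = quarter_cost / month_cost  # → 9.0 when N divides evenly into 3
```

The same argument gives a factor of K^2 for splitting into K chunks, on top of whatever is saved because cross-validation prefers fewer features on the smaller data sets.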