To test our pixel-level model that is designed to self-calibrate the Kepler data, we had Dun Wang insert signals into a raw Kepler pixel lightcurve and then checked whether, when we self-calibrate, we fit the signals out or preserve them. That is, does linear fitting reduce the amplitudes of signals we care about or bias our results? The answer is a resounding yes. Even though a Kepler quarter has some 4000 data points, if we fit a pixel lightcurve with a linear model with more than a few dozen predictor pixels from other stars, the linear prediction will bias or over-fit the signal we care about. We spent some time in group meeting trying to understand how this could be: it indicates that linear fitting is crazy powerful. Wang's next job is to look at a train-and-test framework in which we train the model using only time points far from the time points of interest. Our prediction is that this will protect us from the over-fitting. But I have learned the hard way that when fits get hundreds of degrees of freedom, crazy shit happens.
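To make the effect concrete, here is a toy sketch (my own construction, not Wang's actual pipeline): a target pixel carries a shared systematic plus an injected transit-like dip, the predictor pixels carry the same systematic plus independent noise, and an ordinary least-squares fit with many predictors visibly eats the dip, while a train-and-test fit that excludes the transit window preserves it. All numbers are invented for illustration, and the time series is much shorter than a real Kepler quarter so the over-fitting is easier to see.

```python
import numpy as np

rng = np.random.default_rng(42)
n_t = 400                      # time points (much shorter than a Kepler quarter)
in_transit = slice(180, 220)   # window containing the injected dip

# A smooth shared systematic; every pixel sees it plus independent noise.
systematic = 2.0 * np.sin(np.linspace(0.0, 8.0 * np.pi, n_t))
signal = np.zeros(n_t)
signal[in_transit] = -1.0      # injected transit-like dip of depth 1
target = systematic + signal + rng.normal(0.0, 0.2, n_t)

def recovered_depth(n_pred, holdout=None):
    """Fit the target pixel on n_pred predictor pixels with least squares;
    return the dip depth left in the residuals (1.0 = signal preserved)."""
    X = systematic[:, None] + rng.normal(0.0, 0.2, (n_t, n_pred))
    train = np.ones(n_t, dtype=bool)
    if holdout is not None:
        train[holdout] = False   # train-and-test: never train on the dip itself
    coef, *_ = np.linalg.lstsq(X[train], target[train], rcond=None)
    resid = target - X @ coef
    return -resid[in_transit].mean()

d_few = recovered_depth(5)
d_many = recovered_depth(200)
d_many_held_out = recovered_depth(200, holdout=in_transit)
print(d_few, d_many, d_many_held_out)
```

With a handful of predictors the dip survives essentially intact; with 200 predictors on 400 points the plain fit swallows roughly half of it (a least-squares fit projects onto the predictor subspace, which removes an expected fraction of roughly K/N of any signal the predictors don't share), while the held-out fit returns nearly the full depth.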


  1. (Hmmm. It seems blogspot ate my original comment through its ridiculously complicated javascript and web form. Thanks Google!)

    Unconstrained linear predictors are powerful: more general than a linear predictor derived from a particular Gaussian process. But I'm not sure what you're doing. What precisely do you mean by "overfit" in this context?

    1. Sorry about blogspot; I have made everyone subject to the robot overlords here.

      What I mean is that we hope (and find) that the linear fitting of pixels with pixels will remove all co-variability of the pixels (due to instrument and telescope and satellite), but none of the intrinsic variability (due to stellar physics and exoplanets). When we include lots of predictor pixels, we start to fit out the intrinsic variability. Not unexpected, but interesting to me how few predictors you need to see this effect start happening.

    2. Here's my (possibly wrong) understanding of what you're saying so you can correct if necessary:

      You're interested in pixel k at time t, x_{k,t}. You have a bunch of other pixel values \{x_{k',t'}\} for several other locations k'\neq k and a window of time t' \in \{t, t-1, t-2, t-3, ..., t-T\}. You will use these other pixels to predict x_{k,t}. You'll then subtract off the prediction in an attempt to leave only "signal".

      If you train the predictor on data from when the telescope is pointing in a bunch of different places, you'll learn the relationship between the pixels due to instrument stuff. If there were no instrument covariances, you'd eventually learn to just predict the mean pixel value of x_k, and subtract that constant from your data, so that might be worth adding back in.

      If you train the linear predictor only on data at recent time points (fitting the predictor to work well on x_k at t-1, t-2, t-3) then a simple predictor would fit the mean of pixel x_k recently. If x_k isn't changing much, an object there would be removed from the sky(!). Given a bunch of parameters, the weights of the linear predictor could also use variations in other pixels as a clock and represent any intrinsic variations in x_k.

      My knee-jerk reaction to your post was that you're proposing using less relevant data (from long ago) and that ignoring the most relevant data (what's going on now) must be wrong. Now that I've thought it through, using not only older but also more varied data seems a sensible first step. However, what if the covariances in the telescope change over time, perhaps as temperature varies? Then you'd need to look at the recent history of the telescope to work out what state it's in, and which predictor (perhaps from a continuous family) to use.
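For what it's worth, the commenter's concern and the train-and-test idea can be combined: train the linear predictor on a window of time local enough to track a drifting instrument state, while still excluding a gap around the time point being predicted. A toy sketch (all numbers, and the linear "gain" drift standing in for a temperature-dependent state, are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_t, n_pred, gap = 600, 30, 20

# Shared instrumental signal, with a slow multiplicative "state" drift in the
# target pixel (a stand-in for, say, temperature-dependent telescope state).
base = 2.0 * np.sin(np.linspace(0.0, 12.0 * np.pi, n_t))
gain = np.linspace(0.5, 2.0, n_t)
X = base[:, None] + rng.normal(0.0, 0.2, (n_t, n_pred))   # predictor pixels
target = gain * base + rng.normal(0.0, 0.2, n_t)

def predict(t, half_window):
    """Train on times near t, but always excluding a gap around t itself."""
    idx = np.arange(n_t)
    train = (np.abs(idx - t) > gap) & (np.abs(idx - t) <= half_window)
    coef, *_ = np.linalg.lstsq(X[train], target[train], rcond=None)
    return X[t] @ coef

resid_global = np.array([target[t] - predict(t, n_t) for t in range(n_t)])
resid_local = np.array([target[t] - predict(t, 80) for t in range(n_t)])
print(resid_global.std(), resid_local.std())
```

The locally trained predictor tracks the drifting gain and leaves much smaller residuals than a single global fit, while the exclusion gap keeps it from training on the very points it is asked to predict.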