To test our pixel-level model that is designed to self-calibrate the Kepler data, we had Dun Wang insert signals into a raw Kepler pixel lightcurve and then checked whether, when we self-calibrate, we fit the signals out or preserve them. That is, does linear fitting reduce the amplitudes of signals we care about or bias our results? The answer is a resounding yes. Even though a Kepler quarter has some 4000 data points, if we fit a pixel lightcurve with a linear model with more than a few dozen predictor pixels from other stars, the linear prediction will bias or over-fit the signal we care about. We spent some time in group meeting trying to understand how this could be: it indicates that linear fitting is crazy powerful. Wang's next job is to look at a train-and-test framework in which we train the model using only time points far from the time points of interest. Our prediction is that this will protect us from the over-fitting. But I have learned the hard way that when fits get hundreds of degrees of freedom, crazy shit happens.
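To make the effect concrete, here is a toy sketch (my own construction, not Wang's actual pipeline): a target pixel carries a shared systematic plus an injected transit-like dip, the predictor pixels carry the same systematic plus independent noise, and an ordinary least-squares fit with many predictors visibly eats the dip, while a train-and-test fit that excludes the transit window preserves it. All numbers are invented for illustration, and the time series is much shorter than a real Kepler quarter so the over-fitting is easier to see.

```python
import numpy as np

rng = np.random.default_rng(42)
n_t = 400                      # time points (much shorter than a Kepler quarter)
in_transit = slice(180, 220)   # window containing the injected dip

# A smooth shared systematic; every pixel sees it plus independent noise.
systematic = 2.0 * np.sin(np.linspace(0.0, 8.0 * np.pi, n_t))
signal = np.zeros(n_t)
signal[in_transit] = -1.0      # injected transit-like dip of depth 1
target = systematic + signal + rng.normal(0.0, 0.2, n_t)

def recovered_depth(n_pred, holdout=None):
    """Fit the target pixel on n_pred predictor pixels with least squares;
    return the dip depth left in the residuals (1.0 = signal preserved)."""
    X = systematic[:, None] + rng.normal(0.0, 0.2, (n_t, n_pred))
    train = np.ones(n_t, dtype=bool)
    if holdout is not None:
        train[holdout] = False   # train-and-test: never train on the dip itself
    coef, *_ = np.linalg.lstsq(X[train], target[train], rcond=None)
    resid = target - X @ coef
    return -resid[in_transit].mean()

d_few = recovered_depth(5)
d_many = recovered_depth(200)
d_many_held_out = recovered_depth(200, holdout=in_transit)
print(d_few, d_many, d_many_held_out)
```

With a handful of predictors the dip survives essentially intact; with 200 predictors on 400 points the plain fit swallows roughly half of it (a least-squares fit projects onto the predictor subspace, which removes an expected fraction of roughly K/N of any signal the predictors don't share), while the held-out fit returns nearly the full depth.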


  1. (Hmmm. It seems blogspot ate my original comment through its ridiculously complicated javascript and web form. Thanks Google!)

    Unconstrained linear predictors are powerful: more general than a linear predictor derived from a particular Gaussian process. But I'm not sure what you're doing. What precisely do you mean by "overfit" in this context?

    1. Sorry about blogspot; I have made everyone subject to the robot overlords here.

      What I mean is that we hope (and find) that the linear fitting of pixels with pixels will remove all co-variability of the pixels (due to instrument and telescope and satellite), but none of the intrinsic variability (due to stellar physics and exoplanets). When we include lots of predictor pixels, we start to fit out the intrinsic variability. Not unexpected, but interesting to me how few predictors you need to see this effect start happening.

    2. Here's my (possibly wrong) understanding of what you're saying so you can correct if necessary:

      You're interested in pixel k at time t, x_{k,t}. You have a bunch of other pixel values \{x_{k',t'}\} for several other locations k'\neq k and a window of time t' \in \{t, t-1, t-2, t-3, ..., t-T\}. You will use these other pixels to predict x_{k,t}. You'll then subtract off the prediction in an attempt to leave only "signal".

      If you train the predictor on data from when the telescope is pointing in a bunch of different places, you'll learn the relationship between the pixels due to instrument stuff. If there were no instrument covariances, you'd eventually learn to just predict the mean pixel value of x_k, and subtract that constant from your data, so that might be worth adding back in.

      If you train the linear predictor only on data at recent time points (fitting the predictor to work well on x_k at t-1, t-2, t-3) then a simple predictor would fit the mean of pixel x_k recently. If x_k isn't changing much, an object there would be removed from the sky(!). Given a bunch of parameters, the weights of the linear predictor could also use variations in other pixels as a clock and represent any intrinsic variations in x_k.

      My knee-jerk reaction to your post was that you're proposing using less relevant data (from long ago) and that ignoring the most relevant data (what's going on now) must be wrong. Now that I've thought it through, using not only older but also more varied data seems a sensible first step. However, what if the covariances in the telescope change over time, perhaps as temperature varies? Then you'd need to look at the recent history of the telescope to work out what state it's in, and which predictor (perhaps from a continuous family) to use.
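For what it's worth, the commenter's concern and the train-and-test idea can be combined: train the linear predictor on a window of time local enough to track a drifting instrument state, while still excluding a gap around the time point being predicted. A toy sketch (all numbers, and the linear "gain" drift standing in for a temperature-dependent state, are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_t, n_pred, gap = 600, 30, 20

# Shared instrumental signal, with a slow multiplicative "state" drift in the
# target pixel (a stand-in for, say, temperature-dependent telescope state).
base = 2.0 * np.sin(np.linspace(0.0, 12.0 * np.pi, n_t))
gain = np.linspace(0.5, 2.0, n_t)
X = base[:, None] + rng.normal(0.0, 0.2, (n_t, n_pred))   # predictor pixels
target = gain * base + rng.normal(0.0, 0.2, n_t)

def predict(t, half_window):
    """Train on times near t, but always excluding a gap around t itself."""
    idx = np.arange(n_t)
    train = (np.abs(idx - t) > gap) & (np.abs(idx - t) <= half_window)
    coef, *_ = np.linalg.lstsq(X[train], target[train], rcond=None)
    return X[t] @ coef

resid_global = np.array([target[t] - predict(t, n_t) for t in range(n_t)])
resid_local = np.array([target[t] - predict(t, 80) for t in range(n_t)])
print(resid_global.std(), resid_local.std())
```

The locally trained predictor tracks the drifting gain and leaves much smaller residuals than a single global fit, while the exclusion gap keeps it from training on the very points it is asked to predict.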