In group meeting today, a good discussion arose about training a supervised method to find exoplanet transits. Data-Science Masters student Elizabeth Lamm (NYU) is working with us to use a convolutional net (think: deep learning) to find exoplanet transits in the Kepler data. Our rough plan is to train this net using real Kepler lightcurves into which we have injected artificial planets. This will give "true positive" training examples, but we also need "true negative" examples. Since transits are rare, most of the lightcurves would make good negative training data; even if we used all of the non-injected lightcurves arbitrarily, we would only have a false-negative rate of a tiny fraction of a percent (like a hundredth of a percent).
That said, there were various intuitions (about training) represented in the discussion. One intuition is that even this low rate of false negatives might lead to some kinds of over-fitting. Another is that perhaps we should up-weight in the training data true negatives that are "threshold crossing events" or, in other words, places where simple software systems think there is a transit but close inspection says there isn't. We finished the discussion in disagreement, but realized that Lamm's project is pretty rich!