2011-11-11

random forests; supernova discovery

In the astro seminar today, Joey Richards (Berkeley), about whom I have been blogging all week, spoke about the methodologies and successes of the Bloom-led Center for Time-Domain Informatics team in automatically classifying time-variable objects in various imaging surveys. He concentrated on random forests (ensembles of many decision trees, each built from a randomized version of the training data), in part because they are extremely effective on these kinds of problems. He even claimed that they beat well-tuned support-vector machine implementations. I will have to sanity-check that with Schölkopf in Tübingen! Richards did a great job, in particular in explaining and responding to the principal disagreement we have, which is this: I argue that a generative model that can generate the raw pixels will always beat any black-box classifier, no matter how clever; Richards argues that you will never have a generative model that is accurate for all (or even most) real systems. It is this tension that made me invite him (along with Bloom and Long) to NYU this week.
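
For concreteness, here is a minimal sketch of the kind of head-to-head comparison Richards described, random forest versus SVM, using scikit-learn. This is not the CTDI pipeline; the feature matrix, labels, and hyperparameters below are all made up for illustration.

    # Sketch only: synthetic stand-ins for light-curve features (period,
    # amplitude, skewness, ...) and class labels; the real classification
    # problem involves many more features and classes.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n, d = 1000, 8                       # hypothetical: 1000 sources, 8 features
    X = rng.normal(size=(n, d))          # stand-in feature matrix
    y = (X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.3 * rng.normal(size=n) > 0).astype(int)

    # Random forest: each tree is trained on a bootstrap resample of the data,
    # with a random subset of features considered at every split.
    rf = RandomForestClassifier(n_estimators=500, random_state=0)

    # Support-vector machine with an RBF kernel, for comparison.
    svm = SVC(kernel="rbf", C=1.0, gamma="scale")

    for name, clf in [("random forest", rf), ("SVM", svm)]:
        scores = cross_val_score(clf, X, y, cv=5)
        print(f"{name}: accuracy {scores.mean():.3f} +/- {scores.std():.3f}")

Which classifier wins on data like these says nothing about real surveys, of course; the point is only that the comparison is a few lines of code once you have committed to a fixed set of features, which is exactly the black-box framing I am grumbling about above.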

After the seminar, Or Graur (Tel Aviv, AMNH) showed us how he can find supernovae lurking in SDSS spectra and pitched (very successfully) a test with SDSS-III BOSS data. We will get on that next week.
