2012-10-12

Miller and machine learning

In the morning Adam Miller (Berkeley) gave a beautiful talk about the data, features, and decision trees of the Bloom-group variable-star classification based on the ASAS data. I know this project well from working with Richards and Long (hey, people, we have a paper to write!), but it was nice to see the full system described by Miller. The technology and results are impressive. The audience (including me) was most interested in whether the automated methods could discover new kinds of variables. Miller had to admit that they didn't have any new classes of variables—indicating that the sky is very well understood down to 13th magnitude—but he did show some examples of individual stars that are very hard to understand physically. So, on follow-up, they might have some great discoveries.

I have criticisms of the Bloom-group approaches (and they know them); they relate to the creation of irreversible features from the data: the models they learn (in their random forest) are generative in the feature space, but not in the original data space. This limits their usefulness in prediction and subsequent analysis. But their performance is good, so I shouldn't be complaining!
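To make the "irreversible" point concrete, here is a toy sketch (mine, not their pipeline; the feature choices and the extract_features helper are made up for illustration): summary features throw away information, so very different light curves can land on exactly the same feature vector, and nothing learned purely in feature space can be mapped back to, or used to generate, the raw data.

import numpy as np

rng = np.random.default_rng(0)

def extract_features(mags):
    # Hypothetical summary features, standing in for the kind used in classification.
    return np.array([np.ptp(mags), np.std(mags), np.median(mags)])

t = np.linspace(0.0, 10.0, 500)
periodic = 12.0 + 0.3 * np.sin(2.0 * np.pi * t / 1.7)  # a coherent variable star
shuffled = rng.permutation(periodic)                    # a physically meaningless scramble

print(np.array_equal(periodic, shuffled))   # False: the light curves differ
print(extract_features(periodic))
print(extract_features(shuffled))           # identical: the feature map is many-to-one

Because many distinct light curves collapse to one feature vector, a model that is generative over the features cannot predict the light curve itself; that is the limitation I mean.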

In the afternoon, Fadely and I figured out a degeneracy in factor analysis (and also in mixtures of factor analyzers). We discussed it, but we see no serious discussion of it on the web or in the foundational papers. We certainly have useful things to contribute about this method in practice.
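For context (and to hedge: this is the textbook setup, not necessarily the degeneracy we found), the factor-analysis likelihood only sees the marginal covariance of the data, and that already admits the well-known rotational degeneracy of the loadings:

\[
x = \mu + \Lambda\,z + \epsilon, \qquad
z \sim \mathcal{N}(0, I), \qquad
\epsilon \sim \mathcal{N}(0, \Psi)
\;\Rightarrow\;
x \sim \mathcal{N}(\mu,\; \Lambda\Lambda^{\top} + \Psi),
\]
\[
\Lambda\Lambda^{\top} + \Psi = (\Lambda R)(\Lambda R)^{\top} + \Psi
\quad \text{for any orthogonal } R .
\]

The same invariance holds component-by-component in a mixture of factor analyzers, since each component carries its own loadings and diagonal noise.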
