2011-05-30

Astrostatistics and Data Mining, day 1

I gave morning lectures at the summer-school part of this meeting on La Palma, on model specification and model choice. I argued that a model is an approximate specification of the probability of the data. I argued for cross-validation, or, when your priors are informative (which, in astronomy, they very rarely are), the Bayesian evidence. Tomorrow we pair-code.
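
To make the cross-validation argument concrete, here is a minimal sketch; it is not code from the lectures. It assumes a toy problem: synthetic data drawn from a quadratic with known Gaussian noise, and polynomial models of different degree as the candidates. Each model is scored by the probability it assigns to held-out data. The function name `cv_log_likelihood`, the fold scheme, and all the numbers are hypothetical choices for illustration.

```python
import numpy as np

def cv_log_likelihood(x, y, sigma, degree, n_folds=5):
    """Summed held-out Gaussian log-likelihood for a polynomial model.

    Assumes independent Gaussian noise with known standard deviation sigma.
    Folds are interleaved (every n_folds-th point) so each fold spans x.
    """
    idx = np.arange(len(x))
    total = 0.0
    for fold in range(n_folds):
        test = (idx % n_folds) == fold
        train = ~test
        # Fit on the training points only.
        coeffs = np.polyfit(x[train], y[train], degree)
        # Score the fit by the log-probability of the held-out points.
        resid = y[test] - np.polyval(coeffs, x[test])
        total += np.sum(-0.5 * (resid / sigma) ** 2
                        - 0.5 * np.log(2.0 * np.pi * sigma ** 2))
    return total

# Synthetic data (an assumption for this sketch): a quadratic plus noise.
rng = np.random.default_rng(42)
x = np.linspace(-1.0, 1.0, 60)
sigma = 0.1
y = 1.0 + 0.5 * x - 2.0 * x ** 2 + sigma * rng.normal(size=x.size)

# Compare polynomial degrees 0 through 6 by held-out log-likelihood.
scores = {d: cv_log_likelihood(x, y, sigma, d) for d in range(7)}
best = max(scores, key=scores.get)
print(scores)
print("preferred degree:", best)
```

Models that are too simple score badly because they cannot predict the held-out points. Models that are too flexible gain little or lose ground because they fit noise in the training set. Nothing here requires a prior, which is the appeal when priors are uninformative.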

I was followed by Suzanne Aigrain (Oxford), who talked about time-domain methods, especially ones that permit modeling of stochastic processes. She is leading up to Gaussian processes tomorrow, which are highly relevant to my conversations with exoplanet hunters and with Schiminovich about modeling the intensity variations in eclipsing stars.

In the afternoon the workshop part of the meeting started. One highlight for me was that Anthony Brown (Leiden) spent a good part of his talk about Gaia data on the proposals Lang and I have made for a probabilistic catalog. He is committed to having the raw data and the processing pipelines preserved and curated. His message was simple: If you have the best survey and it is going to stay best for many years (decades in the case of Gaia), then re-analysis of your data will be an important capability for the community. Brown was followed by Berry Holl (Lund), who gave an update on his work to make hypothesis testing possible with the full Gaia-catalog covariance matrix, which is far too large to represent even on disk.

Later in the afternoon, the Heidelberg Gaia group showed extremely good results from support vector machines. As my loyal reader knows, I don't like black-box methods like SVM, but they sure do work well.
