Having written some stuff that I wasn't happy with a few months ago, thought about it, forgot about it, remembered it, mentioned it on the blog here and there, and then dusted it off today, I got close to being ready to make an argument about when you should and shouldn't compute the marginalized likelihood, even if you are a committed probabilistic reasoner. The fundamental idea is that you shouldn't do model selection based on marginalized likelihoods: these integrals are very challenging to compute, and yet they are only approximations (and often bad ones) to the integrals you would really want to do to inform a model choice. One way to put it is: don't spend a huge amount of computer time computing something that is a worse approximation to what you want than something else that might be much easier to compute! I sent my argument to Brewer, my guru in all things Bayes, for review. I want to say things that are useful and uncontroversial, so I need to be careful, because on these matters I tend to be like a bull in a china shop.
Late in the day I talked to an NYU Data Science class about possible research projects they could do with the Kepler data. As I was talking about how we search for exoplanets in the data (and how likely it is that the data contain many undiscovered planets), one of the faculty in charge of the class (Mike O'Neil) asked me how many exoplanets we (meaning CampHogg) have found in the data so far. I had to admit that the answer is zero. That's just embarrassing. I need to ride the team hard tomorrow.
I reckon people tend to over-complicate the marginal-likelihood calculation: if you are already exploring the posterior via parallel-tempered MCMC, then the biased sampling method (Vardi 1985; see also my recursive pathway paper) immediately gives you a good estimate, along with a CLT-based uncertainty.
For some reason this information is often thrown away by astronomers in favour of harmonic- or arithmetic-mean-type estimators. E.g., the BIE [Weinberg et al.] does tempered transitions but doesn't use them for marginal-likelihood estimation; likewise the ctsmod code for stochastic time-series modelling by Bailer-Jones.
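To make that concrete, here is a minimal sketch (my assumptions, not anyone's production code) of the biased-sampling estimator run on output you would already have from a parallel-tempered MCMC. It assumes K chains, with chain k targeting the prior times the likelihood raised to the power betas[k], running from betas[0] = 0 (the prior) to betas[-1] = 1 (the posterior); the names `loglikes` and `betas` are hypothetical placeholders for your own sampler's output.

```python
# Sketch of the biased-sampling estimator (Vardi 1985; also known as
# reverse logistic regression) applied to parallel-tempered MCMC output.
# `loglikes` and `betas` are hypothetical placeholders, assumed here.
import numpy as np
from scipy.special import logsumexp

def log_marginal_likelihood(loglikes, betas, n_iter=5000, tol=1e-10):
    """
    loglikes : (K, N) array -- log-likelihood of each of N samples drawn
        from each of K tempered chains, where chain k targets
        prior(theta) * likelihood(theta) ** betas[k].
    betas : (K,) array -- inverse temperatures, betas[0] == 0 (prior)
        and betas[-1] == 1 (posterior).
    Returns the estimated log marginal likelihood, log Z.
    """
    betas = np.asarray(betas)
    K, N = np.shape(loglikes)
    pooled = np.reshape(loglikes, -1)          # pool all K*N samples
    # Log unnormalized density of pooled sample n under chain k; the
    # log-prior term is common to every k and cancels, so drop it.
    logq = betas[:, None] * pooled[None, :]    # shape (K, K*N)
    logZ = np.zeros(K)                         # log Z_k; Z_0 fixed at 1
    for _ in range(n_iter):
        # Vardi's self-consistent fixed point:
        #   Z_k = sum_n q_k(x_n) / sum_l [ N_l * q_l(x_n) / Z_l ]
        log_denom = logsumexp(np.log(N) + logq - logZ[:, None], axis=0)
        new_logZ = logsumexp(logq - log_denom[None, :], axis=1)
        new_logZ -= new_logZ[0]                # anchor Z_0 = 1 (the prior)
        if np.max(np.abs(new_logZ - logZ)) < tol:
            return new_logZ[-1]
        logZ = new_logZ
    return logZ[-1]                            # log Z at beta = 1
```

The CLT uncertainty mentioned above comes from the estimator's asymptotic covariance; I leave it out of the sketch for brevity.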
The other nice thing about biased sampling (also called reverse logistic regression, or the density of states) is that it allows easy prior-sensitivity analysis via importance-sampling reweighting (using the Radon-Nikodym derivative for stochastic-process models, which has a well-defined limiting form for the Gaussian process).
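For the prior-sensitivity point, a sketch of the simplest finite-dimensional case: posterior samples drawn under a prior p1 are reweighted by p2/p1 to give expectations, and the marginal-likelihood ratio, under an alternative prior p2, with no re-running of the MCMC. The Gaussian priors and the stand-in "posterior" draws below are hypothetical; for a stochastic-process model the weight would instead be the Radon-Nikodym derivative of the two process measures.

```python
# Sketch of prior-sensitivity analysis via importance reweighting.
# Uses the identity E_post1[p2/p1] = Z2/Z1, where Z_i is the marginal
# likelihood under prior p_i; valid when p2 is absolutely continuous
# with respect to p1. All specifics below are hypothetical stand-ins.
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

def prior_reweight(samples, logp1, logp2):
    """Return log(Z2/Z1) and self-normalized weights for E_2[f]."""
    logw = logp2(samples) - logp1(samples)          # log importance weights
    log_ratio = logsumexp(logw) - np.log(len(samples))
    weights = np.exp(logw - logsumexp(logw))        # sum to one
    return log_ratio, weights

# Usage: swap a N(0, 10) prior for a tighter N(0, 3) prior on theta.
rng = np.random.default_rng(42)
theta = rng.normal(1.0, 0.5, size=5000)             # stand-in posterior draws
log_ratio, w = prior_reweight(theta,
                              lambda t: norm.logpdf(t, 0.0, 10.0),
                              lambda t: norm.logpdf(t, 0.0, 3.0))
# Posterior mean of theta under the new prior, by self-normalized
# importance sampling:
mean_under_p2 = np.sum(w * theta)
```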