error propagation at #astrohackny, are MCMC runs converged?

At #astrohackny today, Adrian Price-Whelan and I led a discussion of error propagation and reporting. I talked about three basic methods of error propagation: exact, when the model is linear and the noise is Gaussian; linearized, by taking derivatives to make a Fisher-matrix approximation; and sampling, with MCMC. I emphasized taking a geometric view of the situation. Price-Whelan talked about methods for reporting values and uncertainties at the end of a data analysis project. His main punchline is that there is no way to summarize a whole posterior pdf (or likelihood function) with a number and an error bar, so you should just do something sensible and report precisely what you did. Also, you should give the reader a way to obtain your posterior samples or your likelihood function code.
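To make the first two methods concrete, here is a minimal sketch in Python with NumPy. It is an illustration, not what was shown at the meeting. The straight-line model, the noise level, and the function names `exact_linear_covariance` and `fisher_covariance` are all assumptions made for the example. For a linear model with Gaussian noise the two answers should agree, which is a useful sanity check before trusting the linearized version on a nonlinear model.

```python
import numpy as np

def exact_linear_covariance(A, sigma):
    """Exact parameter covariance for y = A @ theta + Gaussian noise.

    A is the (n_data, n_params) design matrix; sigma holds per-point noise stds.
    """
    Cinv = np.diag(1.0 / sigma**2)
    return np.linalg.inv(A.T @ Cinv @ A)

def fisher_covariance(model, theta0, sigma, eps=1e-6):
    """Linearized covariance: inverse Fisher matrix built from
    central finite-difference derivatives of the model at theta0."""
    theta0 = np.asarray(theta0, dtype=float)
    J = np.empty((len(sigma), len(theta0)))
    for k in range(len(theta0)):
        step = np.zeros_like(theta0)
        step[k] = eps
        J[:, k] = (model(theta0 + step) - model(theta0 - step)) / (2.0 * eps)
    F = J.T @ (J / sigma[:, None] ** 2)  # Fisher matrix J^T C^-1 J
    return np.linalg.inv(F)

# Hypothetical toy problem: a straight line y = m * x + b with known noise.
x = np.linspace(0.0, 10.0, 20)
sigma = np.full_like(x, 0.5)
A = np.vstack([x, np.ones_like(x)]).T

cov_exact = exact_linear_covariance(A, sigma)
cov_fisher = fisher_covariance(lambda th: th[0] * x + th[1], [2.0, 1.0], sigma)

print(cov_exact)
print(cov_fisher)
```

For a nonlinear model only `fisher_covariance` applies, and it describes the local curvature at `theta0` rather than the full posterior, which is where the MCMC option comes in.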

Late in the day I discussed single transits in Kepler with Dan Foreman-Mackey. He is finding that the MCMC runs he uses to characterize the multiple-planet systems he has found show very, very long autocorrelation times (sampling is taking many CPU days or weeks). If he is right, this casts doubt (in my mind) on any posterior sampling in the parameter space of (say) a 5-planet model. And there are a few claims in the literature of converged samplings.
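For readers who want to check their own chains, here is a rough sketch of one common way to estimate the integrated autocorrelation time of a single parameter trace: an FFT-based autocorrelation function with a Sokal-style automatic window. This is not necessarily the estimator Foreman-Mackey is using; the function names, the window constant `c=5`, and the AR(1) test chain are assumptions made for illustration.

```python
import numpy as np

def autocorr_function(x):
    """Normalized autocorrelation function of a 1-D chain, computed via FFT."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    nfft = 1 << (2 * n - 1).bit_length()  # zero-pad to avoid wrap-around
    f = np.fft.rfft(x, n=nfft)
    acf = np.fft.irfft(f * np.conjugate(f), n=nfft)[:n]
    return acf / acf[0]

def integrated_autocorr_time(x, c=5.0):
    """Integrated autocorrelation time, with the sum truncated at the
    smallest lag M that satisfies M >= c * tau(M)."""
    rho = autocorr_function(x)
    taus = 2.0 * np.cumsum(rho) - 1.0
    inside = np.arange(len(taus)) < c * taus
    m = np.argmin(inside) if not inside.all() else len(taus) - 1
    return taus[m]

# Hypothetical test chain: an AR(1) process, for which the analytic
# integrated autocorrelation time is (1 + phi) / (1 - phi) = 19.
rng = np.random.default_rng(42)
phi, n_steps = 0.9, 200_000
noise = rng.normal(size=n_steps)
chain = np.empty(n_steps)
chain[0] = noise[0]
for t in range(1, n_steps):
    chain[t] = phi * chain[t - 1] + noise[t]

tau_est = integrated_autocorr_time(chain)
n_eff = n_steps / tau_est  # rough effective number of independent samples
print(tau_est, n_eff)
```

The estimate is only trustworthy when the chain is many autocorrelation times long. If `tau_est` keeps growing as you add samples, the run is probably too short to say anything about convergence.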


  1. Our RUN DMC paper looked at the dependence of autocorrelation time on the number of planets and the strength of interactions. Indeed, shit gets harder as you increase both. So it is a double whammy for at least two of those claims in the lit (admittedly, ours). But we had loads of patience (~few months) and GPUs.

  2. Are you referring to transit data or RV data, or both? And which MCMC methods are being used?

    1. Just RV, and in the context of differential evolution MCMC. Although they're pretty dense, Figures 5, 6, and 7 show the dimensionality and planet-planet interaction results: http://adsabs.harvard.edu/abs/2014ApJS..210...11N