Say you have many noisy inferences of some quantity (planet radius, say, for hundreds of planets), and you want to know the true distribution of that quantity (the planet-radius distribution you would observe with very high signal-to-noise data). How should you estimate the distribution? One option: Histogram your maximum-likelihood estimates. Another: Co-add (sum) your individual-object likelihood functions. Another: Co-add your individual-object posterior pdfs. These are all wrong, of course, but the odd thing is that the latter two—which seem so sensible, since they "make use of" uncertainty information in the inferences—are actually wronger than just histogramming your ML estimates. Why? Because your ML-estimate histogram is something like the truth convolved once with your uncertainties, but a co-add of your likelihoods or posteriors is pretty much the same thing convolved with them a second time: each co-added blob is already noise-width wide and is also centered on a noisy estimate. The Right Thing To Do (tm) is hierarchical inference, which is like a deconvolution (by forward modeling, of course). I feel like a skipping record. Fadely, Foreman-Mackey, and I discussed all this over lunch, in the context of recent work on (wait for it) planet radii.
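A toy numerical sketch makes the point concrete. Everything here is made up for illustration (a Gaussian radius population with mean mu and intrinsic width tau, per-object Gaussian noise, and none of it from the actual planet-radius work): the scatter of the ML estimates is inflated once by the noise, the co-add is inflated twice, and the hierarchical marginal likelihood recovers the intrinsic width.

import numpy as np
from scipy.optimize import minimize

# Made-up "truth": a Gaussian radius population with mean mu_true
# and intrinsic width tau_true.
rng = np.random.default_rng(42)
mu_true, tau_true, N = 2.0, 0.3, 1000
x_true = rng.normal(mu_true, tau_true, size=N)

# One noisy measurement per object, with known per-object uncertainty.
sigma = rng.uniform(0.2, 0.6, size=N)
y = x_true + sigma * rng.normal(size=N)  # the ML estimate of x_i is just y_i

# Option 1 (wrong): histogram the ML estimates.
# Its width is the truth convolved once with the uncertainties.
print("truth:                tau = %.3f" % tau_true)
print("ML-estimate scatter:  %.3f  (tau convolved with noise)" % np.std(y))

# Option 2 (wronger): co-add the posteriors. With a flat interim prior,
# each posterior is N(y_i, sigma_i^2), so the co-add is an equal-weight
# mixture of those; one draw per component samples it. It is the truth
# convolved with the noise *twice*.
coadd_draws = y + sigma * rng.normal(size=N)
print("co-add scatter:       %.3f  (convolved again)" % np.std(coadd_draws))

# The Right Thing: hierarchical inference by forward modeling. Marginalizing
# each x_i out of the model gives y_i ~ N(mu, tau^2 + sigma_i^2); maximize
# that marginal likelihood over the population parameters (mu, tau).
def neg_log_marginal_likelihood(theta):
    mu, log_tau = theta
    var = np.exp(2.0 * log_tau) + sigma**2
    return 0.5 * np.sum((y - mu)**2 / var + np.log(var))

result = minimize(neg_log_marginal_likelihood, x0=[np.mean(y), 0.0])
mu_hat, tau_hat = result.x[0], np.exp(result.x[1])
print("hierarchical:         tau = %.3f  (deconvolved)" % tau_hat)

The key line is the marginal likelihood: the forward model predicts the noisy data directly, so the noise variance gets added inside the model (and effectively divided out of the answer) rather than applied to the estimates a second time.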
Co-adding the posteriors is the right thing to do under certain assumptions about your prior information and the question that is being asked! I bring this up merely to be pedantic; the hierarchical suggestion is usually a better model of one's actual prior beliefs.
If X = {x1, x2, ..., xN} is a set of quantities, one for each of N objects, and your prior for them (and their data) is independent, then define the "histogram" (I think some people call it the empirical measure) of your objects by f(x) = (1/N) \sum_i \delta(x - x_i). The posterior expectation of the empirical measure of these N objects is the equally weighted mixture of the N posteriors.
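For concreteness, here is the one-line derivation, under the stated assumption that the joint posterior factorizes as \prod_i p(x_i | d_i) given the per-object data d_i:

E[f(x) | D] = (1/N) \sum_i \int \delta(x - x_i) p(x_i | d_i) dx_i = (1/N) \sum_i p(x_i = x | d_i),

which is exactly the equally weighted mixture of the N posteriors.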
That's technically true, but usually astronomers want to know the distribution from which the objects are drawn; in this case, that's the radius distribution!