If you have a population of objects, each of which has a true property x that is observed noisily, how do you infer the distribution of true x values? The answer is to go hierarchical, as we do in this paper. But it surprises some of my friends when I tell them that if you aren't going to go hierarchical, it is better to histogram the maximum-likelihood values than it is to (absolute abomination) add up the likelihood functions. Why? Because each maximum-likelihood value is noisy, so the histogram gives you the true distribution convolved once with the noise. But each likelihood function has really broad support (about the width of the noise, centered on that already-noisy maximum-likelihood value), so adding them up convolves with the noise a second time: you get a doubly-convolved distribution! Which is all to say: Don't ever add up your likelihood functions!
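To make the "doubly-convolved" claim concrete, here is a minimal Python sketch under toy assumptions that are mine, not the paper's: the true x values are Gaussian with width sigma_true, and every object gets the same Gaussian noise sigma_noise (both numbers are made up for illustration). In that case the histogram of maximum-likelihood values should have variance sigma_true² + sigma_noise², while the normalized sum of likelihood functions should have variance sigma_true² + 2 sigma_noise².

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Normal density N(x | mu, sigma^2), vectorized over x and mu."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(17)
n_objects = 4000
sigma_true = 1.0   # assumed width of the true population distribution
sigma_noise = 0.5  # assumed per-object Gaussian noise, identical for all objects

# Draw the true x values, then one noisy observation each.
# With Gaussian noise, the maximum-likelihood estimate of x is just the observation.
x_true = rng.normal(0.0, sigma_true, n_objects)
x_ml = x_true + rng.normal(0.0, sigma_noise, n_objects)

# Option A: histogram the ML values. Its spread is the truth convolved once with the noise.
var_histogram = np.var(x_ml)

# Option B (the abomination): add up the likelihood functions L_i(x) = N(x_ml_i | x, sigma_noise^2)
# on a grid. As a function of x, each L_i is a Gaussian of width sigma_noise centered on x_ml_i.
grid = np.linspace(-6.0, 6.0, 801)
dx = grid[1] - grid[0]
summed = gaussian_pdf(grid[None, :], x_ml[:, None], sigma_noise).sum(axis=0)
summed /= summed.sum() * dx  # normalize the sum to a density on the grid
mean_summed = np.sum(grid * summed) * dx
var_summed = np.sum((grid - mean_summed) ** 2 * summed) * dx

print(f"true variance:          {sigma_true**2:.3f}")
print(f"histogram of ML values: {var_histogram:.3f} "
      f"(expect about {sigma_true**2 + sigma_noise**2:.3f})")
print(f"sum of likelihoods:     {var_summed:.3f} "
      f"(expect about {sigma_true**2 + 2 * sigma_noise**2:.3f})")
```

The extra sigma_noise² in option B is just the law of total variance for an equal-weight mixture: the summed likelihood's variance is the spread of the (already noisy) centers plus the width of each component, so it can only be broader than the histogram.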
When you look under the hood, what hierarchical inference is doing is looking for consensus among the likelihood functions; places where there is lots of consensus are places where the true distribution is likely to be large in amplitude. Rix had a very nice idea this weekend about finding consensus among likelihood functions without firing up the full hierarchical equipment. The use case is stream-finding in noisy halo-star data sets. I wrote text in Rix's document on the subject this morning.
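For contrast with the histogram and summed-likelihood toys, here is a minimal sketch of the simplest possible "go hierarchical" case, under assumptions of my own (this is not the model in the paper and not Rix's idea): a single Gaussian population N(mu, tau²) and Gaussian per-object likelihoods with known, object-dependent noise sigma_i. Integrating each likelihood against the population gives N(x_ml_i | mu, tau² + sigma_i²), and the product over objects is largest where the population sits in the region that many likelihood functions agree on. That is the "consensus" in a form you can compute. The grid search over tau, with mu profiled out in closed form, is just one convenient choice.

```python
import numpy as np

rng = np.random.default_rng(3)
n_objects = 2000
mu_true, tau_true = 2.0, 0.3                # assumed population mean and intrinsic width
sigma = rng.uniform(0.2, 1.0, n_objects)    # assumed known, heteroscedastic noise levels
x_true = rng.normal(mu_true, tau_true, n_objects)
x_ml = x_true + rng.normal(0.0, sigma)      # ML estimates (Gaussian noise, so the observations)

def profile_log_marginal(tau):
    """Log marginal likelihood of all objects for population width tau, with mu at its
    best value for that tau. Each object contributes N(x_ml_i | mu, tau^2 + sigma_i^2)."""
    var = tau**2 + sigma**2
    w = 1.0 / var
    mu = np.sum(w * x_ml) / np.sum(w)  # inverse-variance weighted mean maximizes over mu
    log_l = -0.5 * np.sum((x_ml - mu) ** 2 / var + np.log(2.0 * np.pi * var))
    return log_l, mu

taus = np.linspace(0.0, 1.5, 1501)
results = [profile_log_marginal(t) for t in taus]
best = int(np.argmax([r[0] for r in results]))
tau_hat, mu_hat = taus[best], results[best][1]

print(f"hierarchical estimate: mu = {mu_hat:.3f}, tau = {tau_hat:.3f}")
print(f"naive spread of ML values: {np.std(x_ml):.3f} (inflated by the noise)")
```

The inferred tau comes out near the true intrinsic width, while the raw spread of the maximum-likelihood values is inflated by the noise, because the marginal likelihood accounts for each object's noise instead of stacking it on top of the population.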