noisy photometric redshift posteriors

The close reader (there's a reader, I know, but a close reader?) of this blog will have noticed that CampHogg is in a transition. We used to think that projects producing "catalogs" from telescope data ought to deliver likelihood information. Now we think this is probably impossible in general, and we will have to live with (at best) posterior probability information, under some priors. We discussed this in group meeting, in particular in relation to Malz's project and LSST. The photometric-redshift system LSST expects to build using cross-correlations (with, say, quasars) will (if the method works) produce estimates of posterior probability distribution functions (not the pdfs themselves, but estimates of them, which is scary).

Key questions we identified include the following: What are the effective priors implied by that procedure? What are the deep assumptions behind the cross-correlations? (I think they have to be performed at large scales to work properly; linear bias and all that.) And has the method been demonstrated to work, empirically? I got all confused at the end of group meeting about the fact that the method generates noisy estimates of a pdf, which is a strange meta issue. What do you do about a noisily known posterior pdf?


  1. "What do you do about a noisily known posterior pdf?"

    Marginalise over this uncertainty when making your inferences and predictions, but I suspect you already knew this. It's probably intractable in many problems, but it has been solved in some. For example, classic Nested Sampling produces a posterior distribution over the value of the unknown FML (fully marginalised likelihood) integral. Bayesian 'supervised learning' methods do similar things too, though they are harder.
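To make the first suggestion concrete, here is a minimal sketch in which everything (the redshift grid, the Gaussian pdf, the lognormal noise model, all the numbers) is invented for illustration: given several noisy realisations of one object's p(z), compute the inference under each realisation and then average over realisations.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical setup: a redshift grid and K noisy estimates of one
# galaxy's posterior pdf p(z).  Here we fake the estimates by
# perturbing a "true" Gaussian pdf with lognormal noise and
# renormalising each realisation.
z = np.linspace(0.0, 2.0, 201)
dz = z[1] - z[0]
true_pdf = np.exp(-0.5 * ((z - 0.8) / 0.1) ** 2)
true_pdf /= true_pdf.sum() * dz

K = 50
noisy = true_pdf * np.exp(0.3 * rng.normal(size=(K, z.size)))
noisy /= noisy.sum(axis=1, keepdims=True) * dz

# Marginalise over the pdf uncertainty: compute the inference (here
# the posterior-mean redshift) under each realisation, then average.
zbar_each = (noisy * z).sum(axis=1) * dz
zbar_marginal = zbar_each.mean()
print(zbar_marginal)  # close to 0.8 in this toy
```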

    Another property that could help depending on your needs: many Monte Carlo methods continue to work when you can produce an unbiased (frequentist sense) estimator of a density, instead of measuring the density precisely. I think this might have a fancy name like pseudo-marginal something or other. Ewan will correct me if I'm wrong. :)
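The fancy name is indeed pseudo-marginal MCMC. A toy sketch of the idea, with a deliberately silly "estimator" (the true unnormalised Gaussian density times a positive, mean-one random factor) standing in for, say, a Monte Carlo estimate of a marginal likelihood; the crucial detail is that the noisy value stored for the current state is reused, not re-evaluated:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_density(x):
    # Unbiased (frequentist-sense) estimator of an unnormalised
    # standard-normal density: the true value times a positive
    # random factor with expectation 1.
    return np.exp(-0.5 * x**2) * rng.exponential(1.0)

# Pseudo-marginal Metropolis-Hastings: accept/reject using the noisy
# estimates, *reusing* the stored estimate for the current state.
x, px = 0.0, noisy_density(0.0)
samples = []
for _ in range(200_000):
    x_prop = x + rng.normal(scale=1.0)
    p_prop = noisy_density(x_prop)
    if rng.uniform() < p_prop / px:
        x, px = x_prop, p_prop
    samples.append(x)

samples = np.array(samples[20_000:])  # drop burn-in
print(samples.mean(), samples.std())  # approximately 0 and 1
```

Despite the noise in the density evaluations, the chain's stationary distribution is exactly the target; the price is worse mixing when the estimator is noisy.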

    1. I wouldn't strictly say this is a case for pseudo-marginal methods, since the available posterior samples are presumably generated only once, not repeatedly on command. The eventual solution will probably involve importance sampling (i.e. reweighting of those posterior draws) to the same effect, though.
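A sketch of that reweighting, with made-up numbers (a flat interim prior on [0, 2] and a Gaussian likelihood): the posterior draws are generated once, and a different prior is imposed afterwards purely by weighting.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical single object: the catalog hands us posterior samples
# of z, generated once under a flat interim prior on [0, 2] with a
# Gaussian N(0.8, 0.1) likelihood; under the flat interim prior the
# posterior is just the (truncated) likelihood.
z_samples = rng.normal(0.8, 0.1, size=100_000)
z_samples = z_samples[(z_samples > 0) & (z_samples < 2)]

# Reweight those fixed draws to a *different* prior, say p(z)
# proportional to z, without re-running any sampler:
# w = new_prior(z) / interim_prior(z), and the interim prior is flat.
w = z_samples  # proportional to the new prior at each draw
zbar_new = np.average(z_samples, weights=w)
print(zbar_new)  # a touch above 0.8, pulled up by the rising prior
```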

      One thought that amuses me: if you have lots of objects, each with its own posterior, and we assume (1) that the unknown prior was the same for each and (2) that the likelihood dominates, then trying to reconstruct the prior they came from looks a lot like trying to reconstruct a weak lensing signal: you are estimating the 'shear' that the prior introduces on each more-or-less Gaussian likelihood in that region of parameter space.
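A toy version of that prior-reconstruction picture, in which everything (the Gaussian prior family, the widths, the sample size, the grid) is invented for illustration: each object's likelihood is Gaussian with known width sigma, the true values were drawn from one unknown prior, and the prior's parameters are recovered by maximising the marginal likelihood, in which the prior and likelihood widths add in quadrature.

```python
import numpy as np

rng = np.random.default_rng(2)

# Many objects: true values drawn from an unknown prior (here taken
# to be Gaussian with parameters mu, s that we want to recover), each
# observed through a Gaussian likelihood of known width sigma.
sigma = 0.1
mu_true, s_true = 0.7, 0.2
z_true = rng.normal(mu_true, s_true, size=2000)
z_hat = z_true + rng.normal(0.0, sigma, size=z_true.size)

# Marginal likelihood of the prior parameters: each z_hat is drawn
# from N(mu, s^2 + sigma^2).  Evaluate the negative log-likelihood on
# a grid, using the sufficient statistics (sample mean and variance).
n, mbar, vhat = z_hat.size, z_hat.mean(), z_hat.var()
mus = np.linspace(0.4, 1.0, 241)
ss = np.linspace(0.05, 0.5, 181)
M, S = np.meshgrid(mus, ss, indexing="ij")
var = S**2 + sigma**2  # prior variance plus likelihood variance
nll = 0.5 * n * (np.log(var) + ((mbar - M) ** 2 + vhat) / var)

i, j = np.unravel_index(np.argmin(nll), nll.shape)
mu_fit, s_fit = mus[i], ss[j]
print(mu_fit, s_fit)  # close to 0.7 and 0.2
```

The deconvolution step (subtracting the likelihood width in quadrature) is the toy analogue of removing the per-object 'smearing' to see the prior underneath.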