2011-06-02

Astrostatistics and Data Mining, day 4

In the morning, Lupton (Princeton) spoke about imaging data, with a lot of time spent on the (enormous, enormous) value of being band-limited, and on how you measure sources with maximum-likelihood techniques. He gave big shout-outs to the preliminary (and unpublished) results Lang and I have from The Tractor, our project to build a model of all the astronomical imaging in the SDSS, improving on the output of the SDSS software and giving it a more transparent and modifiable probabilistic basis. One of the remarkable aspects of this project, which Lupton noted, is that a very simple model of galaxies and stars does a damned good job of explaining the vast majority of SDSS image pixels, so modeling is not only the best thing you can do, it is also close to saturating the information content in the data.

One thing I realized during his session is that imaging survey data analysis could be much more accurate and precise if we built priors (hierarchically, and maybe even physically motivated) on the point-spread function. All the regularizations in use are very heuristic, and issues are clearly visible. Another realization I had is that, for faint sources found ab initio in some data, the optimization of the likelihood with respect to position ensures that any unmarginalized flux estimate will be an over-estimate. I thought about this years ago, but never delivered. Airplane project? Or maybe I should just offer beer to the first of my loyal readers who demonstrates the magnitude of the effect as a function of signal-to-noise and shows that marginalization corrects it.
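A minimal sketch of the over-estimate, under loudly stated assumptions: a 1D "image", a Gaussian point-spread function, unit-variance Gaussian pixel noise, and a position search restricted to a small window around the true position. None of these choices (nor the function name `simulate_flux_bias` or its parameters) come from the talk or from The Tractor; they are illustrative only. For each noisy realization it compares the linear maximum-likelihood flux at the true position with the flux at the position that maximizes the likelihood. It does not attempt the marginalization half of the question.

```python
import numpy as np

def simulate_flux_bias(snr, n_trials=4000, n_pix=64, psf_sigma=2.0,
                       search_halfwidth=4.0, seed=0):
    """Toy 1D experiment (all settings are illustrative assumptions).

    Returns (mean flux at true position, mean flux at ML position),
    both in units of the true flux.
    """
    rng = np.random.default_rng(seed)
    pix = np.arange(n_pix)
    true_pos = n_pix / 2.0

    # Candidate positions on a sub-pixel grid centred on the true position.
    offsets = np.linspace(-search_halfwidth, search_halfwidth, 41)
    positions = true_pos + offsets
    i_true = len(offsets) // 2

    # One unit-flux Gaussian PSF per candidate position: shape (n_pos, n_pix).
    psf = np.exp(-0.5 * ((pix[None, :] - positions[:, None]) / psf_sigma) ** 2)
    psf /= psf.sum(axis=1, keepdims=True)
    norm = (psf ** 2).sum(axis=1)

    # With unit noise per pixel, the fixed-position flux uncertainty is
    # 1 / sqrt(norm), so this true flux has the requested signal-to-noise.
    true_flux = snr / np.sqrt(norm[i_true])

    noise = rng.standard_normal((n_trials, n_pix))
    data = true_flux * psf[i_true] + noise

    # Linear maximum-likelihood flux at every candidate position.
    flux_hat = data @ psf.T / norm              # shape (n_trials, n_pos)

    # Optimize over position: pick the position with the largest signed
    # detection significance (equivalent to maximum likelihood for a
    # positive-flux source).
    snr_hat = flux_hat * np.sqrt(norm)
    i_best = np.argmax(snr_hat, axis=1)
    flux_ml = flux_hat[np.arange(n_trials), i_best]
    flux_fixed = flux_hat[:, i_true]

    return flux_fixed.mean() / true_flux, flux_ml.mean() / true_flux

if __name__ == "__main__":
    for snr in (3.0, 5.0, 10.0, 30.0):
        fixed, ml = simulate_flux_bias(snr)
        print(f"S/N {snr:5.1f}: fixed-position {fixed:.3f}, ML-position {ml:.3f}")
```

The fixed-position estimate is unbiased by construction, while the position-optimized estimate picks, in each realization, the candidate position where the noise happens to help most. How large the excess is depends on the width of the search window and on the PSF, so the numbers from this toy should not be read as a prediction for real survey data.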

In the afternoon I spoke about hierarchical modeling, as demonstrated in our exoplanet distribution modeling projects and our extreme-deconvolution projects. Bailer-Jones (MPIA) objected to my use of the word "uninformative" to describe certain kinds of priors. I agree; I was using it because it is the jargon of the day. You are always injecting information with your priors; if you can go hierarchical, you inject correct information.

1 comment:

  1. Well, some priors influence the posterior more than others (by reasonable measures). A delta function prior, for instance, leads to a posterior that doesn't care about the data at all. A prior that's uninformative (by some measure) has minimal influence (by some measure) on the posterior.
