After I described my expectation-maximization approach to statistical counterpart association, Schiminovich insisted I read the paper on probabilistic cross-identification of astronomical sources by Budavari and Szalay. They have come very close to solving one of astronomy's fundamental problems.
All of astronomy and astrophysics is built on the observation and reobservation of sources on the sky. In each new observation, especially when that new observation is taken at a wavelength not previously observed, the sources detected in the observation must be matched or associated with the sources detected in the previous observations. This source association across observations is an ill-posed statistics problem, because you don't know, a priori, how the sources might move or vary or appear different from observation to observation, and all the observations are noisy to boot.
It seems trivial (most astronomers have never thought much about the step of source association), but in fact there is, to my knowledge, no well-posed form for this association problem. Every well-posed problem that people have solved (such as take the closest, or the closest within some error radius, or something like that) is some kind of approximation to it, and usually a very uncontrolled one! But Budavari and Szalay do a nice job of building a well-posed problem that is a controlled approximation to the ill-posed problem, with clear assumptions and a somewhat scalable form. They don't solve the fully general problem, in which sources can move and vary radically, and in which the observer doesn't know all sources of noise, but they present a nice Bayesian formulation for cross-identification among surveys of a static sky, with known Gaussian error properties. I suspect it is not hard to generalize at least somewhat further.
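
To make the idea concrete, here is a minimal sketch of the simplest case I can think of: one detection in each of two catalogs, a static sky, and circular Gaussian positional errors that are small compared with the whole sky. Under those assumptions (and a uniform prior on the true position), the ratio of the marginal likelihood for "same source" to that for "two unrelated sources" works out to 2/(σ1² + σ2²) · exp(−ψ²/[2(σ1² + σ2²)]), with the separation ψ and the errors σ in radians. The function names, the arcsecond units, and the `prior` argument are my own choices for illustration, not anything taken from the paper.

```python
import math

ARCSEC = math.pi / (180.0 * 3600.0)  # one arcsecond in radians


def bayes_factor(sep_arcsec, sigma1_arcsec, sigma2_arcsec):
    """Evidence ratio for 'same source' versus 'two unrelated sources'.

    Assumes a static sky, circular Gaussian errors, and the small-angle
    (flat-sky) limit. All inputs are in arcseconds.
    """
    psi = sep_arcsec * ARCSEC
    var = (sigma1_arcsec * ARCSEC) ** 2 + (sigma2_arcsec * ARCSEC) ** 2
    return (2.0 / var) * math.exp(-psi ** 2 / (2.0 * var))


def posterior_match(bf, prior):
    """Posterior match probability from a Bayes factor and a prior.

    `prior` is a hypothetical prior probability (strictly between 0 and 1)
    that the pair is a true match; choosing it is part of the modeling.
    """
    odds = bf * prior / (1.0 - prior)
    return odds / (1.0 + odds)
```

Unlike a hard "closest within some error radius" cut, this gives every candidate pair a number that falls off smoothly with separation, and the assumptions behind that number are written down where you can argue with them.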