ABC is hard; The Cannon with missing labels

I spent some time discussing ABC today with Joe Hennawi (MPIA) and Fred Davies (MPIA), with some help from Dan Foreman-Mackey. The context is the transmission of the IGM (the forest) at very high redshift. We discussed the distance metric to use when you are comparing two distributions, and I suggested the K-S statistic. I suggested this not because I love it, but because there is experience in the literature with it. For ABC to work (I think) all you need is that the distance metric go to zero if and only if the data statistics equal the simulation statistics, and that the metric be convex (which perhaps is implied in the word “distance metric”; I'm not sure about that). That said, the ease with which you can ABC sample depends strongly on the choice (and details within that choice). There is a lot of art to the ABC method. We don't expect the sampling in the Hennawi–Davies problem to be easy.
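To make the role of the distance metric concrete, here is a minimal sketch of ABC rejection sampling with the two-sample K-S statistic as the distance. Everything specific in it is an assumption for illustration and not the Hennawi–Davies setup: the toy Gaussian `simulate` function, the flat prior on `theta`, and the tolerance `epsilon` are all hypothetical.

```python
import numpy as np

def ks_distance(a, b):
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs."""
    a = np.sort(a)
    b = np.sort(b)
    pooled = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, pooled, side="right") / a.size
    cdf_b = np.searchsorted(b, pooled, side="right") / b.size
    return np.max(np.abs(cdf_a - cdf_b))

def simulate(theta, n, rng):
    # Hypothetical toy simulator: unit-variance Gaussian draws with mean theta.
    return rng.normal(theta, 1.0, size=n)

def abc_rejection(data, n_draws, epsilon, rng):
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)           # draw from the (assumed) flat prior
        sim = simulate(theta, data.size, rng)    # forward-simulate a fake data set
        if ks_distance(data, sim) < epsilon:     # keep theta only if the simulation is close
            accepted.append(theta)
    return np.array(accepted)

rng = np.random.default_rng(42)
data = simulate(1.5, 200, rng)                   # stand-in for the real data
samples = abc_rejection(data, 5000, 0.15, rng)
print(samples.size, samples.mean())
```

Shrinking `epsilon` makes the accepted samples a better approximation to the posterior but drives the acceptance fraction toward zero, which is one face of the "art" in choosing the distance and its details.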

As part of the above discussion, Foreman-Mackey pointed out that when you do an MCMC sampling, you can be hurt by unimportant nuisance parameters. That is, if you add 100 random numbers to your inference as additional parameters, each of which has no implications for the likelihood at all, your MCMC may still slow way down, because you still have to accept/reject against the prior! Crazy, but true, I think.
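The effect is easy to see in a toy experiment. The sketch below assumes a plain random-walk Metropolis sampler with a fixed step size, a hypothetical Gaussian likelihood that depends only on the first parameter, and unit-Gaussian priors on everything; none of these choices come from the discussion itself.

```python
import numpy as np

def log_post(x):
    # The likelihood depends on x[0] only; every parameter has a unit-Gaussian prior.
    log_like = -0.5 * ((x[0] - 1.0) / 0.5) ** 2
    log_prior = -0.5 * np.sum(x ** 2)
    return log_like + log_prior

def acceptance_rate(n_nuisance, n_steps=5000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=1 + n_nuisance)          # start from a draw from the prior
    lp = log_post(x)
    n_accept = 0
    for _ in range(n_steps):
        prop = x + step * rng.normal(size=x.size)    # symmetric random-walk proposal
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:     # Metropolis accept/reject
            x, lp = prop, lp_prop
            n_accept += 1
    return n_accept / n_steps

rate_small = acceptance_rate(0)      # one interesting parameter, no nuisance parameters
rate_big = acceptance_rate(100)      # same problem plus 100 likelihood-free parameters
print(rate_small, rate_big)
```

At the same step size, the 101-dimensional chain accepts far fewer proposals than the one-dimensional chain, even though the extra parameters never touch the likelihood: each proposal still has to survive the prior ratio in every dimension.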

In other news, Christina Eilers (MPIA) showed today that she can simultaneously optimize the internal parameters of The Cannon and the labels of training-set objects with missing labels! The context is labeling dwarf stars in the SEGUE data, using labels from Gaia-ESO. This is potentially a big step for data-driven spectral inference, because right now we are restricted (very severely) to training sets with complete labels.

1 comment:

  1. Foreman-Mackey's MCMC point is a good one, and it is why I'm a fan of basing your proposal mechanism on a move that satisfies detailed balance with respect to the prior. Then you only accept/reject with the likelihood ratio. Elliptical slice sampling is like that. Another option is to transform so that the prior over the variables being updated is (conditionally) uniform over a hypercube, and bounce around inside that for the proposal.

    (Recent thoughts on ABC also on my website.)
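
As a minimal illustration of the prior-preserving idea in the comment, here is a sketch that uses the preconditioned Crank–Nicolson (pCN) move rather than elliptical slice sampling, because it fits in a few lines. It assumes a unit-Gaussian prior on every parameter and a hypothetical toy likelihood that depends only on the first one; `beta` is an assumed step-size setting.

```python
import numpy as np

def log_like(x):
    # Hypothetical toy likelihood: only x[0] matters; the rest are pure nuisance.
    return -0.5 * ((x[0] - 1.0) / 0.5) ** 2

def pcn_chain(n_nuisance, n_steps=5000, beta=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=1 + n_nuisance)          # start from a draw from the N(0, I) prior
    ll = log_like(x)
    chain = np.empty(n_steps)
    n_accept = 0
    for i in range(n_steps):
        # This move satisfies detailed balance with respect to the N(0, I) prior.
        prop = np.sqrt(1.0 - beta ** 2) * x + beta * rng.normal(size=x.size)
        ll_prop = log_like(prop)
        # So the accept/reject step needs only the likelihood ratio.
        if np.log(rng.uniform()) < ll_prop - ll:
            x, ll = prop, ll_prop
            n_accept += 1
        chain[i] = x[0]
    return chain, n_accept / n_steps

chain_0, rate_0 = pcn_chain(0)
chain_100, rate_100 = pcn_chain(100)
print(rate_0, rate_100)
```

Because the nuisance parameters never enter the likelihood ratio, the acceptance rate here should be about the same with 0 or 100 of them, which is the behavior the comment is after.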