Today was the first day of a Likelihood-Free Inference workshop at Flatiron, run by Foreman-Mackey (Flatiron) and others. The day started with an absolutely beautiful introduction by Kyle Cranmer (NYU) to the many methods of likelihood-free inference. He began with conceptual matters and some lovely examples from intro physics and from the Large Hadron Collider (where he has been a leader in sophisticated inference), and then went on a whirlwind tour of methods and ideas.
But my two big take-aways were the following (and these two things aren't even slightly comprehensive or fair to Cranmer's deep and wide presentation). The first is that he gave a great statement of the general problem of LFI, in which there are, in addition to the data, parameters, nuisance parameters, and per-datum latent variables. He pointed out that even a frequentist can (in principle) integrate out the latents, because the model generally puts a distribution over the per-datum latents. (That's an important point, which I should emphasize in my data-analysis class.) And of course the point of LFI is that you can't actually compute this integrated likelihood (the probability of the data given parameters and nuisances, with the latents integrated out) in practice; you can only produce joint samples of the data and the latents. So although you are permitted to integrate out the latents, you aren't capable of doing so (because, as in cosmology, say, your model is a baroque and expensive simulation).
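To pin down the object in question (my notation here, not necessarily Cranmer's): write x for a datum, z for its latent variables, θ for the parameters, and ν for the nuisance parameters. The integrated likelihood you would like, but cannot compute, is the marginal

```latex
% The integrated likelihood: marginalize the joint over the per-datum latents.
p(x \mid \theta, \nu) = \int p(x, z \mid \theta, \nu) \, \mathrm{d}z
```

whereas the simulator only hands you joint draws (x, z) ~ p(x, z | θ, ν); you can sample from the integrand but never evaluate it.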
The other take-away was an incredible idea, which I hadn't encountered before (maybe I should read the literature!): sometimes you can set things up (using discriminators, like classifiers, oddly enough) such that you can compute or approximate the likelihood ratio between two models, even when you can't compute the likelihood of either one. Cranmer said two interesting things about this. One is that if you have a scalar function of the data (like the score from a classifier) that is monotonically related to the likelihood ratio, there are ways to calibrate it into an actual likelihood ratio. The other is that if you need to compute something (here, the likelihood ratio), you shouldn't necessarily compute it by computing something far harder (here, the two individual likelihoods); he attributed this sentiment to Vapnik. You can do a lot of inference with likelihood ratios alone; you rarely need true likelihoods, so this idea has legs.
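Here is a minimal sketch of that classifier trick on a toy problem, to make the mechanics concrete. This is my own illustration, not code from the talk: the two "simulators" are unit-variance Gaussians (chosen so the true ratio is known in closed form), and the classifier is a plain scikit-learn logistic regression. The key fact is that the optimal classifier score is s(x) = p1(x) / (p0(x) + p1(x)), so the likelihood ratio falls out as s / (1 − s).

```python
# A minimal sketch (my toy example, not from the talk) of the
# classifier-based likelihood-ratio trick, in the spirit of
# arXiv:1506.02169.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 100_000

# Two "simulators" we can sample from but pretend we cannot evaluate:
# model 0 is N(0, 1), model 1 is N(1, 1).
x0 = rng.normal(0.0, 1.0, size=n)
x1 = rng.normal(1.0, 1.0, size=n)

# Train a classifier to distinguish the two piles of simulations.
X = np.concatenate([x0, x1]).reshape(-1, 1)
y = np.concatenate([np.zeros(n), np.ones(n)])
clf = LogisticRegression().fit(X, y)

# The optimal score is s(x) = p1(x) / (p0(x) + p1(x)), so the
# likelihood ratio is recovered as r(x) = s / (1 - s).
x_test = np.array([[-1.0], [0.0], [1.0], [2.0]])
s = clf.predict_proba(x_test)[:, 1]
r_est = s / (1.0 - s)

# For this toy problem the true ratio is analytic: log r(x) = x - 1/2.
r_true = np.exp(x_test.ravel() - 0.5)
print(np.c_[x_test.ravel(), r_est, r_true])
```

On a real problem the learned score won't be exactly the optimal one, which is where the calibration point comes in: as I understand it, because the score is a scalar, you can estimate its one-dimensional density under each model (by histogramming further simulations, say) and take the ratio of those densities instead of trusting the raw classifier output.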
From the comments: a reader asked for links to that incredible literature, and these came back:

https://arxiv.org/abs/1506.02169 (approximating likelihood ratios with classifiers)
https://arxiv.org/abs/1805.12244 (variants with "gold mining")
https://arxiv.org/abs/1903.04057 (using approximate likelihood ratios for HMC)
https://arxiv.org/abs/1702.08896 (using approximate likelihood ratios for variational inference)