2019-05-29

#PhysML19 and likelihoods, dammit

Today was a one-day workshop on Physics in Machine Learning. Yes, you read that right! The idea was in part to get people to draw out how physics and physical applications have been changing or influencing machine-learning methods. It is the first of a pair of one-day workshops (today and tomorrow) run by Josh Bloom (Berkeley). There were many great talks and I learned a lot. Here are two solipsistically chosen highlights:

Josh Batson (Chan Zuckerberg Biohub) gave an absolutely great talk. It gave me an epiphany! He started by pointing out that machine-learning methods often fit the mean behavior of the data but not the noise. That's magic, since we haven't said what part is the noise! He then went on to talk about the projects noise2noise and noise2self, in which the training labels are noisy: In the first case the labels are other noisy instances of the same data, and in the second the labels are the same data! That's a bit crazy. But he showed (and it's obvious when you think about it) that de-noising can work without any external information about what a non-noisy datum would look like. We do that all the time with median filters and the like! But he gave a beautiful mathematical description of the conditions under which this is possible; they are relatively easy to meet: Roughly, the noise must be independent across data dimensions, conditional on the signal (a structure familiar from causal-inference contexts). The stuff he talked about is potentially very relevant to The Cannon (it probably explains why we are able to de-noise the training labels) and to EPRV (if we think of the stellar variability as a kind of noise).
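
Here is a tiny numpy sketch of the idea (my own toy illustration, not Batson's code; the `j_invariant_median` helper is invented for this post): each pixel is predicted from its neighbors only, never from itself, so the noisy image can serve as its own target.

```python
import numpy as np

def j_invariant_median(img):
    """Predict each pixel from the median of its 8 neighbors, never from
    the pixel itself, so the output at a pixel is independent of the noise
    at that pixel (the "J-invariance" condition from noise2self)."""
    padded = np.pad(img, 1, mode="reflect")
    h, w = img.shape
    neighbors = [padded[dy:dy + h, dx:dx + w]
                 for dy in (0, 1, 2) for dx in (0, 1, 2)
                 if (dy, dx) != (1, 1)]
    return np.median(neighbors, axis=0)

rng = np.random.default_rng(17)
clean = np.outer(np.sin(np.linspace(0, 3, 64)), np.cos(np.linspace(0, 3, 64)))
noisy = clean + 0.3 * rng.standard_normal(clean.shape)
denoised = j_invariant_median(noisy)

# The self-supervised loss compares the output to the *noisy* image; because
# the per-pixel noise is independent given the signal, ranking denoisers by
# this loss also ranks them by distance to the (unseen) clean image,
# up to an additive constant (the noise variance).
print("self-supervised MSE:", np.mean((denoised - noisy) ** 2))
print("oracle MSE:         ", np.mean((denoised - clean) ** 2))
```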

Francois Lanusse (Berkeley) and Soledad Villar (NYU) gave talks about using deep generative models to perform inferences. In his talk, Lanusse discussed the problem that we astronomers call deblending: separating overlapping galaxy images in cosmology survey data (like LSST). Here the generative model is constructed because (despite 150 years of trying) we don't have good models for what galaxies actually look like, and certainly not any likelihood function in image space! He showed a great generative model and some nice results; it is very promising. I was pleased that he strongly made the point that GANs and VAEs don't use proper likelihood functions when they are trained, and those shortcomings can lead to serious problems when you use them for inference. In particular, GANs are very dangerous because of what is called “mode collapse”: the generator can cover only part of the data space and still do well under the GAN objective. That could strongly bias deblending, because it would assign vanishing probability to real parts of the data space. So he deprecated those methods and recommended instead methods (like normalizing flows) that have proper likelihood formulations. That's an important, subtle, and deep point.
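
And here is a toy illustration of that point (my own sketch, not Lanusse's code; the one-layer affine “flow” and the `flow_log_likelihood` name are invented for this post): a normalizing flow gives you an explicit, normalized log-likelihood via the change-of-variables formula, which is exactly what a GAN objective never provides.

```python
import numpy as np

def flow_log_likelihood(x, scale, shift):
    """Toy one-layer 'flow': an affine map z = (x - shift) / scale onto a
    standard-normal base density. Real normalizing flows stack many learned
    invertible layers, but the likelihood comes from this same
    change-of-variables rule: log p(x) = log p_base(f(x)) + log |df/dx|."""
    z = (x - shift) / scale
    log_base = -0.5 * (z ** 2 + np.log(2.0 * np.pi))  # standard-normal log-density
    log_det = -np.log(scale)                          # |df/dx| = 1 / scale
    return log_base + log_det

# Because the density is explicit and normalized, we can train by maximizing
# exactly this quantity; a GAN offers no such density to evaluate.
x = np.random.default_rng(0).normal(3.0, 2.0, size=1000)
print("total log-likelihood:", flow_log_likelihood(x, scale=2.0, shift=3.0).sum())
```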

After Lanusse's talk, I went to see him about the point that if LSST implements any deblender of this type (fit a model, deliver galaxies as posterior results from that model), the LSST Catalog will be unusable for precise measurements! The reason is technical: A catalog must deliver likelihood information, not posterior information, if the catalog is to be used in downstream analyses; otherwise the catalog's prior gets baked into every entry, where downstream users can neither remove it nor combine entries without double-counting it. This is related to a million things that have appeared on this blog (okay, not a million) and in particular to the work of Alex Malz (NYU): Projects must output likelihood-based measurements, likelihoods, and likelihood functions to be useful. I can't say this strongly enough. And I am infinitely pleased that ESA Gaia has done the right thing.
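
To spell out the technical reason in one step of (standard) math: combining two independent data sets applies the prior exactly once, but multiplying catalog-level posteriors counts the catalog's interim prior once per entry.

```latex
\[
p(\theta \mid d_1, d_2) \;\propto\; p(\theta)\, p(d_1 \mid \theta)\, p(d_2 \mid \theta),
\]
whereas multiplying two catalog posteriors double-counts the catalog prior:
\[
p(\theta \mid d_1)\, p(\theta \mid d_2) \;\propto\; p(\theta)^2\, p(d_1 \mid \theta)\, p(d_2 \mid \theta).
\]
```

So a downstream user needs either the likelihoods themselves, or the posteriors delivered together with the interim prior so that prior can be divided out.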

1 comment:

  1. LSST will definitely try to deliver likelihoods instead of posteriors to the greatest extent possible, and when we can't, we'll keep those priors as weakly informative as possible. I do think we'll need to occasionally apply *some* priors (including in deblending), essentially as a form of regularization; otherwise there are just too many places where the likelihood is at least practically unbounded.

    In any case, I think the lack of sufficiently general models is potentially a much bigger problem in this domain. If I give you a likelihood for a too-simple model, that's got to be at least as bad as a posterior for one that might actually have enough degrees of freedom to represent reality (except, I suppose, that in the first case it might be easier to characterize the bias I introduced).
