2023-07-21

an insight about machine learning

Gaby Contardo (SISSA) completed her visit to Heidelberg today. Over coffee this morning she delivered a very simple, but very nice insight about machine learning outputs. Apologies that this is very Inside Baseball:

As I like to emphasize, you can't really average (or do any population inferences with) a collection of labels delivered by a discriminative ML method run on a collection of objects. Think: Finding the mean age of a cluster, where each star in the cluster got an age estimate from a discriminative ML method trained on stars with known ages. This is because discriminative ML methods output something very akin to posterior quantities, and if you average a bunch of posterior estimates, you are, in effect, multiplying in the prior many times over; eventually the prior dominates the inference (in many cases).
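One simplified way to see the problem is a toy simulation. Everything in it is an assumption made for illustration, not anything from the conversation: ages in the training set are Gaussian (mean 5, standard deviation 2, arbitrary units), each star has a single noisy feature equal to its age plus Gaussian noise, the "discriminative ML method" is just a least-squares line predicting age from the feature, and the cluster is coeval with true age 9. The fitted line behaves like a posterior mean under the training-set prior, so every per-star label is pulled toward 5, and that pull does not average away.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y ~ intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

rng = random.Random(17)

# Hypothetical toy numbers, chosen only for illustration.
PRIOR_MEAN, PRIOR_SD, NOISE_SD = 5.0, 2.0, 2.0
TRUE_CLUSTER_AGE = 9.0

# Training set: individual stars with known ages and one noisy feature each.
train_ages = [rng.gauss(PRIOR_MEAN, PRIOR_SD) for _ in range(20000)]
train_features = [age + rng.gauss(0.0, NOISE_SD) for age in train_ages]
intercept, slope = fit_line(train_features, train_ages)

# A new cluster: many stars that all share one true age.
cluster_features = [TRUE_CLUSTER_AGE + rng.gauss(0.0, NOISE_SD) for _ in range(5000)]

# Label each star separately, then average the labels.
per_star_ages = [intercept + slope * x for x in cluster_features]
mean_of_labels = sum(per_star_ages) / len(per_star_ages)

print(f"slope = {slope:.2f}, mean of per-star labels = {mean_of_labels:.2f}")
```

With these numbers the slope is close to 0.5, so the averaged label sits near 7, about halfway between the training-set mean and the true age of 9. Adding more stars to the cluster shrinks the scatter but not the bias.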

Contardo's point: If what you want is a label for a collection of objects, like that mean age, you should train on collections of objects. That is, make a training set where you have sets of N stars, labeled by mean age. Then this model can be applied to a new collection of stars and deliver a mean age estimate! Haha, brilliant. And correct. And consistent with the rules of inference.
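Here is a minimal sketch of the train-on-collections idea, under the same kind of toy assumptions (Gaussian ages with mean 5 and standard deviation 2, one noisy feature per star, coeval clusters, a least-squares line as the model). The helper names `simulate_cluster` and `summarize` are hypothetical, and summarizing each set by the mean of its features is my simplification; a real method would presumably learn its own summary of the set.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y ~ intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

rng = random.Random(23)

# Hypothetical toy numbers, chosen only for illustration.
PRIOR_MEAN, PRIOR_SD, NOISE_SD = 5.0, 2.0, 2.0
N_STARS = 50
TRUE_CLUSTER_AGE = 9.0

def simulate_cluster(age):
    """Noisy features for N_STARS stars that all share one age."""
    return [age + rng.gauss(0.0, NOISE_SD) for _ in range(N_STARS)]

def summarize(features):
    """Order-independent summary of a set of stars (here, just the mean)."""
    return sum(features) / len(features)

# Training set: each example is a whole set of stars, labeled by its mean age.
train_ages = [rng.gauss(PRIOR_MEAN, PRIOR_SD) for _ in range(2000)]
train_summaries = [summarize(simulate_cluster(age)) for age in train_ages]
intercept, slope = fit_line(train_summaries, train_ages)

# Apply the collection-level model to a new cluster.
new_cluster = simulate_cluster(TRUE_CLUSTER_AGE)
cluster_age_estimate = intercept + slope * summarize(new_cluster)

print(f"slope = {slope:.2f}, cluster age estimate = {cluster_age_estimate:.2f}")
```

Because the model sees N stars at once, the prior enters only once, at the level of the collection. In this toy the slope comes out close to 1 and the estimate lands near the true age, up to the noise in the cluster mean.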
